A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Monday, December 8, 2008

Apache on Mac OS X - ~/Documents permissions

I keep all my code in subdirectories under ~/Documents. That works fine - I can build Rails applications and run them under Mongrel, etc. without any problems.

But I wanted to run code under Apache2 and when I set things up the way I've done many times on Linux, with a symlink from the Apache document root (/Library/WebServer/Documents) to my code, I was not able to view those pages due to permissions errors.

The issue is that your Documents directory has these permissions:
drwx------@ 17 jones jones 578 Dec 1 09:45 Documents
These mean that only the owner can access the files in that directory. Apache2 runs as user 'www' and even though you might have made the subdirectories world-readable, it is not able to follow the path to get there.

You probably don't want to make your entire Documents tree world-readable, so you have two choices.

1: Make Documents executable by everyone, but not readable:
$ chmod a+x Documents
to get these permissions
drwx--x--x@ 17 jones jones 578 Dec 1 09:45 Documents

2: Move your code to your Public directory which by default is world readable and executable.

Debugging problems like this is a real pain. I assume I've screwed up the Apache httpd.conf file and so I'm looking through the countless directives in that. The Apache error logs don't tell you where the problem lies either, unfortunately.
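
One way to take some of the pain out of this is to check every directory on the way to a file for the execute (search) bit that another user such as 'www' needs. Here is a minimal sketch of that idea. The method name blocking_directories is made up for illustration, and it only looks at the 'other' permission bits, so it assumes the web server user is neither the owner nor in the group of those directories.

```ruby
require 'pathname'

# List every directory between / and +path+ that does not grant the
# execute (search) bit to "other" users. Symlinks are resolved first,
# so a link from the document root into ~/Documents is followed.
def blocking_directories(path)
  blocked = []
  Pathname.new(path).realpath.ascend do |p|
    next unless p.directory?
    blocked << p.to_s if (p.stat.mode & 0o001).zero?
  end
  blocked.reverse
end
```

Call it with the path that Apache refuses to serve and print the result. Any directory it lists is a candidate for 'chmod a+x'.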


Apache Installations on Mac OS X

I've ended up with several Apache installations on my Mac OS X 10.5 system. This results in multiple httpd.conf files and great confusion if you modify the wrong one for the server that is currently active.

The primary installation that comes with Mac OS X is Apache2.

The executable is found in /usr/sbin
$ /usr/sbin/httpd -v
Server version: Apache/2.2.9 (Unix)
Server built: Sep 18 2008 21:54:05


The config file is found in /etc/apache2/httpd.conf
The document root is /Library/WebServer/Documents and the default page there is index.html.en

This version is started/stopped from Mac OS X System Preferences -> Sharing -> Web Sharing checkbox

Hopefully this helps you dissect out any problems that you might have with multiple installations.
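
To see which copies of httpd are lying around, something like the following sketch can help. It only searches the directories in a PATH-style string, so an installation somewhere else would need its directory added by hand. The method names are my own invention. The second method relies on 'httpd -V', which prints the compile-time settings of that binary, including SERVER_CONFIG_FILE.

```ruby
# Return the full path of every executable called +name+ found in the
# directories of a PATH-style string (defaults to the current PATH).
def find_executables(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR)
      .reject(&:empty?)
      .map { |dir| File.join(dir, name) }
      .select { |file| File.file?(file) && File.executable?(file) }
      .uniq
end

# Ask one httpd binary which root and config file it was built to use.
def compiled_settings(httpd)
  `"#{httpd}" -V 2>&1`.lines.grep(/HTTPD_ROOT|SERVER_CONFIG_FILE/).map(&:strip)
end
```

Looping over find_executables('httpd') and printing compiled_settings for each one should show which httpd.conf belongs to which server.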

Wednesday, November 26, 2008

Rails - Drag and Drop Sorting in a Table

Through use of the Prototype and Script.aculo.us libraries, Rails makes it easy to set up drag and drop capability in your pages.

But all the examples I've seen use Lists of elements and I typically use Tables to display sets of data. I was having a hard time getting drag and drop sorting to work but I got there.

My problem was that I was putting my 'container id' into the <table> tag. That doesn't work. You have to have a <tbody> tag pair inside there - which I simply have never bothered with. Put the 'container id' in there and put the 'element id' in the <tr> tag - not the <td> tag.

In this dummy application you have carts and carts have multiple items. The order of each item in the cart is given by the item 'position' field.
class Cart < ActiveRecord::Base
has_many :items, :order => :position
end

class Item < ActiveRecord::Base
belongs_to :cart
acts_as_list :column => :position, :scope => :cart
end

This would go into your cart show page:
<table>
<tbody id="cart_div">
<% @cart.items.each do |item| %>
<tr id="item_<%= item.id %>">
<td><%= item.position %></td>
<td><%= item.name %></td>
</tr>
<% end %>
</tbody>
</table>

Then elsewhere in that page you need to add this block that sets up the JavaScript magic. Note that you need to specify the tag that will get dragged and dropped. The default is 'li' but here we want 'tr'.
<%= sortable_element 'cart_div',
:url => { :action => 'sort', :id => @cart },
:complete => visual_effect(:highlight, 'cart_div'),
:tag => 'tr'
%>

You also need to create a 'sort' action in your carts controller that looks something like this:
def sort
  @cart = Cart.find(params[:id])
  # params['cart_div'] holds the item ids in their new order
  @cart.items.each do |item|
    item.position = params['cart_div'].index(item.id.to_s) + 1
    item.save
  end
  render :nothing => true
end
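
The position arithmetic in that action is easier to see outside of Rails. In this sketch a Struct stands in for the Item model and a plain array stands in for params['cart_div'], which as far as I can tell arrives as the list of item ids (the part of each row id after the underscore) in their new order. The names Item and apply_positions are just for illustration.

```ruby
# Stand-in for the ActiveRecord Item model (assumption: only id and
# position matter for this illustration).
Item = Struct.new(:id, :position)

# ordered_ids mimics params['cart_div']: the item ids, as strings,
# in the order they now appear on the page.
def apply_positions(items, ordered_ids)
  items.each do |item|
    item.position = ordered_ids.index(item.id.to_s) + 1
  end
  items.sort_by(&:position)
end

items = [Item.new(1, 1), Item.new(2, 2), Item.new(3, 3)]
sorted = apply_positions(items, %w[3 1 2])
puts sorted.map { |item| "#{item.position}: item #{item.id}" }
```

Dragging item 3 to the top gives it position 1 and pushes items 1 and 2 down to positions 2 and 3.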

Note that you don't need to do anything with your routes.rb file, even though 'sort' is not a standard action. I assume that is because it is using a GET and that just gets handled?

Finally, you need to include the javascript libraries in your page layout, but you knew that already.
<%= javascript_include_tag :defaults %>



Monday, November 24, 2008

Installing Rails 2.2 on Mac OS X - MySQL problem

Rails 2.2 is out and you want to install it - but you may run into this issue on Mac OS X.

1: Make sure you have upgraded to rubygems 1.3.1
$ sudo gem update --system

If that barfs try this which does the same thing a different way:
$ sudo gem install rubygems-update
$ sudo update_rubygems


2: Install Rails
This should install just fine
$ sudo gem install rails

3: One important change in 2.2 is that the MySQL database driver is no longer bundled and you have to install it yourself. But the obvious command may well fail like this:

$ sudo gem install mysql
Password:
Building native extensions. This could take a while...
ERROR: Error installing mysql:
ERROR: Failed to build gem native extension.
[...]


The issue is that the gem needs more information about your MySQL installation, so do this instead:

$ sudo gem install mysql -- --with-mysql-config=/usr/local/mysql/bin/mysql_config
Building native extensions. This could take a while...
Successfully installed mysql-2.7
1 gem installed


4: That looks good but you may not be out of the woods yet... I got this when I tried a rake db:migrate

$ rake db:migrate
(in /Users/jones/Documents/myapp)
!!! The bundled mysql.rb driver has been removed from Rails 2.2. Please install the mysql gem and try again: gem install mysql.
rake aborted!
dlsym(0x1c71570, Init_mysql): symbol not found - /usr/local/lib/ruby/gems/1.8/gems/mysql-2.7/lib/mysql.bundle


After some poking around I found reference to the version of mysql that is installed.

I installed mine as the packaged disk image from mysql.com. For Mac OS X 10.5 you have the choice of these two:
mysql-5.0.67-osx10.5-x86.dmg
mysql-5.0.67-osx10.5-x86_64.dmg


Both work fine on my MacBook but the mysql gem wants the 'x86' version, not the 'x86_64' one.

Look in /usr/local to see which you have symlinked in:

$ ls -l /usr/local
[...]
lrwxr-xr-x 1 root wheel 24 Nov 24 09:35 mysql -> mysql-5.0.67-osx10.5-x86


Install the correct version, reinstall the gem just to be safe, and try your rake again - it should be fine!
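
If you want to script the symlink check, here is a small sketch. It assumes the package directory names end in 'x86' or 'x86_64' as in the disk images above. The method name mysql_build is made up.

```ruby
# Report which MySQL package a symlink such as /usr/local/mysql points at.
# Returns :x86_64, :x86, :unknown, or nil when the path is not a symlink.
def mysql_build(link = '/usr/local/mysql')
  return nil unless File.symlink?(link)
  target = File.basename(File.readlink(link))
  return :x86_64 if target.end_with?('x86_64')
  return :x86 if target.end_with?('x86')
  :unknown
end
```

On a machine set up like mine, 'puts mysql_build' should report x86 once the right package is linked in.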

IMPORTANT

If you do have the same issue and install the non _64 version on top of the _64 one then mysql will NOT copy over your data files. To fix this:

1: Shut down MySQL
2: In the new (non _64) version
$ sudo mv data data.bak
$ sudo cp -pr ../mysql-5.0.67-osx10.5-x86_64/data .

3: Start up MySQL

Wednesday, November 19, 2008

Wrapping Scripts with Platypus on Mac OS X

Platypus is a Mac OS X tool, written by Sveinbjorn Thordarson, that lets you create Mac OS X applications that wrap up command line scripts, create installers, etc. It is a great way to package Ruby, Perl, etc. scripts for a broader user audience.

Platypus is easy to figure out if you just have a single script. But I often times develop data analysis pipelines of various sorts where I have one master Ruby script calling a bunch of others.

Wrapping the whole thing up with Platypus can work really well but to do that you need to understand how the tool works.

At first glance you might think that it just calls the target script in its original location, but in fact it copies the script into the folder that embodies the Mac OS X application. So if I have a primary script calling a secondary one then I need to either refer to it by an absolute path, which is not a good idea for portability, or I need to include the secondary script in the Mac application.

Here is an example that shows how you handle this - I'll assume that you have tried Platypus with some simple scripts already.

The primary script (hello.rb) is what the Platypus generated application will execute. This in turn will execute world.rb and echo its output.

hello.rb

#!/usr/bin/ruby
puts "Hello ..."
world = File.join(File.dirname(__FILE__), 'world.rb')
puts `#{world}`


world.rb

#!/usr/bin/ruby
puts "... World"


First of all, specify the path to your Ruby (or whatever) interpreter explicitly. The convention in the Ruby world is to use '#!/usr/bin/env ruby' but that doesn't work directly with Platypus. I'm sure you can mess with some of the Platypus options to make it work but I haven't bothered.

The third line in hello.rb specifies where to find the script 'world.rb', namely in the same directory as hello.rb. For Perl coders, this is the same as using FindBin.

Package hello.rb with Platypus, sending output to a Text window and keeping the window open after the script completes, then run the script. You should get something like this:

/Users/jones/Documents/Platypus/hello_world.app/Contents/Resources/script:7: command not found: /Users/jones/Documents/Platypus/hello_world.app/Contents/Resources/world.rb
Hello ...


It has found and run hello.rb but it can't find world.rb. The error message tells us that the script it has run is actually called 'script', not hello.rb, which seems odd.

Platypus copies your script into a new Mac application, which as you should know, is really a directory. 'cd' to the application directory (hello_world.app in my case), then into Contents -> Resources.

Look at the file 'script' and you'll see that it is actually 'hello.rb'

To get the result you want you have to add 'world.rb' to the application as a 'Resource' and you do this in the 'Advanced Options' panel in Platypus. Just add the secondary file to this panel, rebuild the application and re-run it.

Hello ...
... World


With more complex applications you can add additional scripts and data files. Test things out by running them from the command line in the Platypus-generated application directory.

If you want to get around this you can specify an absolute path to the secondary script, or create a symlink to it from within the application directory. But down this path lies madness. A better alternative would be to use an environment variable to show where something lies, such as the PATH variable or something custom. Platypus will let you pass that into the app.
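
A rough sketch of the environment variable approach might look like this. The variable name HELLO_SCRIPT_DIR is purely hypothetical, as are the method names. If the variable is not set, the code falls back to the directory of the running script, which is the File.dirname(__FILE__) trick used in hello.rb.

```ruby
# Directory holding the secondary scripts: taken from the (hypothetical)
# HELLO_SCRIPT_DIR environment variable, else the directory of this script.
def script_dir(env_var = 'HELLO_SCRIPT_DIR')
  dir = ENV[env_var]
  dir = File.dirname(File.expand_path(__FILE__)) if dir.nil? || dir.empty?
  dir
end

# Full path to a secondary script such as world.rb
def helper_path(name)
  File.join(script_dir, name)
end
```

In hello.rb you would then run the output of helper_path('world.rb') in backticks instead of building the path by hand.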

I'm very impressed with Platypus, compared to earlier attempts at doing this, like DropScript. When I get the chance I want to try it in combination with CocoaDialog which allows you to create simple GUIs in your scripts.


Tuesday, October 14, 2008

Memory Leak in Ruby 1.8.6

Just spent way too much time trying to track down a memory leak in a Ruby script that parsed out information from a massive text file. It would just gradually consume all available memory until the swap daemon went nuts.

I can't say that I've had a memory leak that caused me pain before using Ruby, but this one stopped the show.

None of the suggestions for possible causes was of any help, other than a general opinion that garbage collection in Ruby 1.8.6 had some issues.

So I downloaded ruby 1.8.7 (2008-08-11 patchlevel 72) from http://www.ruby-lang.org/en/downloads/, compiled it and reran the code.

Sorted!

1.8.6 is still the default on the systems I work with. If you run into this problem, the first thing to try is an upgrade to 1.8.7.

To help debugging, this line of code will spit out the current usage (Sorry, I forget where I got it from...)

STDERR.puts 'processBlock ' + `pmap #{Process.pid} | tail -1`[10,40].strip
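
That line depends on pmap, which is a Linux/Solaris tool that I don't believe ships with Mac OS X. Here is a sketch of a more portable variant. It reads /proc on Linux and otherwise falls back to 'ps -o rss= -p PID'. The method name rss_kb is made up, and it reports resident memory in kilobytes, which is not quite the same number that pmap gives you.

```ruby
# Resident memory of a process in kilobytes, or nil if it cannot be found.
def rss_kb(pid = Process.pid)
  status = "/proc/#{pid}/status"
  if File.readable?(status)
    # Linux: the VmRSS line looks like "VmRSS:    12345 kB"
    line = File.foreach(status).find { |l| l.start_with?('VmRSS:') }
    return line[/\d+/].to_i if line
  end
  # Elsewhere: ask ps for the resident set size of this pid
  out = `ps -o rss= -p #{pid} 2>/dev/null`
  out.strip.empty? ? nil : out.to_i
rescue SystemCallError
  nil
end

STDERR.puts "memory in use: #{rss_kb.inspect} KB"
```

Drop the STDERR line into the main loop of your script and watch whether the number keeps climbing.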



Thursday, October 9, 2008

Tutorial on the Ruby gem aws-sdb - an interface to AWS SimpleDB

Tim Dysinger's aws-sdb is a Ruby interface to the SimpleDB service from Amazon Web Services.

I've written a tutorial on how to use this interface to access AWS SimpleDB from your Ruby code.

 

Friday, September 19, 2008

Ruby Hpricot Tip - Extracting Arbitrary Blocks of HTML

Hpricot is a HTML parser for Ruby, written by 'whytheluckystiff', and is a great tool for extracting information from web pages.

If the target page uses divs with unique ids or classes then this task is especially easy, but most of the pages I care about are not as well designed as they might be. I often come across pages where the distinct sections are delimited by some arbitrary feature, such as a horizontal rule or simply a title in plain text.

Hpricot uses CSS selectors (as well as XPath) to pull out specific elements but that approach is not a great match for this class of arbitrary pages.

Here is one way to solve this problem. I've set up a simple web page with four sections, separated by <hr> tags. You can find that here and you can find the Ruby code to parse it here.

Basically you get the first Hpricot element on the page contained in the Body, then step through the elements in turn adding each to a new Hpricot::Elements object until either a hr tag or the end of the document is encountered. Every time it finds a delimiter it pushes the current Elements structure into an array and starts a new one.

Once done, you have an array of Hpricot::Elements objects, one for each section of your page. Each of these can be processed further using Hpricot.

The short version of the code, with comments removed, is here:
el = doc.search("body > *").first
blocks = Array.new
block = Hpricot::Elements.new
# Start with the first element itself, then walk through its siblings
while el
  if el.to_html =~ /\<hr/
    blocks << block
    block = Hpricot::Elements.new
  end
  block << el
  el = el.next
end
blocks << block
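
The grouping logic itself does not depend on Hpricot. This sketch swaps the Hpricot elements for plain strings and uses Ruby's Enumerable#slice_before (Ruby 1.9.2 and later) to start a new block at each delimiter. It illustrates the same idea and is not a replacement for the parsing code. The sample strings are made up.

```ruby
# Plain strings stand in for Hpricot elements in this illustration.
elements = ['<h1>One</h1>', '<p>first</p>', '<hr />',
            '<h1>Two</h1>', '<p>second</p>', '<hr />',
            '<p>third</p>']

# Start a new block at each <hr>, keeping the <hr> as its first element
blocks = elements.slice_before { |html| html =~ /<hr/ }.to_a

blocks.each_with_index do |block, i|
  puts "Block #{i + 1}: #{block.join(' ')}"
end
```

The seven sample elements end up in three blocks, with each <hr> sitting at the front of the block it opens.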


Let me know if you have other solutions to this.

Wednesday, August 20, 2008

EC2, SSH and Capistrano

The various ways that you can set up SSH keys for secure remote access to a machine confuse the heck out of me.

Amazon Web Services use SSH keypairs for connecting to EC2 nodes, like this:
$ ssh -i mykeypair root@ec2-75-101-234-79.compute-1.amazonaws.com

But a more common way to use keys is to create a private/public key pair and copy the public key to the remote machine. The default location for storing these is ~/.ssh and the file name of the public key is id_rsa.pub. So to set up a key for ssh between two 'regular' machines you would do this:
$ ssh-keygen
$ cat ~/.ssh/id_rsa.pub | ssh user@yourdomain "cat >> .ssh/authorized_keys2"
$ ssh user@yourdomain


With EC2 nodes you have to use a 'keypair' and that involves a different type of private key and a different key stored on the remote host. You can find that on the EC2 node in ~/.ssh/authorized_keys -- NOTE the filename - this is not authorized_keys2 - the two file names relate to the SSH1 and SSH2 versions of the protocol.

Using the EC2 flavor of SSH login is not a problem, until you want to use Capistrano, the powerful Ruby software for deploying Rails applications and other things on remote hosts. Capistrano uses SSH to connect to remote machines and by default will use the current user and the regular private/public keys.

Try to use Capistrano with its defaults to connect to an EC2 node and you'll get nowhere. To get it to work you need to do two things:

1: Set up a SSH private/public keypair as above and copy to the EC2 node, putting it in ~root/.ssh/authorized_keys2 (That's keys*2* !!). So you now have two keys for EC2.

2: Create a Capistrano capfile and include these two lines that tell it the remote user and where the key lives:
set :user, 'root'
ssh_options[:keys] = [File.join(ENV["HOME"], ".ssh", "id_rsa")]


Run Capistrano and everything should work.
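
If it does not work, one thing worth ruling out is the private key file itself, since ssh will refuse a private key that other users can read. This is a small sketch of that check. The method name private_key_ok? is made up and the path is the default one used in the capfile lines above.

```ruby
# True when +path+ is a regular file that only its owner can access,
# which is what ssh expects of a private key (e.g. mode 0600 or 0400).
def private_key_ok?(path)
  return false unless File.file?(path)
  (File.stat(path).mode & 0o077).zero?
end

key = File.join(ENV['HOME'].to_s, '.ssh', 'id_rsa')
puts "#{key}: #{private_key_ok?(key) ? 'looks usable' : 'missing or too open'}"
```

A 'chmod 600' on the key file is the usual fix if it turns out to be too open.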

You would think you could just use the EC2 keypair in the capfile but that did not work in my hands. Capistrano has minimal documentation but it looks like the SSH options are the same as those for the Ruby SSH library.

Now you still have to enter your SSH key passphrase. You can avoid that by registering your keys with ssh-agent, but that is another can of SSH worms...

Thursday, August 14, 2008

Common AWS EC2 mistake

Well, it's a common mistake for me...

You start a new instance in Amazon Web Services (AWS) Elastic Compute Cloud (EC2) using the ec2-run-instances command. You should include a keypair in the command line like this:
$ ec2-run-instances ami-2b5fba42 -k mykeypair

But if you forget to include the keypair an instance will still start up and appear in ec2-describe-instances. When you try to ssh into that node you get this error message:

$ ssh -i mykeypair root@ec2-75-101-234-79.compute-1.amazonaws.com
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
b0:b8:fa:f6:f2:c5:e8:2f:7b:9c:e8:44:b7:ff:a3:70.
Please contact your system administrator.
Add correct host key in /Users/jones/.ssh/known_hosts to get rid of this message.
Offending key in /Users/jones/.ssh/known_hosts:67
RSA host key for ec2-75-101-234-79.compute-1.amazonaws.com has changed and you have requested strict checking.
Host key verification failed.
lost connection

Confusing, until you realize your mistake...

You have to terminate that instance and then create a new one with a keypair.

Friday, June 27, 2008

Linking Github and Lighthouse

Github is a hosted source code repository for projects managed with the git SCM system.

Lighthouse is a hosted bug/issue tracking system.

You can link the two so that you can embed Lighthouse ticket numbers in your Git commit messages and automatically change the status of Lighthouse tickets when you commit.

The steps involved in linking the two applications are listed on the site but they are a little difficult to find. Here are the steps that I used to link the sites.

I'm using private repositories on github, but that shouldn't matter.

1: On Lighthouse, generate a token to use from github.

1.1: Go to 'My Profile' in the top right corner of each page. In the right hand side bar you will see 'Create a Token' and a 'choose an account' pulldown. Select the Account you want to send messages to - I just have one account.

1.2: A panel will appear asking you to enter a label - this will identify the token in a list on Lighthouse. I call mine something like myproject_github. Select 'full access' for github linkage as it has to update your Lighthouse database. And then select the Project that you want to link to. This can be a single project or all of them. You can have multiple tokens so I would recommend one token per project.

1.3: Create the token and a record will appear in the right hand panel with the project name, your label and a grayed out hexadecimal string.

2: On Github go to the linked repository in your account

2.1: In the light yellow panel, click on the 'edit' Button next to the repository name (not the other edit links)

2.2: Above the light yellow panel is a gray bar with 'General', 'Collaborators' and 'Services' on it. Click on Services.

2.3: The returned page will have multiple panels for Twitter, Lighthouse, etc. Scroll down to the Lighthouse one.

2.4: This panel has entry boxes for 'Subdomain', 'Project id' and 'Token'. This is confusing as these names don't match up with Lighthouse!

Subdomain is the same as your Lighthouse Account name. So if your Lighthouse URL is something like http://acme.lighthouseapp.com then 'acme' is what you put in subdomain.

Project Id is NOT your Lighthouse project Name - it's the NUMERIC Project Id - when you go to your Project page on Lighthouse the URL will look something like:

http://acme.lighthouseapp.com/projects/12345-myproject/overview

Your Project Id is that number - in this case 12345

Token is that hexadecimal string you just created on Lighthouse. Cut and paste that into this entry box.

Check the 'active' box and click update settings.

3: Test it out...

3.1: For the purposes of testing, create a dummy ticket in your project on Lighthouse and make a note of the ticket number - let's say ours is #21

3.2: Make some change in your local clone of your github repository and commit that to your local repository, giving it a message that includes an embedded message for Lighthouse, and then push that up to github. For example:
$ git commit -m 'Made a trivial change [#21 state:resolved]'
$ git push


The text in the square brackets is the message for Lighthouse. The project is implicit in the token you created, #21 identifies the ticket and 'state:resolved' tells Lighthouse that this code change fixes that issue. The Lighthouse docs list out the full range of messages that can be passed.
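
To make the format of that bracketed message concrete, here is a sketch that picks it apart with a regular expression. This is only my reading of the format from the example above, not how Lighthouse or Github actually parse it, and the method name lighthouse_directive is made up.

```ruby
# Pull the ticket number and key:value settings out of a commit message
# that uses the [#ticket key:value ...] form.
def lighthouse_directive(message)
  match = message.match(/\[#(\d+)((?:\s+\w+:[^\s\]]+)*)\s*\]/)
  return nil unless match
  settings = Hash[match[2].split.map { |pair| pair.split(':', 2) }]
  { :ticket => match[1].to_i, :settings => settings }
end

p lighthouse_directive('Made a trivial change [#21 state:resolved]')
```

Running a commit message through something like this before you push is a cheap way to catch a typo in the ticket number or the key name.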

3.3: Now go back to your Lighthouse project and refresh and you should see that ticket #21 is now marked as 'Resolved' - brilliant! Click on the Ticket message and you'll see a link back to your Github repository that details the fix.

3.4: Github does not seem to retain a link to the matching Lighthouse records - but I bet that will come in due course.

That's it...

Github and Lighthouse are both pretty nice sites. I think Lighthouse still has a few wrinkles but I'm sure those will get sorted out. Both sites need better documentation/help. New features are getting reported in their blogs before any documentation appears and the laudable goal of keeping the user interface simple has led to a few things being 'implicit' when they should be 'explicit' - but overall I think the services are great and they have already made a big impact in the way I interact with some of my clients.

Tuesday, June 17, 2008

Loading CSV data from Excel into MySQL on a Mac

Loading data from a CSV file into MySQL requires that your file has the same number of columns in the same order as the MySQL table.

That means that you need to populate any column with an auto incrementing primary key, as well as any timestamp column that might be generated by your application. This is particularly relevant if you want to populate a table used in a Rails web application.

1: Build your table in Excel, adding a primary key column if necessary and populating it using the Edit->Fill->Series... feature in Excel.

2: Save it in CSV (Comma delimited) format - don't bother with the CSV Windows, etc options. Don't worry about Excel's warnings about losing formatting (unless of course you have important formatting...)

3: Optionally load it into an editor like TextMate (turn on 'show invisibles') and do any fine tuning you might want or need to do.

4: Run the Mysql client and from the mysql> prompt load in the data using a command like this:

mysql> load data local infile 'products.csv' into table products fields terminated by ',' lines terminated by '\r';

I needed the '\r' line termination loading the file from Mac OS X. It always takes me a couple of tries before I get the right \n and/or \r combination...
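
Rather than guessing, you can count the line endings in the file. This sketch returns the most common terminator as the text you would type into the 'lines terminated by' clause. The method name line_terminator is made up.

```ruby
# Count each style of line ending in a chunk of text and return the
# most common one as the string to use in "lines terminated by".
def line_terminator(text)
  counts = {
    '\r\n' => text.scan(/\r\n/).size,       # Windows
    '\r'   => text.scan(/\r(?!\n)/).size,   # old-style Mac
    '\n'   => text.scan(/(?<!\r)\n/).size   # Unix
  }
  return nil if counts.values.all?(&:zero?)
  counts.max_by { |_, count| count }.first
end

# Usage, assuming products.csv is in the current directory:
#   puts line_terminator(File.binread('products.csv'))
```

The file is read in binary mode in the usage line so that Ruby does not translate the line endings before they are counted.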

Using Older Versions of Rails

You want to develop with the latest version of Rails but you have an existing application that uses an older version that you are not ready to bring up to date. What do you do?

By default gem will keep older versions of installed gems until you tell it not to. To see what you have installed:
$ gem list --local
[...]
rails (2.1.0, 1.2.3, 1.2.1)


To use a specific version in your application add a line like this to the bottom of your config/environment.rb file
RAILS_GEM_VERSION = '1.2.3'

That should just work.

You can get rid of old versions of gems with this command:
$ gem cleanup

If you remove an old version by mistake you can always reinstall it with this gem command:
$ sudo gem install rails --version 1.2.3

When you are ready to move your application to the current version of rails then remove the line from environment.rb and bring your application files up to date with:
$ rake rails:update

Now, let's say you have installed the current version of Rails (say, 2.1.0) but you need to build an application that uses an older version. According to 'rails --help' there is no way to specify a version to use. It turns out that there is a hidden option available that does this:
$ rails _1.2.6_ myapp

Simple! Why this isn't explicitly documented I don't know. I found out about it from this post: http://rubybook.ca/2008/08/06/downgrade-older-version-rails/. Obviously you need to have the older version of Rails available on your system.

Friday, June 6, 2008

Upgrading to Rails 2.1 on Mac OS X

I've just upgraded to Rails 2.1 on my Mac and ran into some problems.

I was upgrading from Rails 1.2.3 and gem 1.0.1 on a Mac OS X 10.5.2 system and I keep my own ruby installation in /usr/local instead of using the Apple supplied version. Rails installed just fine but running 'rails myapp' generated an error.

$ rails myapp
/usr/local/lib/ruby/site_ruby/1.8/rubygems.rb:379:in `report_activate_error': RubyGem version error: rake(0.7.1 not >= 0.8.1) (Gem::LoadError)

My version of gem is supposed to include any dependencies by default and in fact adding the --include-dependencies argument gives a message that it will be ignored. In reality it does not upgrade all the gems that rails depends on. You only find out which ones are missing when you run rails and even then you only reveal them one at a time. This is a mess. I find it hard to believe this bug slipped through the release process so maybe I'm doing something wrong. Please tell me if I am! Part of the problem may be that I jumped from Rails 1.2.3 to 2.1 without installing 2.0

Here is what I did to get Rails 2.1 up and running:
1: Update gem - to 1.1.1 in my case
$ sudo gem update --system

2: Update rails
$ sudo gem update rails

3: Update all of these:
$ sudo gem update rake
$ sudo gem update activesupport
$ sudo gem update activerecord
$ sudo gem update actionpack
$ sudo gem update actionmailer
$ sudo gem update activeresource

Note that I had to install activeresource as that is, I believe, new with Rails 2.0 and so I did not have it to update.

4: Check that you have everything you need by trying to create a dummy rails app
$ rails dummyapp

If that dumps out an error message then install or update that gem and try it again until it generates the application directory tree that you expect.
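
Finding the problems one at a time is tedious, so here is a sketch that checks a whole list of requirements in one pass using the Gem::Requirement and Gem::Version classes from RubyGems. The installed and required lists are typed in by hand, and apart from the rake numbers taken from the error message above, the versions are made up for illustration. The method name unmet_requirements is my own.

```ruby
require 'rubygems'

# installed: gem name => installed version string (missing when absent)
# required:  gem name => requirement string such as '>= 0.8.1'
# Returns the requirements that are not satisfied.
def unmet_requirements(installed, required)
  required.reject do |name, requirement|
    version = installed[name]
    version && Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
  end
end

# Version numbers other than rake's are made up for illustration
installed = { 'rake' => '0.7.1', 'activesupport' => '2.1.0' }
required  = { 'rake' => '>= 0.8.1', 'activesupport' => '= 2.1.0',
              'activeresource' => '= 2.1.0' }

unmet_requirements(installed, required).each do |name, requirement|
  puts "#{name} (#{installed[name] || 'not installed'}) needs #{requirement}"
end
```

With these sample numbers it flags rake as too old and activeresource as missing, which matches what I ran into.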

Thursday, June 5, 2008

Selenium IDE - Testing Web Applications in the Browser

Selenium is a suite of tools for testing web applications through web browsers. It mimics the user clicks, text entries, pulldowns, etc. and can be an important component of your testing arsenal. Selenium is a product of openqa.org.

This note walks you through the setup and basic operation of Selenium IDE, a plugin for Firefox that lets you record and playback tests of any web application. The plugin does not work with other browsers, but you can still use other components of the larger Selenium suite with IE, Safari, etc.

You can download the plugin from http://selenium-ide.openqa.org/download.jsp. Download it within Firefox and follow the installation and restart steps.

My setup uses Selenium IDE v1.0 Beta 1 in Firefox 2.0.0.14 on Mac OS X 10.5.2

The following examples use Google as the target site. The instructions will apply equally well to your own site.

1: Visit google.com in your Firefox browser and open up the IDE window by going to the Tools menu -> Selenium IDE

2: The IDE window allows you to record the browser events that you invoke as you type and click in the main browser window. In fact, the IDE starts out in Record mode so anything that you do in the browser will be echoed in the IDE.

3: In the main browser type in a Google query term, e.g. 'Craic Computing Tech Tips' then click the Google Search button. You'll get a page of search results and at the same time several lines will appear in the 'Table' panel of the IDE.

First you can see the 'Base URL' entry box at the top of the IDE now has the Google URL. Then in the table panel you will see 3 lines that show the actions you took in the browser.
  • open - you opened '/' relative to the base url, i.e. the Google home page
  • type - you typed 'craic computing tech tips' into the query box, named 'q' in the form
  • clickAndWait - you submitted the form through the 'btnG' button (which is the name Google uses in its form)
If you were to run these commands via the IDE you would recreate the steps that you just did manually.

4: Where Selenium gets interesting is the ability to create test assertions using the page that was returned.

On the Google results page you should see one or two links to this blog, followed by links to craic.com and to various pages with the word 'craic' in them.

Select the phrase 'craiccomputing.blogspot.com', right-click and look in the popup menu for the line 'assertTextPresent craiccomputing.blogspot.com'. Clicking on this adds a new line to IDE with a 'assertTextPresent' command. Running the IDE commands will now test whether that text is present in the page.

You can add multiple assertions to the test and the 'Show All Available Commands' in the right-click menu displays all your options. For now we'll just use this one assertion.

5: Click on the 'Craic Computing Tech Tips' link on the Google results page. You'll be directed to this blog and you'll see another clickAndWait line appear in the IDE. Add another test by selecting 'Archive of Tips' in the blog page, right-clicking and choosing assertTextPresent as before.

6: Stop the recording by going back to the IDE and clicking the red button in the right hand corner.

7: You now have a set of commands that will test out two linked web pages. You can rerun this 'Test Case' by clicking either of the two green arrows on the left of the IDE toolbar (leave the one in a square box for now).

Watch your Firefox window when you do this and you will see Selenium mimic what you did before. At the same time the lines in the IDE panel will become colored as each command completes. Assertions that are true are colored a darker shade of green. Any that fail will turn red. You'll also see a bunch of logging messages appearing in the lower panel of the IDE.

Seeing Selenium run complex tests can be pretty cool. If things are moving too quickly you can slow down the events using the 'Fast Slow' slider in the IDE.

8: You can save the Test Case from the IDE menu (File -> Save Test Case). Selenium tests are stored in HTML format so name your test something like 'google_test_1.html'. Take a look at the file to see the format, or click the 'Source' panel in the IDE.
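
As far as I can tell, the heart of a saved test case is an HTML table with one row per command and three cells per row: command, target and value. This sketch generates rows of that shape for the steps recorded above. A real saved file has more around it (a header, the base URL and so on), so treat this as an illustration and compare it with your own saved file. The method name selenese_rows is made up.

```ruby
require 'cgi'

# Each command is [command, target, value], matching the three columns
# of the IDE's Table panel.
def selenese_rows(commands)
  commands.map do |command|
    cells = command.map { |cell| "<td>#{CGI.escapeHTML(cell.to_s)}</td>" }
    "<tr>#{cells.join}</tr>"
  end.join("\n")
end

steps = [
  ['open', '/', ''],
  ['type', 'q', 'craic computing tech tips'],
  ['clickAndWait', 'btnG', ''],
  ['assertTextPresent', 'craiccomputing.blogspot.com', '']
]
puts "<table>\n#{selenese_rows(steps)}\n</table>"
```

Generating rows like this can be handy if you want many near-identical tests that differ only in the search term or the asserted text.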

9: Go back to Google's home page again and 'Shift Reload' it to get a blank query box.

10: Create a new test case (File -> New Test Case) and you will see a 'Test Case' list panel appear in the IDE, along with a blank Table panel.

11: Click the red Record button in the IDE, enter a search term in Google and build your own test on some other site like the one you just did, with one or more assertTextPresent commands.

12: Stop recording, run the test to make sure it works and save it to a file with a .html extension as before (e.g. google_test_2.html) in the same directory as the first test.

13: With two test cases we can now save a Test Suite that will allow us to run both of them. Test Suites are also html format files with table rows for each Test Case (File -> Save Test Suite). Remember to give this a .html extension and store it in the same directory as the Test Cases. Take a look at the file in an editor to see the format.

14: With a Test Suite you can now make use of another Selenium interface, the TestRunner. You invoke this by clicking the small green arrow in a square box in the IDE toolbar.

This replaces your main browser contents with the 'Selenium Functional Test Runner' which has 4 panels. In the top left is a list of the component test cases. Clicking on any of these will bring up the contents of the test case in the center panel. The top right panel contains buttons to run the tests and shows the summary results from doing so.

The leftmost button with a green arrow will run all tests, the next one will run an individual test case. Try running all of them and you will see the various pages appearing in the lower panel of the browser. Green lines and text indicate the tests passed, red indicates failure.

With these simple examples you should not have any failures but you can create one by editing one of the test cases and changing an assertion line to some random piece of text.

The format used for Test Cases is straightforward and the Selenium documentation will show you a wide range of assertions and commands that you can use in creating your own tests in either the IDE or by editing test files. But starting out I would suggest using the record feature of the IDE and walking through your own or other complex applications, perhaps with multiple linked pages, to get a good sense of how Selenium works.

Things can get complicated if you are testing a site with pages that refresh or involve Ajax. Selenium is capable of handling these but get familiar with it on relatively simple sites first. Google Maps would not be a good idea...

Also, be aware that you should type text into entry boxes. If you use completion pulldowns with previous values then Selenium is not able to capture that text, at least at the moment.

Once you've mastered the basics you should look into other aspects of Selenium that will allow you to test IE and Safari and to integrate it with other testing frameworks in Ruby, Rails, etc.

In summary, Selenium IDE is a great way to automate testing of web applications from the perspective that really matters to your users - from the web browser itself. It can save you a great deal of pain in your development process - plus it is a really impressive way to show your clients how thorough your testing process is.

Thanks to everyone involved in producing this amazing piece of code.

Wednesday, March 19, 2008

Ensuring a DSL Connection Stays Up

I've got DSL service through Qwest, with an Actiontec DSL modem. Every so often the connection will just hang for no apparent reason. Qwest have not been able to help. I suspect the problem is due to a flaky modem or some accumulation of 'state' in the device that causes it to hang when it reaches a certain point. Power cycling the modem fixes it every time.

Most of the time the connection is fine but I can almost guarantee that it will hang if I go away for a few days. As I run my own web site and mail server it is crucial that the connection stays up.

The fix that I've arrived at is very low tech. Simply plug the modem into a cheap programmable timer and set that to go off for one minute every day sometime in the middle of the night. That way the modem gets reset once a day whether it needs it or not.

Most importantly it will reset when I'm not here. So even if it hangs during the day it will be back within 24 hours.

Since I've been using this I've not had a problem... famous last words...

My timer is an Intermatic with an LCD display, but any reasonable timer from Home Depot, etc. will do the job. It cost less than $20.

Ruby unable to find installed Gems? You may have 2 installations of ruby

I've been bitten by this before but it has been long enough that I forgot about the issue until it tripped me up again just now.

You've installed a gem, then go to use it and your Ruby script complains it can't find it. It does so with a cryptic message along the lines of "in 'require__': no such file to load -- postgres (LoadError)". In this example it can't find the postgres gem, which I know is installed and which 'gem list --local' confirms is installed.

Why can't it see it?

It may be that you have two versions of Ruby installed on your system. Your OS may place one in /usr/bin and you may have installed a more recent version in /usr/local/bin. That's what I have on one of my machines.

The thing is that each version has its own directory of gems. If you can't see a gem that you know is installed then you might be using the wrong installation of Ruby!

This was my problem. I have /usr/local/bin at the front of my path so I automatically pick up that installation of ruby and gem on the command line.

I had mistakenly put #!/usr/bin/ruby at the top of a script. That led the script to look for 'require'd gems in the wrong gem directory, and hence it did not find it.

You'll see some folks use "#!/usr/bin/env ruby" as the first line of their scripts. This should find the right one as it uses your PATH to find the first instance of ruby.

It's a nasty gotcha and it might explain quite a few pleas for help on Ruby mailing lists.
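A quick way to check which installation a script is really using is to have Ruby report its own location and gem directories. This is a minimal sketch that uses only the standard rbconfig and rubygems libraries; the helper names ruby_executable and gem_directories are made up for illustration.

```ruby
#!/usr/bin/env ruby
# Report which Ruby is running this script and where it looks for gems.
require 'rbconfig'
require 'rubygems'

# Full path to the ruby executable that is running this script.
def ruby_executable
  File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])
end

# Directories this interpreter searches for installed gems.
def gem_directories
  Gem.path
end

puts "ruby executable: #{ruby_executable}"
puts "ruby version:    #{RUBY_VERSION}"
puts "gem directories:"
gem_directories.each { |dir| puts "  #{dir}" }
```

Run it once with each interpreter, for example '/usr/bin/ruby which_ruby.rb' and then '/usr/local/bin/ruby which_ruby.rb' (the file name is just an example). If the two runs list different gem directories then a gem installed for one will not be visible to the other.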

Ruby, ActiveRecord and Postgresql

Here are some tips on getting Postgresql and Ruby working together. This has not been as simple as I had hoped but after some experimentation, here are the steps that worked for me...

My environment:
RedHat Linux (Enterprise Linux ES Release 4 - Nahant Update 3)
Ruby 1.8.6
Postgres 8.3 (?)

1: Downloaded Postgresql as source and compiled as directed, installed into /usr/local/pgsql
NOTE that I had to be root to get make to successfully compile the code - don't know why

2: Install the Ruby postgres Gem
You will see references to the postgres, ruby-postgres and postgres-rb gems. As far as I can tell, the one you want is 'postgres'
As root run:
# gem install postgres
This may very well blow up with error messages! It did with me. For whatever reason it does not know where to find the Postgresql include and lib directories.
Look at the error message and cd to the gem install directory. For me this was:
/usr/local/lib/ruby/gems/1.8/gems/postgres-0.7.9.2008.01.28/ext
Install the gem manually using these commands:
# ruby extconf.rb --with-pgsql-include=/usr/local/pgsql/include --with-pgsql-lib=/usr/local/pgsql/lib
# make
# gem install postgres -- --with-pgsql-include-dir=/usr/local/pgsql/include --with-pgsql-lib-dir=/usr/local/pgsql/lib
NOTE that the options are different for the ruby extconf.rb and gem install commands! They look very similar but the gem install options have a '-dir' suffix.

3: Create a test database with 'psql'
You will want to follow a basic tutorial on Postgresql and the client 'psql' to learn about this. They are plentiful and easy to find online.
For this example, let's assume that the database is called 'test', the table is called 'employees' (plural) and that it has two fields called 'id' (int) and 'name' (varchar(255)).
Populate that with a few rows of data.

4: Write a ruby script to interact with the database using ActiveRecord
ActiveRecord is the way that Rails interacts with databases, but you can use it outside of Rails with no problem.
Here is one that will fetch the 'employee' records from a database on the same machine as the script:

#!/usr/bin/env ruby
require 'rubygems'
require 'active_record'

# create a class for employee records (the class is singular but the table is plural)
class Employee < ActiveRecord::Base
end

# connect to the database
ActiveRecord::Base.establish_connection(:adapter  => 'postgresql',
                                        :host     => 'localhost',
                                        :username => 'postgres',
                                        :database => 'test')

employees = Employee.find(:all)
employees.each do |employee|
  print "#{employee.id} #{employee.name}\n"
end


NOTE the adapter is 'postgresql', not 'postgres', you don't need to explicitly require the 'postgres' gem, and the username is 'postgres'

To connect to a remote postgresql instance you will need to change the :host value in the establish_connection call. You may need to add the port that it is listening on (default 5432) with a :port directive.
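As a rough illustration, the connection options for a remote server might be built like this. The helper name connection_options is hypothetical, and the address 192.168.2.100 is just the example database server address used elsewhere in this post; substitute your own.

```ruby
# Build the options hash that would be passed to
# ActiveRecord::Base.establish_connection.
# host and port are the only values that change for a remote server.
def connection_options(host = 'localhost', port = 5432)
  {
    :adapter  => 'postgresql',
    :host     => host,
    :port     => port,        # 5432 is the Postgresql default
    :username => 'postgres',
    :database => 'test'
  }
end

remote = connection_options('192.168.2.100')
puts remote.inspect

# With ActiveRecord loaded you would then call:
#   ActiveRecord::Base.establish_connection(remote)
```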

You will also need to set up Postgres to allow remote calls. In /usr/local/pgsql/data/ you will need to edit pg_hba.conf to add a line allowing connections from a defined set of remote machines. Read the documentation to make sure you don't open up your database to the entire world.
Then you need to edit postgresql.conf and uncomment the 'listen_addresses' and 'port' lines. You want to add the IP address of the database server to the listen addresses so that the default line
listen_addresses = 'localhost'

becomes something like this
listen_addresses = 'localhost,192.168.2.100'


You'll need to restart Postgresql for these to take effect.

If you have trouble making a connection from your remote ruby script then try connecting using 'psql' on the database machine. If that works, try it from the remote machine. That should help pin down where the problem lies.
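Before digging into pg_hba.conf it can also help to confirm that the Postgresql port is reachable at all from the remote machine. The following is a minimal sketch using only Ruby's standard socket library; the helper name port_open? is made up for illustration, and it only tests that something accepts TCP connections on the port, not that Postgresql will let you log in.

```ruby
#!/usr/bin/env ruby
require 'socket'

# Return true if a TCP connection to host:port can be opened within
# the timeout (in seconds), false otherwise. Any failure (connection
# refused, unknown host, timeout) counts as unreachable.
def port_open?(host, port, timeout = 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

if __FILE__ == $0
  host = ARGV[0] || 'localhost'
  port = (ARGV[1] || 5432).to_i
  state = port_open?(host, port) ? 'reachable' : 'NOT reachable'
  puts "#{host}:#{port} is #{state}"
end
```

If the port is reachable from the database machine but not from the remote one, the likely suspects are the listen_addresses setting or a firewall rather than your Ruby code.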

Friday, March 7, 2008

Ruby, WEBrick and CGI scripts

Ruby on Rails is the obvious choice when you want to create a web application using Ruby. But this can be overkill for some projects where a regular CGI script would do the job.

Unfortunately the focus on Rails has led to CGI scripts being somewhat neglected in Ruby documentation. Here are the steps you need to get a basic CGI script up and running and the wrinkle you need to know about if you want to upload files.

1: Setting up a test web server using WEBrick
Ruby comes bundled with the WEBrick web server library and this allows you to create a basic server on your local machine for testing. Start up a server with this simple script:
#!/usr/bin/env ruby
require 'webrick'
include WEBrick
s = HTTPServer.new(
:Port => 3000,
:DocumentRoot => File.join(Dir.pwd, "/html")
)
trap("INT") { s.shutdown }
s.start

This will start a server at 'http://localhost:3000' that will respond to requests for files in the 'html' subdirectory of your project.
Put that into a file called, for example 'webrick.rb', 'chmod a+x webrick.rb' to make it executable and run it from your project top level directory.

WEBrick can be used in various ways, such as to serve up Java-like servlets. But this simple configuration allows you to serve static content and CGI scripts from the given directory with no other configuration required. Great for testing, but use Apache or something else substantial for hosting real sites.

2: Create a CGI script
For WEBrick to recognize your scripts, with this minimal configuration, you MUST give them the suffix '.cgi'
Here is a minimal script that will return the value for parameter 'foo' as supplied in a corresponding form.
#!/usr/bin/env ruby
require 'cgi'
cgi = CGI.new
foo = cgi['foo']
print "Content-type: text/plain\n\n"
print "foo: #{foo}\n"

The CGI object contains a hash of the parameters passed to it and with either a GET or POST request you can access these as shown above.

Things get more complicated when you want to upload a file from your form. In this case you specify enctype="multipart/form-data" on your html form, telling the CGI script that you are including a file among the parameters. BUT, with Ruby CGI, that affects the way you access other regular parameters that are included in the form.

You can no longer access their values by a direct hash lookup. cgi['foo'] no longer returns a string, instead it returns a StringIO object. StringIO wraps the IO class around the string so that you can use IO methods on it. This is great for handling the contents of the file that you want to upload but it makes handling regular arguments totally confusing until you realize the trick. Instead of accessing cgi['foo'] you now need to 'read' that StringIO stream and use cgi['foo'].read. The equivalent script as above for handling multipart data, and echoing the contents of an uploaded file is:
#!/usr/bin/env ruby
require 'cgi'
cgi = CGI.new
foo = cgi['foo']
print "Content-type: text/plain\n\n"
print "foo: #{foo}\n"
print "foo.read: #{foo.read}\n"
print cgi['filename'].read

The incorrect direct use of cgi['foo'] will print a reference to a StringIO object.

You can test these out with this static HTML form:
<html><head></head><body>
<p>Two example HTML forms for testing Ruby CGI scripts</p>
<hr>
Form with (method="post")
<form action="ruby_cgi_post.cgi" method="post">
Enter Value: <input type="text" name="foo"><br/>
<input type="submit">
</form>
<br/>
<hr>
Form with (method="post" enctype="multipart/form-data")
<form action="ruby_cgi_post_multipart.cgi" method="post" enctype="multipart/form-data">
Enter Value: <input type="text" name="foo"><br/>
Choose File: <input type="file" name="filename"><br/>
<input type="submit">
</form>
</body></html>
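If the same script has to cope with both kinds of form, the String versus StringIO difference described above can be hidden behind a small helper. This is only a sketch: the name param_value is made up for illustration, and it assumes that anything responding to 'read' should be treated as a stream.

```ruby
require 'stringio'

# Return a form parameter as a plain String, whether the CGI library
# handed back a String (regular form) or an IO-like object such as
# StringIO (multipart form).
def param_value(param)
  if param.respond_to?(:read)
    param.rewind if param.respond_to?(:rewind)  # start from the beginning
    param.read
  else
    param.to_s
  end
end

puts param_value('bar')                 # value from a regular form
puts param_value(StringIO.new('bar'))   # value from a multipart form
```

In a CGI script you would then write foo = param_value(cgi['foo']) and not have to care which enctype the form used.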

Tuesday, February 12, 2008

Mongrel Cluster on Mac OS X

Mongrel is a HTTP server that is well suited for serving Rails web applications, often in conjunction with Apache as a 'front end' server.

Mongrel works fine as is for development applications but if you have any real number of users its performance will degrade rapidly. In this case you want to use a cluster of mongrel servers, each running on a different port, and have Apache balance the load between all of them.

The mongrel_cluster gem is a convenient way to set up and manage multiple mongrels. HOWEVER I do not recommend this on Mac OS X. Because of the way Mac OS X handles processes at startup and/or because I couldn't figure it out, I was not able to get mongrel_cluster to work correctly at startup on my system. Some of the guides on the web that claimed to do this did not work for me. But don't worry... mongrel_cluster is really a simple utility and you can do without it at the price of a little more configuration. That is what I will show here.

These instructions relate to Mac OS X 10.4 and they assume that you already have Apache, Mongrel and your Rails application up and running.

1. Make sure that your Rails app is using Active Record to store Session data in the database, as opposed to storing it in files, which is the default. Instructions for that can be found HERE.

2. Setup a launchd plist file for each mongrel instance that you want to set up.
The preferred way to start programs automatically under Mac OS X is through launchd instead of the traditional init process in other Unix variants. launchd takes a bit of getting used to but you should use it (and don't try and mimic init scripts using StartupItems...)

You can learn about launchd in this Apple developer note and by doing a man launchd and man launchd.plist

3. In this example I am going to set up 4 instances of mongrel on ports 8000, 8001, 8002 and 8003

In /Library/LaunchDaemons create the file net.mongrel8000.plist with contents similar to this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>net.mongrel8000</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/mongrel_rails</string>
<string>start</string>
<string>-e</string>
<string>development</string>
<string>-p</string>
<string>8000</string>
<string>-c</string>
<string>/Users/jones/myapp</string>
<string>-P</string>
<string>/Users/jones/myapp/tmp/pids/mongrel.8000.pid</string>
<string>-l</string>
<string>/Users/jones/myapp/log/mongrel.8000.log</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>ServiceDescription</key>
<string>Mongrel Rails Application Server</string>
</dict>
</plist>
This ugly block of XML breaks down into the following components:
Label
This is a UNIQUE label for this launch item. In our case make this the same as the name of the file, less the '.plist' extension (net.mongrel8000)
ProgramArguments
An array of strings which, when joined together, create the command that you would run to start this instance of mongrel. Change the paths, etc to suit your application. Be VERY careful to set the port numbers to the one used in this file (8000 in this case)
RunAtLoad
This is set to true and tells launchd to run this item once when the system starts up.
ServiceDescription
An optional string that describes what this item represents.

chown/chmod this to have these permissions and ownership:
-rw-r--r--   1 root  wheel  836 Feb 19 14:53 net.mongrel8000.plist
4. Clone and modify this file for each mongrel instance
In my case I copied this into net.mongrel8001.plist, etc. and changed each instance of the port number 8000 to 8001, 8002, or 8003 as appropriate. The port number appears in four places in the above XML: the Label, the -p argument, the pid file path and the log file path. Make absolutely sure the Label is correct and unique otherwise it won't work.
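Since the port number has to be changed consistently in several places, one option is to generate the files from a single template instead of editing copies by hand. The following is a minimal sketch under that assumption; the function name mongrel_plist is made up for illustration and the paths are the example paths from the XML above.

```ruby
#!/usr/bin/env ruby
# Generate the launchd plist text for one mongrel instance.
# app_dir defaults to the example application path used in this post.
def mongrel_plist(port, app_dir = '/Users/jones/myapp')
  args = ['/usr/local/bin/mongrel_rails', 'start',
          '-e', 'development',
          '-p', port.to_s,
          '-c', app_dir,
          '-P', "#{app_dir}/tmp/pids/mongrel.#{port}.pid",
          '-l', "#{app_dir}/log/mongrel.#{port}.log"]
  arg_lines = args.map { |a| "    <string>#{a}</string>" }.join("\n")
  <<-PLIST
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>net.mongrel#{port}</string>
  <key>ProgramArguments</key>
  <array>
#{arg_lines}
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>ServiceDescription</key>
  <string>Mongrel Rails Application Server</string>
</dict>
</plist>
  PLIST
end

# Only write files when an output directory is given on the command line.
if __FILE__ == $0 && ARGV[0]
  [8000, 8001, 8002, 8003].each do |port|
    path = File.join(ARGV[0], "net.mongrel#{port}.plist")
    File.open(path, 'w') { |f| f.write(mongrel_plist(port)) }
    puts "wrote #{path}"
  end
end
```

Write the files to a scratch directory first, check them, and then copy them into /Library/LaunchDaemons as root with the ownership and permissions shown above.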

5. Test it out
Clean out any editor backup files in /Library/LaunchDaemons, check your file permissions and restart your machine. If things are set up correctly, when it restarts it will have started four instances of Mongrel that will drive your application.

Test these by using a browser on that machine and going to each port in turn, in other words these four URLs should all work:
http://localhost:8000
http://localhost:8001
http://localhost:8002
http://localhost:8003
Note that in one of the guides on the web that I saw they used a single .plist file and put the command strings for all the mongrel instances in a single ProgramArguments section. This did not work for me at all...

If my instructions don't work for you then double check the format of the XML file and your paths.
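Checking the four ports by hand gets tedious after a few reboots, so the same test can be scripted. This is a minimal sketch using Ruby's standard net/http library; the helper name http_status is made up for illustration and the ports are the four used in this example.

```ruby
#!/usr/bin/env ruby
require 'net/http'

# Return the HTTP status code (as a String) from a GET of / on the
# given host and port, or nil if no response could be obtained.
def http_status(host, port, timeout = 5)
  # The third argument (nil) stops Net::HTTP using any proxy settings
  # from the environment, since these are local checks.
  http = Net::HTTP.new(host, port, nil)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.get('/').code
rescue StandardError
  nil
end

if __FILE__ == $0
  [8000, 8001, 8002, 8003].each do |port|
    status = http_status('localhost', port)
    puts "http://localhost:#{port} -> #{status || 'no response'}"
  end
end
```

Any port that reports 'no response' points to a mongrel that did not start, so check the corresponding plist file and log.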

6. Configure Apache to direct requests to the Mongrel servers
Your Apache installation needs to have the mod_proxy_balancer module installed. 'httpd -l' will list the compiled in modules and hopefully you will find it there. You can find out how to compile in modules in Apache 2.2 and read my grumbles about that HERE.

Edit your Apache httpd.conf file to create a virtual host that it will respond to and a set of proxy and load balancing instructions. You can do this in various ways and get a lot more complex than this, but here is a single block that you can put at the bottom of httpd.conf.
<VirtualHost *:80>
ServerName myserver.craic.com

# Enable URL rewriting
RewriteEngine On

# Rewrite index to check for static pages
RewriteRule ^/$ /index.html [QSA]

# Rewrite to check for Rails cached page
RewriteRule ^([^.]+)$ $1.html [QSA]

# Redirect all non-static requests to cluster
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
RewriteRule ^/(.*)$ balancer://mongrel_cluster%{REQUEST_URI} [P,QSA,L]

# You could also add a bunch of deflate rules etc here

# This is the path to static content as opposed to your Rails app
DocumentRoot "/Users/jones/html"
<Directory "/Users/jones/html">
Options Indexes FollowSymLinks

AllowOverride None
Order allow,deny
Allow from all
</Directory>

</VirtualHost>

# This block tells the load balancer to pass requests
# onto these four mongrels

<Proxy balancer://mongrel_cluster>
BalancerMember http://127.0.0.1:8000
BalancerMember http://127.0.0.1:8001
BalancerMember http://127.0.0.1:8002
BalancerMember http://127.0.0.1:8003
</Proxy>
All the Rewrite lines tell Apache how to handle requests for static versus dynamic (from Rails) content. The important lines are the two that rewrite requests for non-static content into ones with a prefix of balancer://mongrel_cluster. These are passed to the <Proxy> block at the end where the load balancer distributes these among the members listed here. These members are the four mongrel instances that we set up earlier. Apache doesn't care that these are mongrel servers. It just sees them as URLs and efficiently passes on the requests.

Your Apache is most likely started by a launchd plist file. Assuming it starts automatically then any reboot of your server will start it and the mongrel instances.

Giving a URL like http://myserver.craic.com/myapp will now get forwarded to one of the mongrels.


There you have it... If you read this and have a better mongrel launchd configuration please let me know. There has to be a cleaner way than what I have here...

Monday, February 4, 2008

Using Active Record for Session Storage in Rails

This setup appears in all sorts of pages but I'm including the basic steps here for my own benefit - and hopefully yours...

The default way that Rails (1.2.3) stores session data is in files in your application tmp directory. This mechanism is referred to as CGI::Session::PStore. This is fine for development but becomes a problem as you move to a real production environment.

One problem with it is that unless you actively clean out old files with a cron job and 'rm' you can end up with a massive number of old session files.
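If you do stay with file-based sessions, the clean-up could also be done from Ruby instead of 'rm'. The following is only a sketch: the function name stale_session_files is made up for illustration, and the 'ruby_sess.*' file name pattern and the two day cutoff are assumptions that you should check against the files in your own tmp/sessions directory.

```ruby
#!/usr/bin/env ruby
require 'fileutils'

# Return the session files in dir that have not been modified in the
# last max_age_days days.
# ASSUMPTION: session files are named ruby_sess.<something>.
def stale_session_files(dir, max_age_days = 2, now = Time.now)
  cutoff = now - (max_age_days * 24 * 60 * 60)
  Dir.glob(File.join(dir, 'ruby_sess.*')).select do |path|
    File.file?(path) && File.mtime(path) < cutoff
  end
end

# Only delete anything when a directory is given on the command line.
if __FILE__ == $0 && ARGV[0]
  stale = stale_session_files(ARGV[0])
  FileUtils.rm_f(stale)
  puts "removed #{stale.size} stale session file(s)"
end
```

Run from cron with the path to your application's session directory as the only argument.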

It is also a problem if you use Apache and Mongrel to serve your application and want to scale things up with mongrel_cluster. Various resources warn of bad things happening with multiple mongrels and session files.

The next step up from files is to use a database table and have Active Record store session data in that. This is easy to set up.

1: Create a migration to set up the table and run that
$ rake db:sessions:create
exists db/migrate
create db/migrate/027_add_sessions.rb
$ rake db:migrate
== AddSessions: migrating =====================================================
-- create_table(:sessions)
-> 0.4298s
-- add_index(:sessions, :session_id)
-> 0.2914s
-- add_index(:sessions, :updated_at)
-> 0.0727s
== AddSessions: migrated (0.8001s) ============================================


2: Edit your app's config/environment.rb file and uncomment this line
config.action_controller.session_store = :active_record_store

3: Start your app server, interact with it and look in mysql
mysql> select * from sessions;

You will see a hexadecimal encoded session_id and a big block of encoded session data. Everything else should just work normally.

4: To avoid sessions accumulating in applications where you have users login and logout, you can call reset_session in the logout action - something like this:
def logout
  reset_session
  flash[:notice] = "Logged out"
  redirect_to :action => "index"
end


This clears out your current session row in the table and initializes a new session object. I'm not sure why it does the latter.
