A collection of computer systems and programming tips that you may find useful.
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Wednesday, March 31, 2010

Script aliases in the style of git

The Git version control software lets you run its commands either as a single program name followed by the command as an argument (such as git status), or as individual scripts (such as git-status).

The advantage of the second form is that you can use the text completion feature of your shell to save you some typing. It's a personal preference...

It is implemented by a series of symlinks from the longer command names to a single git executable. git itself gets the name of the executable it was called as and breaks that down to get the name of the command.

It is easy to create your own version of this. Here it is in Ruby...

The 'primary' script is called myscript.
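#!/usr/bin/env ruby
# Work out the sub-command from the name the script was called as
script = File.basename($0)
if script =~ /^\S+?_(.*)$/
  command = $1
else
  # Called as the primary script: the first argument is the command
  command = ARGV.shift
end
puts "command #{command}"

if ARGV.length > 0
  puts "args #{ARGV.join(', ')}"
end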

Create an alias by appending a suffix (the sub-command name) to the script name, separated by a delimiter (an underscore here), and symlinking that name to the primary script:
$ ln -s myscript myscript_cmd_0
$ ln -s myscript myscript_cmd_1

The script looks at the name it was called with ($0 in Ruby) and tries to split off a command. If it finds one, it takes all the arguments as they are presented. If you call the primary script followed by a separate command, it shifts ARGV to get the command. These examples show how it works:
$ ./myscript_cmd_0 foo bar
command cmd_0
args foo, bar
$ ./myscript_cmd_1 foo bar
command cmd_1
args foo, bar
$ ./myscript cmd_1 foo bar
command cmd_1
args foo, bar
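
The same dispatch can be sketched in plain shell, for cases where a Ruby script is not wanted. This is only a sketch: the function names command_from_name and dispatch are made up for illustration, and unlike the Ruby version it joins the arguments with spaces rather than commas.

```shell
# Hypothetical helper: print the sub-command encoded in a script name
# (the text after the first underscore), or nothing if there is none.
command_from_name() {
  name=$(basename "$1")
  case "$name" in
    ?*_*) printf '%s\n' "${name#?*_}" ;;
    *)    : ;;
  esac
}

# Hypothetical helper: $1 is the name the script was called as,
# the remaining arguments are the script's own arguments.
dispatch() {
  called_as=$1
  shift
  cmd=$(command_from_name "$called_as")
  # No command in the name: take it from the first argument instead.
  if [ -z "$cmd" ] && [ "$#" -gt 0 ]; then
    cmd=$1
    shift
  fi
  echo "command $cmd"
  if [ "$#" -gt 0 ]; then
    echo "args $*"
  fi
}

# In a real script the last line would be:
#   dispatch "$0" "$@"
```

Calling dispatch ./myscript_cmd_0 foo bar and dispatch ./myscript cmd_0 foo bar should both report the command as cmd_0.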

You don't want to use a technique like this all the time - you end up with loads of symlinks in your bin directory, but in the right situation it can be very useful.


Thursday, March 18, 2010

Merging PDF Documents in Preview on Mac OS X Snow Leopard

Preview in Mac OS X is not only a viewer for PDF (and other) documents, it also allows you to merge multiple PDF documents into one. This is useful for a lot of reasons, especially when you have scanned several pages of a document into individual files and you want to combine them.

In Snow Leopard the way this works has changed and, as there is no menu item for merging, it can be a little confusing.

Open your first 'page' or document in Preview and open up the sidebar.

If you drag a new document into the sidebar and drop it in a blank region you will see it appear in the viewing window. But this has not added the new page to the first document. Preview is simply allowing you to view two separate documents.

To combine pages, drag and drop the second page on top of the first. The second page will appear as a thumbnail in the sidebar below the first AND the two pages will appear in the same document in the main viewing window.

This is confusing as both scenarios look the same in the sidebar. You can see the true document structure in the sidebar by picking one of the pages and moving it slightly as though you were reordering it. All pages in the same document will become surrounded with a border and shaded background.

You can reorder pages within a document by dragging and dropping as needed and 'Save As' will save the merged document as a single file.

It is a great feature of Preview but the user interface means that it is effectively hidden unless you know about it.


Thursday, March 11, 2010

Raphaël Live

Raphaël is an amazing JavaScript library for creating vector graphics in browsers. It was created by Dmitry Baranovskiy. It goes further than HTML Canvas in that every object is accessible in the DOM and so can be made into buttons, dragged around the canvas, etc. You need to know about it!

To help my exploration of the library I built a simple in-browser environment with a drawing canvas and the CodeMirror code editor so that I could try out Raphaël calls and see the results immediately. That worked out really well for me and so today I've released a more developed version of the tool, along with a range of code examples.

Raphaël Live lets you load code examples into the editor, run them, see the results, modify the attributes, re-run them and thereby learn how to use the library.

The tool is freely distributed. You can use it on the craic.com site, or download your own version from GitHub.

Hope that you'll check it out...

Wednesday, March 10, 2010

Rails searchlogic and confusing column names

Searchlogic is a great Rails gem from Ben Johnson for adding model search capabilities to your Rails app with a minimum of effort.

You can build complex queries very easily such as Company.name_like_or_address_like.

But if you have column names that contain the Model name you can run into problems.

For example, let's say my Company model has columns 'company_name' and 'company_address', then my query becomes something like Company.company_name_like_or_company_address_like.

It gets worse if my Person model has_one :company and I want to search companies through that model - now my query becomes something like Person.company_company_name_like_or_company_company_address_like.

Not only is that ugly as sin, it may cause searchlogic to barf when you use it in a search form.

My experience is that it can handle ugly queries like this when run in script/console but for some reason they may fail in the context of a real Rails app.

So what can you do about it?

The best solution is to change the names of your columns to remove the model names, but that is not always possible, especially in legacy databases.

Failing that, you can create a named scope in your model that performs the same query but uses a shorter, more sensible name. Searchlogic will use your named scopes quite happily.

In my example I would create a named scope in the Company model like this:

named_scope :company_name_address_like, lambda { |name|
  { :conditions => ['company_name like ? or company_address like ?', "%#{name}%", "%#{name}%"] }
}

I can then call it as Company.company_name_address_like().

Don't be tempted to include 'or','and', etc. in your custom named scope names. Searchlogic appears to try and split up the scope into components based on these 'operator' words.


Nokogiri and Snow Leopard

I'm not alone in having problems installing the nokogiri ruby gem on a Mac that has been upgraded to Snow Leopard. The problem lies in the gem not being able to find a suitable version of libxml2, despite Snow Leopard having a recent version of that library installed. (Not sure if this is by default or only if you have the developer tools installed...).

People have tried various things but the place to look first is /opt/local/lib/libxml2.*

If you have libxml2 installed there then remove (or move) those files, along with /opt/local/lib/libz.*

Try installing the gem again (sudo gem install nokogiri).

If it succeeds, you're good - if not then try removing the libxml2 include files in /opt/local/include - try again - and if still no luck then try trawling through other directories under /opt/local.

You should not have to specify the explicit libxml2 file locations with options to the gem install.
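
Moving the files aside is safer than deleting them, and it can be sketched in shell. This is a rough sketch that assumes the files in question match lib/libxml2.* and lib/libz.* under the prefix; the function names and the backup directory are made up for illustration, and on a real /opt/local you would probably need sudo.

```shell
# Hypothetical helper: list library files under a prefix that may shadow
# the system libxml2 and zlib. Prints nothing if there are none.
list_shadowing_libs() {
  prefix=${1:-/opt/local}
  for f in "$prefix"/lib/libxml2.* "$prefix"/lib/libz.*; do
    # An unmatched glob is left as a literal pattern, so test for existence.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Hypothetical helper: move those files into a backup directory
# rather than deleting them, so the change can be undone.
move_shadowing_libs() {
  prefix=${1:-/opt/local}
  backup=$2
  mkdir -p "$backup"
  list_shadowing_libs "$prefix" | while IFS= read -r f; do
    mv "$f" "$backup"/
  done
}
```

Run list_shadowing_libs first to see what would be moved, then move_shadowing_libs /opt/local followed by a backup directory of your choice, and try the gem install again.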


Wednesday, March 3, 2010

Git - push and pull between repositories

I've been trying to improve my git skills beyond the basics. But I ran into real confusion when trying to clone one repo into another and then trying to keep them in sync using git pull and git push.

In principle this should work fine but when I pushed changes back to the origin and then ran git status in the origin I would see the 'old' versions of files in the origin marked as having changed - no merge conflicts, just that they were not up to date.

Turns out that you can't (easily) keep two repos in sync where the origin is a regular working copy. What you need to do is create a third repo that is a so-called 'bare' repository. Bare repos only contain the contents of the .git directory and their role is simply to track changes.

So here is a simple example of how to set this up:

(I'm just using repos on my local machine to keep things simple)

1: Create your original working directory under git and check your code in. I'll call this repo_original.

2: Clone this into the bare repo with 'git clone --bare repo_original repo_bare'
(Take a look at the contents of repo_bare - it's just like a .git directory)

3: Clone the bare repo into a working copy with 'git clone repo_bare repo_1'

4: Make a second working copy with 'git clone repo_bare repo_2'

5: Try making some changes in repo_1, commit them and push back to the origin (repo_bare) with 'git push'

6: Now go to repo_2 and pull from repo_bare with 'git pull'. Look at your files and you should see the changes you made in repo_1. Try making changes in repo_2, push them, go to repo_1, pull changes and see that they came across.

So far so good.

7: Make conflicting changes in the same file in repo_1 and repo_2 and commit in both repos.

8: Push the changes from repo_1

9: Push the changes from repo_2 and you should be rejected with something like this
$ git push
To /Users/jones/tips/repo_bare
! [rejected] master -> master (non-fast forward)
error: failed to push some refs to '/Users/jones/tips/repo_bare'

That's OK - there is a conflict but you can't resolve those on a bare repo, so it won't let you proceed.

10: Instead, on repo_2, pull the current origin and you will see something like this:
$ git pull
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /Users/jones/tips/repo_bare
0d97e7a..494885c master -> origin/master
Auto-merged README
CONFLICT (content): Merge conflict in README
Automatic merge failed; fix conflicts and then commit the result.

That's what we want to see - there is a real conflict (we created it) and we can resolve it by editing the file and committing in the normal way.

So it's not difficult once you figure it out but it is not well explained in the guides that I've seen. Hope this helps.
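
The ten steps above can be run end to end as a script. This is a sketch using throwaway repos in a temporary directory. The commit messages, file contents and user identity are made up, and --no-rebase is added to git pull only because newer versions of git may otherwise refuse to pull divergent branches until told how to reconcile them.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: give a repo a local identity so commits work anywhere.
set_identity() {
  git -C "$1" config user.name "Example User"
  git -C "$1" config user.email "user@example.com"
}

# 1: original working repo with one committed file
git init -q repo_original
set_identity repo_original
echo "first line" > repo_original/README
git -C repo_original add README
git -C repo_original commit -q -m "Initial commit"

# 2: bare clone that acts as the shared origin
git clone -q --bare repo_original repo_bare

# 3, 4: two working copies of the bare repo
git clone -q repo_bare repo_1
git clone -q repo_bare repo_2
set_identity repo_1
set_identity repo_2

# 5: change in repo_1, pushed back to repo_bare
echo "added in repo_1" >> repo_1/README
git -C repo_1 commit -q -am "Change from repo_1"
git -C repo_1 push -q

# 6: repo_2 picks the change up
git -C repo_2 pull -q --no-rebase

# 7: conflicting edits to the same file, committed in both repos
echo "repo_1 version" > repo_1/README
git -C repo_1 commit -q -am "Conflicting change in repo_1"
echo "repo_2 version" > repo_2/README
git -C repo_2 commit -q -am "Conflicting change in repo_2"

# 8: the push from repo_1 goes through
git -C repo_1 push -q

# 9: the push from repo_2 should be rejected
if git -C repo_2 push -q 2>/dev/null; then
  push_result=accepted
else
  push_result=rejected
fi

# 10: pulling into repo_2 should stop with a merge conflict to resolve
if git -C repo_2 pull -q --no-rebase >/dev/null 2>&1; then
  pull_result=merged
else
  pull_result=conflict
fi

echo "push from repo_2: $push_result"
echo "pull into repo_2: $pull_result"
```

After the script finishes, repo_2/README contains the conflict markers, ready to be edited and committed in the normal way.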

Bonus Section - Branches

There are two ways to handle branching in repository setups like this - not sure of the official terms but I think of them as private and shared.

A 'private' branch is something I set up in my clone of a repo which will not get passed back to the origin repo. I create it, work in it and eventually merge it back into my repo's master. That branch will never get copied back to the origin and therefore any collaborators with their own repos will never know about it.

A 'shared' branch is one that I want to share with collaborators, so I need a way for them to access it. To do this, go to the origin repo ('repo_bare' in the above example) and create the branch there: 'git branch dev'. When you go to a cloned repo and do a pull you will see that new branch pulled over and 'git branch -a' will show it listed as 'origin/dev', but when you try to check it out you'll get an error.
$ git co dev
error: pathspec 'dev' did not match any file(s) known to git.
Did you forget to 'git add'?
What you need to do is track this branch in each of the cloned repos. To do this, in the cloned repo, run 'git branch --track dev origin/dev'. Now you can see a local branch called dev and you can check out the branch. push and pull will now keep the tracked branch in sync as well as the master. You need to set up the tracked branch on each of the cloned repos in order to work with it.
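
The shared-branch setup can be sketched as a script, again with throwaway repos in a temporary directory. The identity and file contents are made up, and --no-rebase on the pull is an assumption to keep newer versions of git quiet.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Set up an original repo, a bare origin and one working clone.
git init -q repo_original
git -C repo_original config user.name "Example User"
git -C repo_original config user.email "user@example.com"
echo "hello" > repo_original/README
git -C repo_original add README
git -C repo_original commit -q -m "Initial commit"
git clone -q --bare repo_original repo_bare
git clone -q repo_bare repo_1

# Create the shared branch in the bare origin.
git -C repo_bare branch dev

# In the clone: pull to fetch the new branch, then list all branches.
git -C repo_1 pull -q --no-rebase
git -C repo_1 branch -a

# Track the remote branch with a local branch of the same name, then switch to it.
git -C repo_1 branch --track dev origin/dev >/dev/null
git -C repo_1 checkout -q dev
```

The same two commands (branch --track, then checkout) would be repeated in every other clone that wants to work on dev.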

What I haven't figured out yet is how to 'promote' a private branch in a cloned repo up to a shared branch. I suspect you have to use 'git stash' to stash the contents of that branch, delete the branch and then create a shared branch, track it and then put the stash back into it.
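
One approach that may avoid the stash steps altogether: a local branch can usually be published by pushing it to the origin under its own name. The sketch below tries this with throwaway repos. The branch name 'feature', the identity and the file contents are made up, and the -u option (which records the origin branch as the upstream) assumes a version of git that supports it.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Set up an original repo, a bare origin and two working clones.
git init -q repo_original
git -C repo_original config user.name "Example User"
git -C repo_original config user.email "user@example.com"
echo "hello" > repo_original/README
git -C repo_original add README
git -C repo_original commit -q -m "Initial commit"
git clone -q --bare repo_original repo_bare
git clone -q repo_bare repo_1
git clone -q repo_bare repo_2
git -C repo_1 config user.name "Example User"
git -C repo_1 config user.email "user@example.com"

# A private branch that so far exists only in repo_1.
git -C repo_1 checkout -q -b feature
echo "work in progress" >> repo_1/README
git -C repo_1 commit -q -am "Work on feature"

# Promote it: push the branch to the origin and track it from now on.
git -C repo_1 push -q -u origin feature >/dev/null

# A collaborator's clone sees the branch after a fetch.
git -C repo_2 fetch -q
git -C repo_2 branch -r
```

The collaborator would still need to set up a tracking branch in their own clone before working on it.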


Monday, March 1, 2010

Apple Numbers and CSV Files

The Mac OS X spreadsheet program 'Numbers', from Apple and part of the iWork suite, is a competitor to Microsoft Excel. There are some things about it I prefer to Excel, others where I prefer Excel.

But one glaring omission in Numbers is that it will not open a Comma Separated Values (CSV) file from the Open menu. CSV files are a standard way to exchange spreadsheet datasets and not being able to load them into Numbers makes no sense.

In fact there is a way to do this, by dragging and dropping the file into a worksheet.

1: Your CSV file MUST have a .csv suffix - you will just copy the filename otherwise.
2: Drag and Drop the file into a single cell of an open worksheet - choose the cell that will take the 'top left' value of your dataset.
3: That's it - simple once you know the trick - seemingly impossible until you do...

Numbers will accept tab delimited files via the Open menu option with no problems.

Update: 2012-08-27
This restriction is no longer the case - Numbers will read .csv files from the Open menu just fine - don't know when things changed.
