Craic Computing Tech Tips: 2011

Thursday, November 17, 2011

strsplit in R

The strsplit function in the R statistics package splits a string into a list of substrings based on the separator, just like split in Perl or Ruby. The object returned is a List, one of the core R object types, with one element (a character vector) for each input string. For example:
> a <- strsplit("x y z", ' ')
> a
[[1]]
[1] "x" "y" "z"
> class(a)
[1] "list"

If you are not that familiar with R, like me, the obvious way to access an element in the list will not work:
> a[1]
[[1]]
[1] "x" "y" "z"
> a[2]
[[1]]
NULL

So what do you do? There seem to be two options:

1: You can 'dereference' the element (for want of a better word) by using double square brackets, which return the element itself rather than a sub-list:

> a[[1]][1]
[1] "x"
> a[[1]][2]
[1] "y"

... but I'm not going to write code that looks like that!!

2: You can unlist the List to create a Vector and then access elements directly

> b <- unlist(a)
> b
[1] "x" "y" "z"
> class(b)
[1] "character"
> b[1]
[1] "x"
> b[2]
[1] "y"

Much nicer!

Wednesday, November 16, 2011

Running R scripts on the Command Line

Using the R statistics package via a GUI is great for one off tasks or for learning the language, but for repeated tasks I want the ability to create and run scripts from the UNIX command line.

There are several ways to do this:

R CMD BATCH executes R code in a script file with output being sent to a file.
$ R CMD BATCH myscript.R myoutputfile
If no output file is given then the output goes to myscript.Rout. There is no way that I know of to have it go to STDOUT. Passing parameters to your script with this approach is a pain. Here is an example script:
args <- commandArgs(TRUE)
# seq_along() also behaves when no arguments are given; 1:length(args) would loop over 1:0
for (i in seq_along(args)) {
  print(args[i])
}

This is invoked with this command:
$ R CMD BATCH --no-save --no-restore --slave --no-timing "--args foo=1 bar='/my/path/filename'" myscript.R tmp
In particular, note the strange quoting of the arguments, preceded by --args inside the outer quotes - it's not a typo!

That command produces this output in file 'tmp':
[1] "foo=1"
[1] "bar='/my/path/filename'"
All those '--' options are necessary! Try leaving out --slave and --no-timing and you'll see why.

Thankfully there is a better option ...

Rscript is an executable that is part of the standard installation

You can add a 'shebang' line to the file with your R script, invoking Rscript, make the file executable and run it directly, just like any other Perl, Python or Ruby script.

You don't need those extra options as they are the default for Rscript, and you pass command line options directly without any of that quoting nonsense.

Here is an example script:
#!/usr/bin/env Rscript
args <- commandArgs(TRUE)
for (i in seq_along(args)) {
  print(args[i])
}

Running this with arguments:

$ ./myscript.R foo bar

produces this output on STDOUT (which you can then redirect as you choose)

[1] "foo"
[1] "bar"

Much nicer - but we've still got those numeric prefixes. If you are passing the output to another program, these are a major pain.

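The way to avoid those is to use cat() instead of print() - BUT you need to explicitly include the newline character as a separate argument to the cat() function. Setting sep = "" also stops cat() from putting a space between the argument and the newline, which matters if another program reads the output.

#!/usr/bin/env Rscript
args <- commandArgs(TRUE)
for (i in seq_along(args)) {
  cat(args[i], "\n", sep = "")
}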
results in this output:

$ ./myscript.R foo bar
foo
bar

For the sake of completeness, you can run R scripts with a shebang line that invokes R directly. But Rscript seems to be the best solution.

If you want to pass command line arguments as attribute pairs then you need to parse them out within your script. I haven't got this working in a general sense yet. What I want is to pass arguments like this:
$ ./myscript.R infile="foo" outfile='bar'
But I'm not quite there yet...
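The parsing itself is straightforward once the arguments reach the script. Here is a minimal sketch of that logic in Python rather than R, so it illustrates the idea and is not a drop-in for the R script. It assumes each argument has the form key=value, that the first '=' separates the key from the value, and that one pair of matching surrounding quotes should be removed. The shell normally strips the quotes in infile="foo" before the script sees them, but as the R CMD BATCH output above shows (bar='/my/path/filename'), they can survive. The function name parse_key_value_args is hypothetical.

```python
def parse_key_value_args(args):
    """Parse arguments of the form key=value into a dict.

    Assumptions: the first '=' separates key from value, and one pair of
    matching surrounding single or double quotes is stripped from the value.
    Arguments without '=' (or with an empty key) raise ValueError.
    """
    params = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        # Remove one pair of matching quotes that survived the shell
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        params[key] = value
    return params


print(parse_key_value_args(["infile=foo", "outfile='bar'"]))
```

In a real script you would pass it the command line arguments, e.g. parse_key_value_args(sys.argv[1:]).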

Deleting a File that starts with '-' on UNIX

Filenames that begin with 'special' characters like '-', '.' or '*' cause problems on UNIX. A leading '-' is the worst offender: standard commands like ls or rm treat the name as a command option.

You don't typically create files with names like this but they can arise through errors such as cut and pasting text into your command line.

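Simply escaping the character or quoting the filename does not work: the shell removes the backslash or quotes before the command runs, so ls or rm still sees a name that starts with '-'.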

The solution is to use a longer path to the file - the easiest being a relative path to the same directory.

If the filename is '--myfile' you will get an error like this:

$ ls --myfile
ls: illegal option -- -
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

But this works just fine:

$ ls ./--myfile
./--myfile

Plotting a simple bar plot in R

Here is my cheat sheet for loading data from a CSV file into the R statistics package, plotting one column of data as a bar plot and saving it as a PNG image.

My input file is a simple CSV file with two columns:
Position,Entropy
1,0.2237
2,0.4051
3,0.1312
4,0.1312

[...]

I want to load this into R as a data frame, then plot the values in the second column (Entropy) as a bar plot, using the values in the first column as the labels for the bars.

The first step is to use read.csv (a shortcut version of read.table):

> d <- read.csv('<your path>/myfile.csv')

I can use barplot directly on the Entropy column of the data frame (d):

> barplot(d[,'Entropy'])

But the default plot options are not great, so I can add custom options to the call, such as a main title and labels for X and Y axes. I set the lower and upper limits for the Y axis to be 0.0 and 1.0 and use the values in the first column of the data frame as the labels for the bars on the X axis

> barplot(d[,'Entropy'], main="Entropy Plot", xlab="Position",
ylab="Entropy", ylim=c(0.0,1.0), names.arg=d[,'Position'])

The plot is displayed on my screen and looks the way I want it. To save it out to an image file, I specify the plotting device ('png') and the output filename, repeat the plot and then close/detach the plotting device.
> png('<your path>/myfile.png')
> barplot(d[,'Entropy'], main="Entropy Plot", xlab="Position",
ylab="Entropy", ylim=c(0.0,1.0), names.arg=d[,'Position'])
> dev.off()

This produces the following image:

There are endless configuration options to play with but this works for a quick, simple plot.

Here are the steps without the prompts for you to cut and paste as needed:

d <- read.csv('<your path>/myfile.csv')
png('<your path>/myfile.png')
barplot(d[,'Entropy'], main="Entropy Plot", xlab="Position", ylab="Entropy", ylim=c(0.0,1.0), names.arg=d[,'Position'])
dev.off()

You could put these into a text file and run it from your system command line like this:

$ R CMD BATCH myfile.R

R is an incredibly useful system but as an occasional user I find the syntax and command names/options hard to learn. Hopefully this simple example helps you with the learning curve.

... and always remember - arrays in R start at 1, not 0 ...

Monday, November 14, 2011

Captain Beefheart Song Titles

Here is a silly project that I knocked out last week - a generator of Fake Captain Beefheart Song Titles

It was inspired by a comment on Gideon Coe's radio show on BBC 6 Music where he wondered if such a thing existed - it didn't - so I wrote one!

The 'algorithm', if you can call it that, combines words from real Beefheart song titles and a list of others that sound (to me) like they could be. The structure of the real titles is relatively simple with most of them following a few simple patterns. Some of the generated titles don't work but every so often it'll spit out a good one.

The site is built with Ruby and Sinatra, with some jQuery thrown in for the scrolling effect. It serves as a nice example if you are learning Sinatra. You can find the code at Github and the application is hosted on Heroku.

It's just a bit of fun, but writing a small, self-contained application like this is a great way to learn new technologies. Giving yourself a short time frame in which to develop and deploy a complete application is a great exercise in efficiency.

Friday, November 11, 2011

When jQuery $.ajax() keeps returning the same value on Internet Explorer

I just mentioned this in my last post, but it is important enough that I'm going to give it its own post!

If you have jQuery Ajax calls, such as $.ajax(), $.get(), etc., that continually fetch data from a remote server, you may find that this works on Firefox, Chrome, etc. but in Internet Explorer it keeps returning the same value.

IE has cached the first return value and thinks that you are making identical requests so it just uses the cached value and never contacts the server.

Turn off caching by adding a call to ajaxSetup at the start of your script block and set cache to false.

<script>

$.ajaxSetup ({
cache: false
});
[...]
</script>
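Setting cache to false works by making each GET request look different to the browser: jQuery appends a timestamp parameter named _ to the request URL, so IE has no cached response to reuse. The same cache-busting idea is sketched below in Python, for a client outside jQuery; the function name cache_busted is hypothetical, and the millisecond timestamp simply mirrors jQuery's convention.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url, now=None):
    """Return `url` with a `_=<milliseconds>` query parameter added or replaced.

    `now` (seconds since the epoch) can be passed in for repeatable results;
    otherwise the current time is used. Existing parameters are kept, but
    they are re-encoded by urlencode.
    """
    stamp = int((time.time() if now is None else now) * 1000)
    parts = urlsplit(url)
    # Drop any previous cache-busting value, then append a fresh one
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "_"]
    query.append(("_", str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_busted("/update"))
```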

The fix is simple once you realize it but it took me a while to figure it out this morning.

Debugging JavaScript is painful at the best of times. It's worse when you throw AJAX into the mix. So take small steps and test on multiple platforms right from the start. I need to learn that lesson.

Updating a DIV with jQuery and Sinatra

Here is a minimal example of how to update a DIV in a web page with content from a remote Sinatra server using jQuery AJAX.

On the Server end you want an action that generates some new content and just returns it as a string. Let's just get the current time:

require 'sinatra'

# Return the current server time as a plain string
get '/update' do
  Time.now.to_s
end

In the client web page you need to include the jQuery library:
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7/jquery.min.js"></script>

You need a named DIV that will receive the content

<div id='update-div'></div>

Then you need to create your script block. You need a function that updates the DIV. update_time() calls the jQuery get function with three arguments. The first is the URL of the server action, the second is a function that is called on the returned data, and the third is the type of data that the server will return, which is 'text' in our case.

Here we update the named DIV with the string returned from the server. Then we use the setTimeout function to wait 5 seconds, and then we call the function itself to repeat the process. That instance fetches content from the server, waits 5 seconds, calls itself, and so on.

At the start of the script block we add a call to ajaxSetup and set cache to false. This seems to be necessary for Internet Explorer which would otherwise cache the first value returned and keep using that instead of fetching new values.

Finally we make the initial call to the function when the document is first loaded and ready.
<script>

$.ajaxSetup ({
cache: false
});

$(document).ready(function() { update_time(); });

// fetch text from the server, wait 5 secs and repeat
function update_time() {
$.get("/update",
function(data) {
$("#update-div").html(data);
window.setTimeout(update_time, 5000);
},
'text');
}
</script>

The jQuery get function is a simplified wrapper around the ajax call.

That is all you need to have a DIV continually updated with remote content, in this case coming from Sinatra.

Latest version of jQuery on Google APIs

If a web page uses the jQuery library, it usually makes sense to link to a copy hosted on an external content delivery network (CDN). You don't have to keep an updated copy on your site, the client may already have this copy cached, and the CDN has high bandwidth connections. Google, Microsoft and jQuery themselves offer CDN hosted jQuery.

I use the Google version and call it with this line:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.0/jquery.min.js"></script>

That is fine, but what happens when a new version comes out?

In these cases some sites will offer a 'latest' url that automatically points to the most recent version.

Google does not do that - but they offer something similar, and which might be slightly better.

If I want version 1.7.0 and nothing else then I use the link given above.

If I want the latest version in the 1.7 series then I use:

And if I want the latest version in the 1 series then I use:

Note that you don't always want to use the most general form. New releases are not necessarily backward compatible. But you have the choice.

And note that you no longer need to include type="text/javascript" in the script tag - in HTML5 it's the default.

Thursday, November 10, 2011

Sorting on multiple String keys in Ruby

Ruby gives you two ways to sort arrays of 'complex' objects - sort and sort_by

Consider this array of hashes

array = [{:key0 => 'foo', :key1 => 'bar'},{:key0 => 'hot', :key1 => 'cold'}, ... ]

sort_by is the most compact form. To sort on :key0 in ascending order you write:

array.sort_by{ |a| a[:key0] }

sort is a little more verbose, but offers more flexibility:

array.sort{ |a, b| a[:key0] <=> b[:key0] }

If you want to sort strings in descending order, then you swap a and b in the comparison:

array.sort{ |a, b| b[:key0] <=> a[:key0] }

You can use sort_by with multiple keys if you include them in an array:

array.sort_by { |a| [a[:key0], a[:key1]] }

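But if you are sorting on strings and you want one of the keys in descending order, then you need to use sort. (With a numeric key you could negate the value inside sort_by, but there is no such trick for strings.)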

In this example the desired ordering is to first sort on :key0 in descending order and then on :key1 in ascending order:

array.sort do |a,b|
(b[:key0] <=> a[:key0]).nonzero? ||
(a[:key1] <=> b[:key1])
end

Not the most elegant or concise piece of code, but it does the job.
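For comparison, here is the same ordering (key0 descending, then key1 ascending) in Python, using a different technique from the Ruby nonzero? chain: because Python's sort is guaranteed stable, you can sort on the secondary key first and then on the primary key with reverse=True, which keeps records with equal primary keys in their existing order. The sample rows are made up, extending the two hashes in the post.

```python
def sort_key0_desc_key1_asc(rows):
    """Sort dicts on 'key0' descending, then on 'key1' ascending.

    Python's sort is stable, so sort on the secondary key first and then
    on the primary key. reverse=True keeps equal 'key0' values in their
    existing order, so the ascending 'key1' order survives within each group.
    """
    by_key1 = sorted(rows, key=lambda row: row["key1"])
    return sorted(by_key1, key=lambda row: row["key0"], reverse=True)


rows = [
    {"key0": "foo", "key1": "bar"},
    {"key0": "hot", "key1": "cold"},
    {"key0": "foo", "key1": "baz"},
    {"key0": "hot", "key1": "apple"},
]
for row in sort_key0_desc_key1_asc(rows):
    print(row["key0"], row["key1"])
```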

Wednesday, November 9, 2011

Manipulating Model/Table/Controller Names in Rails

There are some very useful methods that operate on the names of your models, tables and controllers in Rails. I use them infrequently enough that I have to relearn them every time - so I'm putting them in here:

Given a string with the name of a model (e.g. Project), how can I create an ActiveRecord call on the fly using that model?

str = 'Project'
project = (str.constantize).find(1)

How can I get the name of the corresponding controller?

controller = str.tableize

Given a string with the name of a controller, how can I get the name of the Class/Model ?

str = 'sales_contact'
model_name = str.camelize

If the string is plural:

str = 'sales_contacts'
model_name = str.classify

How do I do the reverse operation?

str = 'SalesContact'
name = str.underscore
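To make the naming conventions concrete, here is a rough Python approximation of camelize and underscore. It is only a sketch: it handles simple names like the ones above, and it does not attempt the pluralization that classify and tableize perform, namespaces (::), or Rails' configurable acronym rules. The function names simply mirror the Rails method names.

```python
import re


def camelize(name):
    """'sales_contact' -> 'SalesContact' (simple names only)."""
    return "".join(part.capitalize() for part in name.split("_"))


def underscore(name):
    """'SalesContact' -> 'sales_contact'; 'HTMLParser' -> 'html_parser'."""
    # Split a run of capitals from a following capitalized word: HTMLParser -> HTML_Parser
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lower-case letter or digit from a following capital: SalesContact -> Sales_Contact
    name = re.sub(r"([a-z\d])([A-Z])", r"\1_\2", name)
    return name.lower()


print(camelize("sales_contact"))
print(underscore("SalesContact"))
```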

Parsing 96 / 384 well Plate Maps

In molecular biology and related fields, we use rectangular plastic plates with 96 or 384 wells for many purposes, allowing us to run many experiments in parallel. In particular, they are the standard format for tasks involving laboratory robotics, such as fluid dispensing, DNA sequencing, mass spectrometry, etc.

In setting up a plate of samples, it is common for a researcher to use Excel to create a 'plate map' - a table of 8 rows and 12 columns, for a 96 well plate, with each cell holding information on the sample in that well.

Converting this into a list of individual wells for input into an instrument or a piece of software should be straightforward, but variability in manually created plate maps can be an issue. Extraneous text and the placement of the table in the spreadsheet are just two of the problems.

I have to deal with this problem every year or two and so I decided to write a utility script that implements this conversion and offers several options for the output file.

It is not intended as a complete solution for any one application. Rather it is a starting point that I can customize as needed.

I know other people have to deal with the same type of data, so I've made the script as general as possible and put it up on github under the MIT license. Please adapt this as you see fit for your needs.

The script can take CSV or Tab delimited files as input, containing 96 or 384 well plate maps. The output is a table with one well per row and the columns: plate, row, column, value. Options allow you to specify CSV or Tab delimited as the output format, whether to parse the input in row or column major order, and whether or not to output empty cells in the plate map.

The code is written in Ruby and you can find it at https://github.com/craic/plate_maps
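The heart of the conversion can be sketched in a few lines. This is an independent Python sketch, not the Ruby code in the repository: it assumes the 8 x 12 (96 well) or 16 x 24 (384 well) block of cells has already been located and stripped of row/column labels and other spreadsheet text, which is the hard part described above. Rows are labelled A, B, C... and columns are numbered from 1; the function names and the default plate name are hypothetical.

```python
import csv
import io
import string

# (rows, columns) for the two standard plate formats
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}


def plate_map_to_wells(grid, plate="plate1", column_major=False, include_empty=False):
    """Flatten a plate-map grid into (plate, row, column, value) tuples.

    `grid` is a list of rows of cell strings, already trimmed to the
    8x12 or 16x24 block of wells.
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    if (n_rows, n_cols) not in PLATE_SHAPES.values():
        raise ValueError(f"unexpected plate shape {n_rows}x{n_cols}")
    if any(len(row) != n_cols for row in grid):
        raise ValueError("all rows must have the same number of cells")

    # Row-major visits A1, A2, A3...; column-major visits A1, B1, C1...
    if column_major:
        order = [(r, c) for c in range(n_cols) for r in range(n_rows)]
    else:
        order = [(r, c) for r in range(n_rows) for c in range(n_cols)]

    wells = []
    for r, c in order:
        value = grid[r][c].strip()
        if value or include_empty:
            wells.append((plate, string.ascii_uppercase[r], c + 1, value))
    return wells


def wells_to_text(wells, delimiter=","):
    """Render wells as CSV (the default) or, with delimiter="\\t", tab-delimited text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["plate", "row", "column", "value"])
    writer.writerows(wells)
    return out.getvalue()


# Example: a tab-delimited 96 well map with three samples
cells = [["" for _ in range(12)] for _ in range(8)]
cells[0][0], cells[0][1], cells[1][0] = "s1", "s2", "s3"
text = "\n".join("\t".join(row) for row in cells)
grid = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(wells_to_text(plate_map_to_wells(grid)))
```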

Let me know if you find it useful

Wednesday, August 10, 2011

Unzipping multiple zip files on Unix

To unzip multiple .zip files with one command on a UNIX system, the obvious command to try is 'unzip *.zip' - but that doesn't work...

$ unzip *zip
Archive: ipab20080103_wk01.zip
caution: filename not matched: ipab20080110_wk02.zip
caution: filename not matched: ipab20080117_wk03.zip
[...]

The shell expands the wildcard and passes the whole list of filenames to unzip, which treats the first name as the archive and the rest as the names of files to extract from that archive.

You need to escape the wildcard with a backslash (or quote it, as in unzip '*.zip') so that the shell passes it through unchanged and unzip expands it itself, treating each match as a separate archive.

$ unzip \*.zip
Archive: ipab20080103_wk01.zip
inflating: ipab20080103.xml
inflating: ipab20080103lst.txt
inflating: ipab20080103rpt.html

Archive: ipab20080110_wk02.zip
inflating: ipab20080110.xml
inflating: ipab20080110lst.txt
inflating: ipab20080110rpt.html
[...]

Glad I stumbled on that - saves me a lot of time...
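If you are doing this from a script rather than at the prompt, you can sidestep the wildcard question by looping over the archives yourself and opening each one separately. Here is a minimal sketch using Python's standard glob and zipfile modules; the function name unzip_all and its default arguments are hypothetical, and unlike unzip it silently overwrites existing files with the same name.

```python
import glob
import os
import zipfile


def unzip_all(pattern="*.zip", dest="."):
    """Extract every archive matching `pattern` into `dest`, one archive at a time.

    Python expands the wildcard itself, so each match is opened as its own
    archive. Returns the archive paths in the order they were processed.
    """
    os.makedirs(dest, exist_ok=True)
    archives = sorted(glob.glob(pattern))
    for path in archives:
        print("Archive:", path)
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
    return archives
```

Calling unzip_all() from the directory holding the archives plays the same role as unzip \*.zip.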

Tuesday, July 26, 2011

Default Auto Capitalization in HTML Forms with Safari on iPad2

Ran into a nasty gotcha testing out one of my web sites on an iPad2 with Safari.

By default the iPad capitalizes the first letter in a sentence. But the login page of my application requires an email address, and these are typically all in lower case.

Safari forces the first letter of the email address into upper case, e.g. Jones@example.com - but this does not match the text in the application's databases (jones@example.com).

My app doesn't force text into lower case on its end (and it shouldn't !), which means that I could not login to my site. And in this case using the SHIFT key on the keyboard does not let you enter the first letter in lower case.

The solution is for the user of the iPad to go to Settings -> General -> Keyboard and set Auto-Capitalization to Off.

This is a really bad default setting - yes, it can be useful if you are entering regular text - but it's not that useful...

I can't expect my users to deal with this on their end. I could force all email addresses into lower case in my application, and that is probably what I'll end up doing.

HTML5 forms can define that a text entry box represents an email address. This adds convenience when entering from a mobile device (the keyboard can display a '.com' key, for example). Whether or not using this has any effect on auto-capitalization I have yet to test.
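A minimal sketch of the lower-casing fix, in Python: normalize both the address typed at login and the stored address before comparing them. Lower-casing the whole address is a simplifying assumption; the email standard allows the part before the @ to be case-sensitive, although most mail providers treat it case-insensitively. The function names are hypothetical.

```python
def normalize_email(address):
    """Trim surrounding whitespace and lower-case an email address."""
    return address.strip().lower()


def same_email(entered, stored):
    """Compare two addresses after normalizing both."""
    return normalize_email(entered) == normalize_email(stored)


print(same_email("Jones@example.com", "jones@example.com"))
```

Normalizing addresses when they are first stored as well as at login keeps the database consistent.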

Thursday, July 21, 2011

Firefox Security and file:/// URLs cause problems when testing HTML5

I'm experimenting with some of the great features in HTML5 and specifically with Web Storage. The way I typically try out things like this is to write a simple HTML page with the relevant JavaScript and CSS code and then open that in the browser.

With my Web Storage example, things work great in Chrome and Safari but nothing happens in Firefox 5 on the Mac - even though I know the browser supports this feature.

The issue lies in the security model behind Firefox. It is quite restrictive in what it will let you do with file:/// urls. The intention is to avoid any chance of a remote page being able to access a local file.

That is fine, but there is no obvious alert or notification that Firefox is doing this - instead your code just doesn't work. If I hadn't run it on Chrome first, my reaction would have been to question my own code.

So bear that in mind if you work the same way that I do - perhaps use Chrome as your first choice for development - and if something simple doesn't work, try it in a different browser before questioning your own code.

Friday, June 10, 2011

Rails meta_search - sorting on custom columns

I use Ernie Miller's excellent meta_search gem for searching the contents of tables in my Rails 3 Apps.

I have a lot of index views that are tables of data. Each has a search form at the top of the page that uses meta_search, and the column headers use the sort links provided by meta_search to reorder the rows.

Typically I also have custom columns, such as the number of objects that are associated with a given row via some relationship. Some of those relationships can be quite complex and in some cases there is no simple way to express them in a way that works with meta_search.

I handle the sorting in the controller. It's outside the database but with my current application the amount of data makes this perfectly manageable.

I've come up with a way to let my custom column sorting coexist with meta_search and I reckon it might be useful to others.

All the code can be found at this gist on Github: https://gist.github.com/1019358

It works by adding another parameter called 'custom_sort' to the existing search[] parameters. The index action in the controller sees this, performs the custom sort and passes a custom_sort value to the view.

The view calls the custom_sort_helper with a set of arguments and that creates the column header link with the correct parameters to pass back to the controller.

So far it works well and allows me to sort on my custom columns with a set of rows selected by meta_search searches.

I'm sure the code could be more elegant - leave me a comment and tell me how.

Tuesday, June 7, 2011

Rails send_data and UTF-8

I often use send_data to output plain text from a Rails action, typically as a simple way to download, say, a CSV format file.

I'll have something like this to output the text to the browser:

send_data(@output, :type => 'text/plain', :disposition => 'inline')

But if the text is UTF-8 and contains non-ASCII characters then you need to specify the character set.

Confusingly, to me, you don't do this with a separate argument at the Rails level. Instead you add it as a modifier on the :type argument like this:

send_data(@output, :type => 'text/plain; charset=utf-8', :disposition => 'inline')

Friday, June 3, 2011

Rails migration add_column and Postgresql

Just ran into a difference between MySQL and Postgresql when running a migration that added a new column to an existing table.

The intent of the original line was to add a new boolean column called 'active' and set the default value to true:

add_column :people, :active, :boolean, :default => true

This works fine on MySQL but with Postgresql the default value was not used on existing records. It appears that Postgresql sees that NULL is an acceptable value for the column and uses that as the value.

The way to fix this is to add a 'null' attribute to the call like this:

add_column :people, :active, :boolean, :default => true, :null => false

Now, running rake db:migrate sets all existing records to true

Google Chrome Browser, Integers and HTML5 forms

In HTML5 forms you can specify the type of input that is expected with the 'type' attribute.

For example, type = "number" specifies that the input tag expects a number. In general this is great as it permits client side validation, but it can have some side effects in some cases.

The specific case that is causing me problems occurs in the Google Chrome browser when it handles integers entered into an input element with type = "number".

Rather than simply displaying the integer, Chrome inserts commas into the number.

For example, 123456789 becomes 123,456,789

The commas are not included when the form is submitted but they are there if I cut and paste the text.

I don't want these - if I give it an integer, all I want to see is the integer. But there is no readily accessible option in Chrome to disable this feature.

The workaround is to explicitly set the type to text (type = "text"). You lose the client side validation but you avoid the commas.

Now, I write Rails applications and use the simple_form gem to help create forms. This knows about the types of input each form input is going to accept, and so it liberally uses the 'type' attributes. Fortunately you can override these as follows:

<%= f.input :myinteger, :input_html => { :type => 'text'} %>

The real solution, in my opinion, is for the browser to act very conservatively in interpreting user input. If the server wants the user to see commas then it should tell the browser explicitly.

Distinct Logins to a Rails App using Google Chrome Browser

If you are developing a Rails app that requires user login, then it can be really helpful to have more than one browser window open at a time, with a separate user logged into each.

For example, I typically have an Administrator user with special privileges as well as regular users. I want to test my app from both perspectives at the same time.

But because my app uses cookies to handle session information, I can't simply have two logins from the same browser. Until now I've dealt with this by having one user in Firefox and one in Safari, or Chrome.

Google Chrome allows you to open Incognito Windows that store cookies, history, etc. in a sandbox that is destroyed when you close the window. This allows me to manage two active logins from Chrome.

Open up your first account in a normal window, then open an Incognito window and login as the second user... Simple!

Cookies, etc. are shared between all Incognito windows, so you only get two active users, but even so this is really useful.

Firefox and other browsers have similar modes - Firefox has 'private browsing' - these may well give you similar functionality.

Wednesday, June 1, 2011

Craic Therapeutic Antibodies Database

Craic Computing LLC is pleased to announce the launch of the Tabs Therapeutic Antibody Database

Tabs is a unique database focused solely on Therapeutic Abs under development by the Biotechnology industry.

As of 1st June 2011, Tabs contains data on more than 950 antibodies, targeting more than 400 antigens, being developed by more than 300 companies.

Antibody records are linked to a wide range of associated data including:

Patents
Papers
Clinical Trials
Antigens
Companies
Conditions/Indications
Regulatory Actions
Protein Sequences
Protein Structures
Press Releases
Development Timelines
Conference Abstracts

The database is intended for Biotechnology industry staff - especially those in R&D and in Business Development.

Research staff have direct access to relevant patents and papers for each antibody.

BizDev staff can see the big picture of developments against a target antigen across companies.

Users can define custom Antibody Sets and download data into Excel, bioinformatics software and popular Reference management tools. Whenever new data related to Antibody Sets are added to the database, users can be alerted by email.

Tabs is offered as a web based subscription service to biotech companies. The annual subscription offers unlimited access to unlimited users for a given site.

Tabs can be evaluated with a 30 day Free Trial. Sign up for an account here.

Friday, April 1, 2011

Tips for poking around a Win XP machine

Thankfully I encounter Win XP PCs rarely enough that I forget how to find my way around them.

Here are a few of the tricks that I am using as I try to recover a heavily infected PC for my in-laws.

1. Rebooting the PC while holding down the F8 key lets you select Safe Mode, Safe Mode with Networking and other non-standard boot modes.

2. Windows hides a lot of files from regular users. To reveal everything go to a Folder and then Tools, Folder Options and View.
In the list of options that are shown, you want to Check 'Hidden files and folders' -> 'Show hidden files and folders' and 'Display the contents of system folders', and you want to UnCheck 'Hide protected operating system files'.

3. Start up a Command Shell either by the Start Menu -> Accessories -> Command Prompt, or Start Menu -> Run and enter cmd.exe.

The cmd shell is a basic DOS (yes, really) shell. Type 'help' for options. 'dir' lists the contents of a directory, 'cd' moves you around. 'del' deletes a file.

'cd' has a pseudo-auto-complete function. Enter 'cd "' (cd space double quote) and then tab will cycle through the options.

4. From the command shell:
'ipconfig' shows your IP address etc.
'netstat -am' shows the network services in use.

5. Ctrl-Alt-Del brings up a window with the running applications and processes, which is useful for spotting rogue processes.

I really hope that you, and more importantly I, never have to use this knowledge again...

Getting rid of McAfee antivirus products on a Win XP PC

McAfee antivirus products are widely installed on PCs. I'm sure they work fine, but they have a reputation for being intrusive when you let your subscription lapse. The company is not alone in this: with a number of products you will get intrusive popups and warnings once your subscription lapses.

What you should be able to do is go to 'Add/Remove Programs' in your Control Panel and uninstall the software - just like most other professional software allows you to do.

For some reason, at least in their older products, McAfee has chosen to make this difficult. In order to remove most (not all) traces of McAfee from your system you should get the MCPR.exe program from McAfee and run it.

http://download.mcafee.com/products/licensed/cust_support_patches/MCPR.exe

Download this onto your PC and run it - it will take a while and it will pop up black 'command prompt' windows while it runs, each with cryptic text indicating the individual scripts that are being executed. Just let it do its thing.

Restart your machine when it is done.

When you are trying to fix an infected PC, these remnants of old antivirus software clutter up your PC's registry and other directories.

Hope this helps

Most efficient way to remove the XP Home Security Malware from a PC

My in-laws' Win XP PC became infected with malware about a week ago. It was the XP Home Security malware, which pops up windows warning you that you are infected and appears to scan real files on your PC that it says are infected. In addition, the browsers on the machine were hijacked so that you could get to Google OK, but when you clicked on any other link you were redirected to seemingly random sites. It was a mess...

Over the past week I've put in at least 11 hours work on the problem and run up 50 miles of driving back and forth to their house. I still don't have a complete fix... I'm going to post a few insights here over the next few days but I wanted to start out with my advice if you have the same problem.

1: Unplug the PC's network cable
2: Reboot the machine and hold down the F8 key in order to see the boot options menu
3: Select Safe Mode, or Safe Mode with Networking
4: Pull off your user files onto a USB flash drive or disk drive
5: Turn the machine off
6: Go out and buy a Mac

I'm not trying to be funny (even though it is April 1st) - this is really the most effective way of dealing with this problem.

The time and frustration involved in sorting out a mess like this is simply not worth it.

Cut your losses, go buy a nice new Mac - you'll love it and you won't have these problems.

Friday, March 25, 2011

Rails 3 ActionMailer - defining default host

In Rails 3 ActionMailer is really easy to set up and use. Ryan Bates has a great Railscast on the topic that you should check out.

But I ran into one issue when creating emails that contain links back to my server.

Because you are sending an email to a remote machine, you need to include absolute URLs in your links.

You can define that in a config file but the example given in the Railscast is now deprecated, plus when I first tried it, my app was not picking it up and all my links were still relative.

You need two things for this to work:

1: In your application.rb or environments/development.rb add a line like this in the configuration block, replacing craic.com with your domain

    config.action_mailer.default_url_options = { :host => "craic.com" }

The application.rb file applies to all environments. Putting this in development.rb or production.rb lets you use different hosts for each environment.

2: There are several ways to specify a url to pass into a link_to call in Rails - but only one seems to work.
Specify the urls for your links using <your_model>_url(<your_object>) - not <your_model>_path(<your_object>) and not url_for(...). I tend to use mymodel_path in links in regular web pages so I can't just cut and paste code into a mailer view.

Here is an example that works:

<%= link_to object.name, mymodel_url(object) %>

...produces...

<a href="http://craic.com/mymodels/1">My Object</a>
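To make the path/url distinction concrete: a _path helper gives you just the path, while the _url helper prepends the scheme and the host taken from default_url_options. The sketch below is not how Rails builds these URLs; it only uses Ruby's standard URI library to show what the host contributes. The absolute_url method is a hypothetical name.

```ruby
require 'uri'

# Hypothetical helper: turn a relative path (what mymodel_path returns)
# into an absolute URL (what mymodel_url returns) using a host, roughly
# the role default_url_options[:host] plays for the mailer.
def absolute_url(host, path)
  URI::HTTP.build(:host => host, :path => path).to_s
end

relative = "/mymodels/1"                  # like mymodel_path(object)
puts relative                             # a mail client cannot resolve this
puts absolute_url("craic.com", relative)  # prints http://craic.com/mymodels/1
```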

Friday, March 18, 2011

Tandem Select jQuery Plugin

A standard HTML multiple Select menu allows the user to select multiple options using Command-Click. But when the list of options is long, requiring the user to scroll, it becomes cumbersome and prone to errors.

An effective solution is to use two select menus in tandem, with buttons allowing options to be swapped between the two.

I'm pleased to announce the release of a new jQuery plugin that makes tandem selects simple to set up and easy to customize.

Tandem Select consists of a JavaScript function that leverages the jQuery library, a CSS file and template HTML code.

More information and live demos can be found on the Project Page and the software can be downloaded from the GitHub repository.

Monday, March 14, 2011

Adobe AIR Application Installer running flat out on MacBook

My MacBook (model 5,1 - a few years old) running Mac OS X 10.6.5 has had a few incidents recently where its responsiveness takes a nosedive. That typically means that some process is pegging the CPU at 100%.

I run into fairly frequent problems with Firefox, especially if a browser window is running Flash - say, for a video feed or an advert. Depending on the level of ambient noise in my office, I can even hear the fan and/or disk in the laptop running at full speed.

You can use the UNIX 'top' command in a shell to find out who the culprit is, plus other tools like 'lsof', etc. But I find the Apple desktop utility 'Activity Monitor' (in your Applications -> Utilities folder) is more useful. Open it up and select 'All Processes' in the select menu at the top of the window. Then click on the CPU tab down below and then on the cpu column to sort the processes in descending order based on % cpu utilization. Look for something that is up near 100 %.

If there is nothing striking then look for something with very high Real Mem and Virt Mem and check out the DiskActivity tab as you might have a process that is so memory hungry that it has to page data out to disk.
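If you prefer the command line, the same 'sort by CPU' check can be scripted. This is a minimal sketch that assumes a ps accepting -Ao pcpu,comm (the BSD ps on Mac OS X and the Linux procps version both do); top_cpu is a hypothetical helper that parses that output and returns the biggest CPU users first.

```ruby
# Parse the output of `ps -Ao pcpu,comm` into [cpu_percent, command]
# pairs and return the top `count`, highest CPU first.
def top_cpu(ps_output, count = 5)
  rows = ps_output.lines.drop(1)               # skip the header line
                  .reject { |line| line.strip.empty? }
                  .map do |line|
    cpu, command = line.strip.split(/\s+/, 2)  # command may contain spaces
    [cpu.to_f, command.to_s]
  end
  rows.sort_by { |cpu, _| -cpu }.first(count)
end

# Usage (needs ps on the PATH):
#   top_cpu(`ps -Ao pcpu,comm`).each do |cpu, command|
#     printf("%6.1f%%  %s\n", cpu, command)
#   end
```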

The example in the screenshot shown above is a straightforward CPU hog. It is the Adobe AIR Application Installer. I figure this is involved in installing the Adobe AIR runtime libraries that are used by Flash.

I have no idea why this is running in the first place - and why is it using 100% of my cpu? Flash and associated technologies can be great but they do seem to be around when I have problems...

If you have the same issue, just select the AIR Installer line in Activity Monitor and click the big red 'Quit Process' button.

Sorting Dates and Times in Ruby

Sorting an array of objects on a field that represents a time or date can be a bit tricky.

Take an example of a Rails ActiveRecord object with a standard 'created_at' column.

You can sort these with code like this:

myobjs.sort_by{|a| a.created_at}

This works but it is really sorting on the String representation of those dates. That may be good enough for some uses but not if you want to sort down to the minute and second and in particular, if you want to sort in correct descending order.

The best approach is to convert the date or time to the number of seconds since the epoch, which is an integer, and do a numeric sort on that.

A 'created_at' column has the class 'ActiveSupport::TimeWithZone' and this will output a String under most uses. Convert this to an integer with 'myobj.created_at.to_i' and then sort on that.

myobjs.sort_by{|a| a.created_at.to_i}

If you are working with Ruby Date objects, such as 'Date.today', this will not work. In most uses the Date will be converted to a string like this: "Mon, 14 Mar 2011". If you try to convert the Date object directly to an integer with 'to_i' you will get an 'undefined method' error.

Here you need to explicitly convert to the epoch seconds format using 'strftime' and THEN convert that to an integer.

> today = Date.today
 => Mon, 14 Mar 2011
> today.strftime("%s").to_i
 => 1300060800

If you only have date strings, such as '2011-03-14', then you will need to convert these to Date objects using 'Date.parse' and then convert those to integers.

Working with dates and times can get messy but converting to integers is the best way to avoid complications.
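Here is a self-contained sketch of the integer-key approach in plain Ruby (no Rails), including the descending case. The helper names and sample dates are made up for illustration.

```ruby
require 'date'

# Date has no to_i, so convert via strftime("%s") as described above.
# For a Date this counts seconds from midnight UTC on that day.
def date_to_epoch(date)
  date.strftime("%s").to_i
end

# Parse date strings such as '2011-03-14' and return the Dates newest
# first; negating the integer key gives a descending sort.
def newest_first(date_strings)
  date_strings.map { |s| Date.parse(s) }.sort_by { |d| -date_to_epoch(d) }
end

# Time objects (and Rails TimeWithZone values) respond to to_i directly.
def newest_times_first(times)
  times.sort_by { |t| -t.to_i }
end

puts newest_first(['2011-03-14', '2011-02-22', '2011-03-01'])
# prints 2011-03-14, 2011-03-01 and 2011-02-22 on separate lines
```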

Friday, March 4, 2011

simple-tooltip Ruby Gem for tooltip help

I'm pleased to announce the release of my first Ruby Gem: simple-tooltip

Simple Tooltip helps you make tooltip help available in your Rails 3 applications.

This is often done using a JavaScript plugin which produces a small popup window containing a short text string. That works great in many applications but often the help content that you want to provide is more extensive and you may want to include richer formatting, such as links to other web pages.

Simple Tooltip acts as a Rails Generator which will create a Tooltip model in your application along with a database table, a controller and views. You create help records through a web interface, using plain text, HTML, Textile or Markdown formatting. The help records are stored in the database and retrieved using the unique, informative titles that you give them.

In your web pages you include a Tooltip helper that refers to the title of the relevant tooltip. Depending on how you set it up, when you hover over a tooltip icon or click on it, a popup window will appear displaying your help.

This is much richer and more flexible than the basic solutions available from JavaScript alone.

More than that, if the audience for your application is not just English-speaking, you can create multiple instances of each tooltip with content in each target language, identified by a two letter locale string. If a user has a non-English locale defined in your application, Simple Tooltip will look for content in the matching locale and display that. For content longer than single sentences, this is a more practical internationalization solution than I18n functionality built into Rails 3, which is more focused on shorter strings.

The gem is available at https://rubygems.org/gems/simple-tooltip.

The code is available at https://github.com/craic/simple_tooltip and is made freely available under the MIT license.

Tuesday, March 1, 2011

Passing Environment Variables to the nginx web server

I need to write this down for future reference...

With the Apache web server you can pass environment variables to Rails applications, etc by specifying them using SetEnv in your configuration file. But with nginx you don't do this - or at least you don't need to.

When you fire up the server, simply tell sudo to preserve your environment. Without that, sudo will reset all your variables. With it, they are available to nginx and your applications - nice and simple...

$ sudo -E /usr/local/sbin/nginx

Thursday, February 24, 2011

Rails, UTF-8 and Heroku

I've had problems with Ruby character encodings over the years, especially when pulling text with non-ASCII characters in from remote sites. I thought I had it mostly sorted but the past few days showed me that was not the case.

I have a MySQL database on my local machine and a Rails 3 app that pulls in text from remote sources, stores it in the database and does stuff with it. I am deploying the application on Heroku prior to public release. Heroku uses PostgreSQL exclusively.

I was under the belief that all the components in my system were set up to use UTF-8 encoding and therefore moving text with non-ASCII characters around should be fine. But in practice that was not the case - characters like 'α' that looked fine on my local machine showed up as 'Î±' on Heroku, etc. So clearly I was doing something wrong. Rather than go into all the gory details, this is the way to do it right...

Bottom-line: Make everything use UTF-8 explicitly ... EVERYTHING

0: Backup your current database!

1: MySQL
Here are the contents of my /etc/my.cnf file:

[mysqld]
datadir=/usr/local/mysql/data
bind-address = 127.0.0.1
character-set-server = utf8
max_allowed_packet = 32M
[client]
default-character-set = utf8
[mysql]
max_allowed_packet = 32M

Even if your tables are held in utf8, you should add these lines. You want the following mysql command to look as shown:

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+

2: Rails
I'm using Rails 3 - can't tell you how this works in Rails 2.x
a: In config/application.rb make sure this line is uncommented:

    # Configure the default encoding used in templates for Ruby 1.9.
    config.encoding = "utf-8"

MySQL and Rails use different variants of utf8/utf-8 - make sure you are using the right one. And note the comment above this line - this sets up utf-8 encoding for templates ONLY.
b: In your database.yml, specify the encoding for the databases - for example:

development:
  adapter:  mysql2
  host:     localhost
  encoding: utf8
  [...]

Here you are telling the database adapter that the database uses utf-8.
c: mysql2
Notice that I am using the mysql2 adapter instead of mysql. At this point (Feb 2011) the mysql gem is NOT encoding aware. Replace mysql with mysql2 in your Gemfile and run bundle.
d: In each Model that uses text add this line at the very top of the file:

# encoding: UTF-8

This tells Ruby that we're using utf-8 in this model. I don't see a way to set this at the application level, so you have to add it to every relevant model .rb file. I also don't like defining something with a comment line, and I can't see how to define this in, say, an irb interactive session.

With all those components in place, you should be all set. Try entering non-ASCII characters into a form - such as accented characters or Greek/math symbols. These should be displayed correctly in the browser and in the mysql command line client.

With regards to Heroku, assuming you have your app already set up, you should be able to do a 'heroku db:push' to copy the database into PostgreSQL on Heroku and the characters should display correctly on the remote pages. You will see reference to using 'heroku db:push' with explicit database URLs that include an encoding option, such as '?encoding=utf8'. If your MySQL is set up correctly then this should be unnecessary.

A critical part of running apps on Heroku is the ability to pull the database back to your local database using 'heroku db:pull'. Before getting all my components set up with utf-8, this step failed for me. With everything using utf-8, and after adding the 'max_allowed_packet' lines to my my.cnf file, this process works fine.

But because I was working with data before everything was truly utf-8, I had some instances of text in the database that had been incorrectly encoded - and thereby effectively corrupted. I could see what the 'corrupt' characters looked like and I knew what the correct versions should be. Because everything is now using utf-8 I could simply do a substitution on the text. For example:

str.sub!(/ÃŸ/, 'β')

I gathered up the character mappings that I needed (which were not many in my case) and wrote up a class method that I cloned in each model with the issue. I then ran those in the Rails console to correct the bad characters. The method is:

  def self.make_utf8_clean
    mappings = [  ['Î±', 'α'],
                  ['ÃŸ', 'β'],
                  ['Î²', 'β'],
                  ['â€™', '’'],
                  ['â€œ', '“'],
                  ['â€\u009D;', '”'],
                  ['â€', '”'],            
                  ['Ã¶', 'ö'],
                  ['Â®', '®']
              ]
    # Get the list of String columns
    columns = Array.new
    self.columns.each do |column|
      if column.type.to_s == 'string'
        columns << column
      end
    end
    
    # Go through each object -> column against all mappings
    self.all.each do |obj|
      columns.each do |column|
        mappings.each do |mapping|
          value = obj.attributes[column.name]
          if value =~ /#{mapping[0]}/
            s = value.gsub(/#{mapping[0]}/, "#{mapping[1]}")
            obj.update_attribute(column.name.to_sym, s)
          end      
        end
      end
    end
  end

This looks at your model and figures out which columns are of type String. It goes through all records and all character mappings, replacing text and updating the database as needed. Your mappings array could be much larger. There may be a better source of these, but this is a start.
You run this in a rails console like this:

Loading development environment (Rails 3.0.4)
ruby-1.9.2-p0 > YourModel.make_utf8_clean

It's a hack but it helped me 'fix' quite a few records that would have been a pain to recreate.
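The same mapping idea can also be applied to a single string in plain Ruby, outside ActiveRecord. This sketch is an alternative to the method above, not what I ran: it builds one regular expression from the mapping keys with Regexp.union (which escapes them) and has String#gsub look each match up in a Hash, so every bad sequence is fixed in one pass. FIXES and fix_mojibake are hypothetical names and only a subset of the mappings is included.

```ruby
# encoding: UTF-8

# A subset of the mappings from make_utf8_clean. The longer keys
# starting with 'â€' must come before the bare 'â€' entry.
FIXES = {
  'Î±'  => 'α',
  'Î²'  => 'β',
  'â€™' => '’',
  'â€œ' => '“',
  'â€'  => '”',
  'Ã¶'  => 'ö',
  'Â®'  => '®'
}

# One pattern matching any of the bad sequences (keys are escaped).
FIX_PATTERN = Regexp.union(FIXES.keys)

# Replace every known bad sequence with its correct character.
def fix_mojibake(str)
  str.gsub(FIX_PATTERN, FIXES)
end
```

Because the alternation is tried left to right, 'â€™' has to be listed before the bare 'â€' entry, the same ordering constraint as in the array version.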

Character encodings are HARD - Yehuda Katz wrote a nice article on the issues. For most purposes (unless you work with Japanese text) UTF-8 is your best choice for encoding and so I'm using it exclusively. Java and Python both made the same choice and things are probably easier to set up in those worlds. Ruby has its roots in Japan and so it is not surprising that it could not go down that path.
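To see where strings like 'Î±' come from, this sketch takes the UTF-8 bytes of a character and relabels them as a single-byte encoding, which is roughly what happens when one component in the chain assumes Latin-1 or Windows-1252. Reversing the two steps gets the original back. It is an illustration of the mechanism, not one of the setup steps above; garble and ungarble are hypothetical names, and the reversal only works if the damaged text has survived intact.

```ruby
# encoding: UTF-8

# Simulate the corruption: read UTF-8 bytes as if they were in a
# single-byte encoding, then convert that misreading to UTF-8.
def garble(str, wrong_encoding)
  str.dup.force_encoding(wrong_encoding).encode('UTF-8')
end

# Undo it: turn the characters back into the original bytes, then
# label those bytes as UTF-8 again.
def ungarble(str, wrong_encoding)
  str.encode(wrong_encoding).force_encoding('UTF-8')
end

puts garble('α', 'ISO-8859-1')     # prints Î±
puts garble('’', 'Windows-1252')   # prints â€™
puts ungarble('Î±', 'ISO-8859-1')  # prints α
```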

From now on, I'm going to make sure everything I touch is configured for UTF-8. There are few reasons not to at this stage and it allows you to handle most languages.

Wednesday, February 23, 2011

Tables named with reserved words in MySQL

Trying to fix a string encoding issue in a MySQL database, I realized that one of my tables had the same name as a MySQL reserved word - specifically, I have a table called 'references'.

This was created from a Rails app and I have been using this with no problems for a few months, so you can use reserved words, or at least that one.

Problem is that direct SQL statements in the MySQL client like this don't work:

mysql> describe references;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that 
corresponds to your MySQL server version for the right syntax to use near 'references' at line 1

The solution is to prefix the table name with the database name like this:

mysql> describe mydb.references;
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
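MySQL also accepts identifiers quoted with backticks, so describe `references`; works as well and does not need the database name. If you build SQL strings in Ruby, a helper along these lines does the quoting. It is a sketch with hypothetical names; per the MySQL documentation, a backtick inside an identifier is escaped by doubling it.

```ruby
# Hypothetical helper: quote a MySQL identifier (table or column name)
# with backticks so reserved words such as 'references' can be used.
# Any backtick inside the name is escaped by doubling it.
def quote_identifier(name)
  "`" + name.to_s.gsub("`", "``") + "`"
end

# Optionally qualify with the database name, as in mydb.references.
def qualified_table(db, table)
  [db, table].map { |part| quote_identifier(part) }.join(".")
end

puts "DESCRIBE #{quote_identifier('references')}"  # prints DESCRIBE `references`
```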

Tuesday, February 22, 2011

Setting up PostgreSQL on Mac OS X 10.6

I needed to set up PostgreSQL on my Mac in order to troubleshoot a problem with a Rails application.

Here are the steps that I followed:

1: Using Homebrew, install and build PostgreSQL
Homebrew will give you commands to create a Launch Agent that starts the server on a reboot

$ brew install postgresql
$ cp /usr/local/Cellar/postgresql/9.0.3/org.postgresql.postgres.plist ~/Library/LaunchAgents
$ launchctl load -w ~/Library/LaunchAgents/org.postgresql.postgres.plist

2: Setup PostgreSQL
You should look into setting up an administration user - I skipped that for my purposes

$ initdb /usr/local/var/postgres
The files belonging to this database system will be owned by user "jones".
This user must also own the server process.
[...]

3: Create a specific database and verify that it is running

$ createdb mydb
$ psql mydb
mydb=# select version();
mydb=# \h

4: Install the PostgreSQL ruby gem
You'll see different names for this gem - just use 'pg'. Using the ARCHFLAGS env variable is important. I did not need to specify the explicit path to the Homebrew installation of the PostgreSQL software.

$ env ARCHFLAGS="-arch x86_64" gem install pg

Monday, February 21, 2011

Problem with sqlite when installing the taps ruby gem on Mac OS X

Just worked my way through one of those installation problems where trying to install A fails because B is missing and trying to install B fails because... and so on.

1: Trying to push a database to the Heroku hosting service failed because I did not have the 'taps' gem installed
2: 'gem install taps' failed because it couldn't compile a component that interacts with sqlite databases. I'm not using sqlite but there is no way to skip this part of the code - hmm...
3: Mac OS X has sqlite installed by default (or at least with Xcode installed)
4: Explicitly telling taps where to find the lib and include files didn't fix it

So the problem lay in my sqlite installation

Lots of mentions of the issue on Google - some of which said to install a new version with MacPorts

In the past I have found MacPorts to be a very frustrating experience - just wanting to install a single library has led to literally hours of watching it fetching and installing all sorts of apparent dependencies. So I tend to just get the source code for whatever I want and compile it manually, but that can be a pain in and of itself.

I've heard very good things about Homebrew as a MacPorts replacement so I figured I should take a look.

1. Set up a 'staff' group so that you can install code without having to sudo everything
2. Get Homebrew
3. Install your library

$ sudo dscl /Local/Default -append /Groups/staff GroupMembership $USER
$ ruby -e "$(curl -fsSL https://gist.github.com/raw/323731/install_homebrew.rb)"
$ brew install sqlite

Simple... Brilliant... I'm sold...

Homebrew installs code in /usr/local/Cellar by default so I need to tell the ruby gems where to find that - and somewhere along the line I saw a sqlite ruby gem. So I figured I should install that first as a check that things are working before trying 'taps'.

$ gem install sqlite3 -- --with-sqlite3-include=/usr/local/Cellar/sqlite/3.7.5/include \
                                            --with-sqlite3-lib=/usr/local/Cellar/sqlite/3.7.5/lib
$ gem install taps

It worked...
You may see mention of a sqlite3-ruby gem. That is now called sqlite3 - it's the same thing.

And finally I can run '$ heroku db:push' and send my database to my Heroku app.

Phew...

Friday, February 18, 2011

Rails, Devise and custom User models

Devise is an excellent solution for user authentication in a Rails application.

Ryan Bates has done two great Railscasts episodes on Devise - #209 Introducing Devise and #210 Customizing Devise.

The default Devise configuration uses a simple sign up process - you give it an email address, a password and password confirmation. Follow the installation instructions and it should just all work.

But in my current application I need a bit more. I want the user to enter their first and last names, I want to assign a role and I want to link that individual user to a company account. Devise is capable of handling all this but the README on github doesn't really explain how and I, for one, get a bit nervous messing with the code of my authentication solution.

It turns out to be incredibly easy as long as you don't try and be too clever.

Before going into the steps given below, I recommend trying out a basic off-the-shelf Devise installation in a test application first, just so you know that it works on your machine and you can see what files it creates, etc.

In these steps I'm going to use Devise with a User model that contains some custom fields.

1. Before creating your own User model, do a basic Devise installation into your app

$ gem install devise  # or in Rails 3 add it to your Gemfile, and 'bundle'
$ rails generate devise:install  # follow the instructions given
$ rails generate devise User
$ rails generate devise:views # this generates sign_in, etc views under 'app/views/devise' - not user!

2. Modify your User model by adding custom fields to attr_accessible
Here I'm adding :first_name, :last_name, :active, :role

attr_accessible :email, :password, :password_confirmation, :remember_me,
        :first_name, :last_name, :active, :role

Add your own validations, etc. to the model. For testing, at least, require the presence of at least one of your custom fields.

3. Modify your Migration for the User table and run the migration
In my case I added these lines:

      t.string :first_name
      t.string :last_name
      t.boolean :active, :default => true
      t.string :role, :default => 'user'

Run the migration with 'rake db:migrate'

4. Modify your Sign_up form
This lives in app/views/devise/registrations/new.html.erb
NOTE: You can have Devise install its views under app/views/user but I prefer to keep Devise specific views in their own directory
Add fields to the form for your custom fields e.g.

<%= f.text_field :first_name %>
etc.

5. Try it out - sign up a new user
Go to the URL /users/sign_up
Your input to the custom fields should go into the database and any validations against custom fields which fail should give you the proper error messages and highlighting in the sign_up form
In my experience this 'just worked'

6. But the whole reason you want custom fields in the User model is to work directly with them...

For this you need a User controller and views. Devise does not give you either of these.
Either run a scaffold generator and skip the model or copy over another controller and set of views.
Now you have the regular set of actions for your model.
Go to /users and you should see the user(s) that you added, /users/1 will show you that user with whatever columns you choose to display.

In my case I display my custom fields and the email in my show and index actions and just ignore the rest of the Devise specific fields.

You want to be careful with the User new/create/edit/update actions. If you create a new user via that path then they will have no password, etc. so you might want to remove new/create. The edit/update actions are useful if a user wants to change their name and other 'profile' information, but don't mess with the Devise-specific fields via this route.
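One way to keep the edit/update path away from the Devise fields is to copy only the profile fields out of the submitted params before updating the record. The sketch below uses a plain Hash in place of Rails params; PROFILE_FIELDS and profile_params are hypothetical names, and leaving :active and :role out (so users cannot change their own role) is my assumption about what you want.

```ruby
# Custom columns a user may edit themselves. Devise fields (email,
# password, etc.) and admin-only fields (active, role) are excluded.
PROFILE_FIELDS = %w(first_name last_name)

# Return only the profile fields from a params-style Hash
# (keys may be strings or symbols).
def profile_params(user_params)
  user_params.select { |key, _| PROFILE_FIELDS.include?(key.to_s) }
end

# In a UsersController#update this might be used as:
#   @user.update_attributes(profile_params(params[:user]))
```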

Basically, the Devise side of things and your custom User model can coexist quite happily. Make sure you don't mess with the fields that Devise requires and don't use the same column names.

I would also avoid using virtual attributes in the custom fields. I tried this and couldn't get it to work. Not a big deal for my case.

When I started integrating Devise into my app I had a sinking feeling that the custom fields would be a real problem. Quite the opposite - this turned out to be really easy.

Great kudos to the folks at Plataformatec - Jose Valim and colleagues - for a really nice piece of work.

Basic image rollover effect in jQuery

There are so many fancy image effects that you can write in jQuery that it is easy to overlook the basics. Here is a basic image rollover script.

I have two images 'logo.png' and 'logo_highlight.png'. I want to display 'logo' by default and then replace it with 'logo_highlight' when I roll the mouse over it.

Here is my image tag in the html:

<img id="logo" src="images/logo.png" alt="Logo" />

And here is the script (assuming that you have jQuery loaded):


<script>
$(document).ready(function() {
    $('#logo').hover(
        function(e) {
            this.src = this.src.replace('logo', 'logo_highlight');
        },
        function(e) {
            this.src = this.src.replace('logo_highlight', 'logo');
        }
    );
});
</script>

The script attaches a 'hover' event handler to the DOM element with ID 'logo'. This has two functions that are applied when the mouse enters and leaves the element respectively.

On entry, the image 'src' attribute is updated. The new one is derived by replacing the string 'logo' in the filename of the original image with 'logo_highlight' in the new version. In other words, the image tag now sources the 'logo_highlight' image.

When the mouse leaves the element, the second function is executed and that replaces the highlighted image with the original.

Short and sweet...

Wednesday, February 16, 2011

Authentication in Mongo and Mongoid

Mongo has primitive authentication - just basic user/password authentication per database.

Its preferred mode of operation is no authentication in a trusted environment. That's fine, but it's not always possible. I want to run mongo on an Amazon EC2 node and access it from remote clients, so I need to use authentication. On top of that, I already have the database running without authentication on a node.

Here are the steps you need to make the migration to a server with authentication...

1. Create an admin user on the database
Open up a mongo shell on the machine running the server

$ mongo
> use admin
> db.addUser("your_admin_user", "your_password")
> exit

2. Restart your Mongo server with --auth
It is CRITICAL that you restart with the --auth option. Users and passwords are simply ignored without this option.

$ mongod --auth

3. Set up database specific users

$ mongo
> use admin
> db.auth("your_admin_user", "your_password")
> show dbs
> use your_db
> db.addUser("your_db_user", "your_password")
> db.system.users.find()
> exit

4. Set up authenticated access from your application
I work in Ruby and use Mongoid as the Object Document Mapper to access Mongo. Mongoid, in turn, uses the Ruby Mongo Driver. If you are using Mongoid outside of Rails then you will need a configuration block along the lines of this:


Mongoid.configure do |config|
  name = "your_db"
  config.database = Mongo::Connection.new.db(name)
  config.database.authenticate("your_db_user", "your_password")
end

Note that you are authenticating with the Ruby Mongo driver - not with Mongoid.

If you are working with Rails then you'll need to add username and password into your config/database.yml file. I see that the Devise authentication gem can work with Mongo to handle authentication of individual users but I've not explored that yet.

5. Clearly there is an issue having your password in plain text in your code
The bottom line is that you probably don't want to trust Mongo authentication for critical data. In that case, you really need to set up Mongo access in a secure environment and perhaps handle interfacing this with the outside world through a separate gateway application, say a Sinatra app that handles all authentication itself.

For my needs I have non-critical data - I just want to prevent access by arbitrary users (e.g. port-scanning scripts) and allow access only from a few defined scripts on specific machines. So for now this will work for me.
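One small step that at least keeps the password out of your source code (and out of version control) is to read the credentials from environment variables. This does not make Mongo's authentication any stronger. MONGO_DB_USER and MONGO_DB_PASSWORD are hypothetical variable names and mongo_credentials is a hypothetical helper.

```ruby
# Read the Mongo credentials from the environment instead of the code.
# MONGO_DB_USER / MONGO_DB_PASSWORD are hypothetical variable names.
def mongo_credentials(env = ENV)
  user = env['MONGO_DB_USER']
  password = env['MONGO_DB_PASSWORD']
  if user.nil? || password.nil?
    raise ArgumentError, "set MONGO_DB_USER and MONGO_DB_PASSWORD"
  end
  [user, password]
end

# In the Mongoid configuration block from step 4:
#   config.database.authenticate(*mongo_credentials)
```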

With mongo authentication in place, how do you handle backing up and restoring the database?

On the machine hosting the server you can use these two variants of the dump and restore commands:

$ mongodump -d your_db -o . -u your_db_user -p your_password
[...]
$ mongorestore -u your_db_user -p your_password your_db

To work with all databases you would use the admin user
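If you run the dump from cron, it helps to wrap it in a small script. This sketch only builds the mongodump command using the flags shown above and writes each dump to a timestamped directory; the directory naming and the mongodump_command helper are my own assumptions. Passing an argument array to system, rather than a single string, keeps the shell from interpreting characters in the password.

```ruby
# Build the mongodump command (flags as in the example above) with a
# timestamped output directory, as an argument array for system().
def mongodump_command(db, user, password, time = Time.now)
  out_dir = "backup-#{time.strftime('%Y%m%d-%H%M%S')}"
  ["mongodump", "-d", db, "-o", out_dir, "-u", user, "-p", password]
end

# Run it (requires mongodump on the PATH):
#   system(*mongodump_command("your_db", "your_db_user", "your_password"))
```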

In order to break into your database, someone has to
1: Guess/crack your admin username and password
or
2: Guess/crack your specific database, your db username and password.

You have to evaluate the chances of this along with the value of the data in the database before going down this path.
You can also configure the database to use a non-standard port. There is no harm in this but it offers minimal to no additional security as many malicious scripts will scan across all ports on a machine looking for one that responds.

Caveat emptor...

Tuesday, February 15, 2011

Always index columns that you want to sort on in Mongo

I'm using Mongo as a non-relational database for a few projects. In general it's working out great. MySQL would work too but I like not having to explicitly create a database or run migrations. Plus I figure you can't really understand the strengths and weaknesses of a technology unless you build a real application with it.

I work in Ruby and use the MongoMapper and Mongoid Object Document Mappers to talk to Mongo.

One issue that I do not like is the requirement that you explicitly create an index for every column that you think you will want to sort on. If you don't then all the data gets loaded into memory for the sort and you get an error like this:

[...]/gems/mongo-1.2.1/lib/mongo/cursor.rb:86:in `next_document': 
too much data for sort() with no index (Mongo::OperationFailure)

And if you want to sort on two columns then you need an index on the combination of the two.

You can add indexes at any point - it takes some action but it's not that big a deal. But it doesn't 'just work'... in MySQL it does - an index might give you better performance but it doesn't blow up without one.

You'll hear people claim that the NoSQL databases are schema-free, giving you a lot of flexibility. I don't really buy that argument - in most applications you want a clear schema.

Where I do see the benefit is that, with NoSQL databases, your schema resides in your Model - not in the DB itself - and that is where it belongs. When you want to change the schema you just change the Model - no database migrations - very flexible.

But, with Mongo at least, if you have to define indexes ahead of time in order to sort even relatively small numbers of objects then that nullifies some of that benefit.

Using Mongoid in Ruby applications outside of Rails

Mongoid and MongoMapper are two Ruby ODM (Object Document Mapper) gems for the Mongo database.

I've used both to a limited extent and they seem comparable for my needs. Mongoid seems to be getting a bit more traction than MongoMapper and it certainly has better docs.

My current project uses Mongo in a standalone Ruby application - no Rails in sight - but the docs are almost totally focused on Rails. Here is how you use Mongoid outside of Rails.

I'm storing relevant RSS entries in the database. My model looks something like this (heavily truncated):


class RssEntry
  include Mongoid::Document

  field :entry_id
  field :title
  field :authors, :type => Array
  field :timestamp, :type => Time

  index :timestamp
  index :title
end

Be sure to define your indexes carefully for fields that you want to search on, otherwise Mongo will run out of memory when searching even modest datasets. I see this as a weakness of the database. See the important note on creating indexes below!

and the application looks a bit like this (edited):


#!/usr/bin/env ruby
require 'mongoid'
$:.unshift File.dirname(__FILE__)
require 'mongoid_test_model'

Mongoid.configure do |config|
  name = "mongoid_test_db"
  host = "localhost"
  port = 27017
  # Connect to the named database on the given host and port
  config.database = Mongo::Connection.new(host, port).db(name)
end
[...]
entry = RssEntry.create({
    :title => title,
    :entry_id => id,
    :authors => authors,
    :timestamp => Time.new
})

And if you are using the defaults of localhost and 27017 then you can leave those definitions out.

NOTE: Simply defining an index in your model is NOT enough. You have to explicitly create the index. When you use Mongoid with Rails it sets up a rake task so you can run 'rake db:create_indexes' but outside of that environment you need to do this yourself.

You'll want to write a simple script/rake task to set this up, in which you call create_indexes on EACH class in your model that uses Mongoid. For example:


#!/usr/bin/env ruby
require 'mongoid'
$:.unshift File.dirname(__FILE__)
require 'mongoid_test_model'

Mongoid.configure do |config|
  # Must be the same database the application writes to
  name = "mongoid_test_db"
  host = "localhost"
  port = 27017
  config.database = Mongo::Connection.new(host, port).db(name)
end

# Call on each of the relevant models
RssEntry.create_indexes()

Previously, you could specify auto-indexing within your models but this has now been deprecated or removed, so ignore any references to that.
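If you have more than a couple of models, you can avoid listing each one by asking Ruby for every class that includes Mongoid::Document. This is a sketch, not a Mongoid feature: create_all_indexes is a hypothetical helper that takes the module as an argument and uses ObjectSpace to find the classes, so the models must already be loaded (required) when it runs.

```ruby
# Call create_indexes on every loaded class that includes the given
# module (Mongoid::Document in a real app). Returns the classes found.
def create_all_indexes(document_module)
  models = ObjectSpace.each_object(Class).select do |klass|
    # Module#< is true when document_module is an ancestor of klass
    klass < document_module && klass.respond_to?(:create_indexes)
  end
  models.each(&:create_indexes)
  models
end

# In the index script above, after requiring the model files:
#   create_all_indexes(Mongoid::Document)
```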

Monday, February 14, 2011

Using Twitter for System Notifications

I finally figured out something that many, many people have been doing for quite a while - using Twitter as a way to deliver notification of system events.

Twitter is a great way to deliver short messages to many people via many forms of media and devices. By default, any message is available to anyone in the world. But you can also configure a Twitter account to be private, requiring the owner to explicitly allow access to other users. In the extreme case, the owner can deny access to everyone except himself or herself.

Twitter handles all the messaging; all you need to do is have your server, web application or whatever send a message to your private account whenever some event takes place. For example, I run long calculations on servers at work and I want to be notified when a job completes.

You can find loads of UNIX command-line Twitter clients, and Twitter libraries in all the main languages, so finding or building a suitable client is straightforward.

I'll show you how to build a simple client in Ruby.

If you want to send tweets to a private account then you will need proper authentication credentials.

For this you need to use OAuth - username/password authentication has been deprecated.

1: Sign in to Twitter as the owner of the private account
2: Go to http://dev.twitter.com - you'll still be signed in
3: Click on 'Register an App'. You're not really creating a new Twitter application, but pretend that you are: give it a name, select 'client' as the application type, and give it 'read write' access to the account.
4: Now go to 'your apps' and click on the new dummy app.
5: Scroll down and get the 'Consumer key' and 'Consumer secret' - you'll need these in your code.
6: Those identify your application, but in addition you need a token and secret for the actual Twitter account that you want to write to.
7: On your app settings page, on the right sidebar, click on 'My Access Token' and get 'Access Token (oauth_token)' and 'Access Token Secret (oauth_token_secret)'.

Now we can write some code.

8: Get the 'twitter' Ruby gem

$ gem install twitter

9: Write a small ruby app. This simple example takes a message on the command line, configures the client with the FOUR OAuth tokens/strings and then updates the private twitter account with the message:

#!/usr/bin/env ruby
require 'twitter'
abort "Usage: #{$0} message" if ARGV.empty?
# Hard-wired to my private twitter account
Twitter.configure do |config|
  config.consumer_key = 'your_app_key'
  config.consumer_secret = 'your_app_secret'
  config.oauth_token = 'your_account_token'
  config.oauth_token_secret = 'your_account_secret'
end
# Post the message (quote a multi-word message on the command line)
Twitter.update(ARGV[0])

10: It's that simple...
11: chmod a+x your script and run it with a message - check your private twitter account and you should see it.
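Hard-wiring the four OAuth strings works, but it means the secrets live in the script itself. A minimal alternative, sketched here, reads them from environment variables and complains about any that are missing. The variable names are hypothetical, so pick your own.

```ruby
# Hypothetical environment variable names for the four OAuth strings.
CREDENTIAL_VARS = {
  consumer_key:       'TWITTER_CONSUMER_KEY',
  consumer_secret:    'TWITTER_CONSUMER_SECRET',
  oauth_token:        'TWITTER_OAUTH_TOKEN',
  oauth_token_secret: 'TWITTER_OAUTH_TOKEN_SECRET'
}.freeze

# Returns a Hash keyed like the Twitter.configure settings, or raises an
# ArgumentError naming every variable that is missing or empty.
def load_credentials(env = ENV)
  missing = CREDENTIAL_VARS.values.select { |name| env[name].to_s.empty? }
  raise ArgumentError, "Missing environment variables: #{missing.join(', ')}" unless missing.empty?

  CREDENTIAL_VARS.transform_values { |name| env[name] }
end
```

In the script above you would call creds = load_credentials before the configure block and then set config.consumer_key = creds[:consumer_key], and so on for the other three.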

It's easy to think up (and code) custom notification scripts for this. As long as you have a network connection and Twitter is up (OK, it has had some issues), you don't need to worry about anything to do with distributing your messages. You can get them on your phone or your desktop, and you can leverage the work of others to display popup windows on your desktop, play tunes, flash lights, etc.
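As one example of such a script, here is a sketch of a wrapper for the long-running-job case: it runs a command, times it, and builds a short status message that you could then hand to the tweet script. The message wording and the 140-character cap (Twitter's limit at the time of writing) are assumptions of this sketch, and nothing is actually sent to Twitter.

```ruby
require 'socket'

# Twitter's limit on message length at the time of writing.
TWEET_LIMIT = 140

# Runs the command with Kernel#system, times it, and returns
# [succeeded, message]. The message wording is just an example.
def run_and_report(*command)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # system returns true, false, or nil if the command could not be run
  succeeded = system(*command) ? true : false
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  status = succeeded ? 'finished' : 'FAILED'
  message = format('%s %s on %s after %.0f s',
                   command.join(' '), status, Socket.gethostname, elapsed)
  [succeeded, message[0, TWEET_LIMIT]]
end
```

You would call it as run_and_report('./my_job.sh', 'input.dat'), where the job script is hypothetical, and pass the returned message on to the tweet script.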

Just remember, when you create your private Twitter account, to go into the settings and make sure that it is indeed set to private.

One extension that I've thought about is having my script take an optional URL, say pointing to the results from a computational run, and using a URL shortening service like http://bit.ly or http://goo.gl to include it in the tweet. Unfortunately none of the 'big name' services let you create private URLs, which might be a problem in some applications, but the idea is still worth considering.
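If you do add a URL, it eats into the 140-character limit. Here is a sketch that trims the message text so that the text plus the URL still fits. It assumes the URL has already been shortened by whichever service you choose; that call is not shown.

```ruby
MAX_TWEET_LENGTH = 140

# Append an (already shortened) URL to the text, trimming the text so the
# whole tweet stays within the limit. Without a URL, just truncate the text.
def tweet_with_url(text, url = nil, limit = MAX_TWEET_LENGTH)
  return text[0, limit] if url.nil? || url.empty?

  room = limit - url.length - 1   # leave one character for the space
  raise ArgumentError, 'URL is too long to fit in a tweet' if room < 0

  if text.length > room
    # Mark the cut with '...' when there is space for it
    text = room >= 3 ? text[0, room - 3] + '...' : text[0, room]
  end
  [text, url].reject(&:empty?).join(' ')
end
```

Short messages pass through unchanged with the URL appended; long ones are cut and end in '...' just before the URL.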

PATSY - a web service that makes patents easier to read

I've just launched PATSY - a new web service that reformats US patents to make them much easier to read than their original format.

The text of patents is typically very dense and difficult to read.

They are written as legal documents, which inevitably results in verbose and sometimes arcane text. Every component of an invention has all its possible variants enumerated, which can produce sentences of ridiculous length with the variants delimited by commas. On top of that, the US patent office still prints patents as two narrow columns of text on each page - a format that might work in newspapers but which is nonsensical for technical patents.

The underlying problem is that the patent offices should define and enforce a modern way of text formatting that is both easy to read and easy to parse in software.

But as this is not likely to happen any time soon, I decided to write an application that reformats the text of patents into something more palatable.

You enter a patent number into PATSY and it fetches the web page from the US patent office web site. It scans the text and splits paragraphs into their component sentences. Furthermore, it splits sentences into sub-clauses at punctuation such as semicolons. Simply adding this spacing makes a big difference.
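The splitting step can be sketched in a few lines of Ruby. This illustrates the idea and is not PATSY's actual code: it splits on sentence-ending punctuation followed by whitespace, glues back pieces that were cut after a small, hypothetical list of abbreviations common in patents (so that 'U.S. Pat. No.' stays in one sentence), and then splits each sentence after its semicolons.

```ruby
# Hypothetical list of abbreviations that end in a period but do not end a sentence.
ABBREVIATIONS = %w[U.S. Pat. No. Nos. Fig. Figs. e.g. i.e. al.].freeze

# Split a paragraph into sentences at . ! or ? followed by whitespace,
# then glue back any piece that was cut right after an abbreviation.
def split_sentences(paragraph)
  pieces = paragraph.strip.split(/(?<=[.!?])\s+/)
  pieces.each_with_object([]) do |piece, sentences|
    previous_word = sentences.last&.split&.last
    if ABBREVIATIONS.include?(previous_word)
      sentences[-1] = "#{sentences.last} #{piece}"
    else
      sentences << piece
    end
  end
end

# Split one sentence into sub-clauses after each semicolon.
def split_clauses(sentence)
  sentence.split(/(?<=;)\s+/)
end
```

For example, split_sentences('The method of U.S. Pat. No. 5,123,456 is preferred; it is fast. A second sentence.') returns two sentences, and split_clauses then breaks the first one at its semicolon. A real patent parser would need a much longer abbreviation list.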

But PATSY goes much further. It highlights a series of phrases that are typically of interest - such as 'preferred embodiment' and 'SEQ ID NO'. It recognizes references to other patents and hyperlinks these to either their patent office site or to PATSY directly. In some cases, references to scientific publications can be identified and links are added that will take the user to the NIH PubMed site of abstracts, and from there the original publication can be accessed in most cases.
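Highlighting and linking can be sketched the same way. Again, this is not PATSY's code: the phrase list holds just the two examples mentioned above, the regular expression recognizes only one common citation format ('U.S. Pat. No. 5,123,456'), and the link target is a hypothetical URL modelled on the PATSY address in the bookmarklet post further down. The text is HTML-escaped first, so the inserted tags are the only markup.

```ruby
# Phrases of interest; these two are the examples named in the post.
HIGHLIGHT_PHRASES = ['preferred embodiment', 'SEQ ID NO'].freeze

# One common citation format, e.g. "U.S. Pat. No. 5,123,456".
PATENT_REF = /U\.S\. Pat\. No\. (\d{1,2},\d{3},\d{3})/

# Hypothetical link target for a cited patent.
PATSY_URL = 'http://patsy.craic.com/patsy?id='

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;' }.freeze

def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Escape the text, wrap interesting phrases in <em>, and link patent citations.
def annotate(sentence)
  html = escape_html(sentence)
  HIGHLIGHT_PHRASES.each do |phrase|
    html = html.gsub(/#{Regexp.escape(phrase)}/i) { |match| "<em>#{match}</em>" }
  end
  html.gsub(PATENT_REF) do |citation|
    number = Regexp.last_match(1).delete(',')
    %(<a href="#{PATSY_URL}#{number}">#{citation}</a>)
  end
end
```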

PATSY only works with US patents right now and some of its features are geared towards biotechnology patents. The text parsing is not perfect but even at this early stage in its development, it can really make dense blocks of text much easier to read. In cases where the result is unclear, you can click the head of each text block to see the original text before any processing.

While it is in this early stage, PATSY is completely free. If it turns out to be useful to a lot of people then I may offer it via subscription to heavy users, while retaining free access to occasional users.

Please try PATSY out and send me feedback at info @ craic.com.

Technical aspects:

PATSY is written in Ruby using Sinatra as a lightweight web application framework. It runs on Heroku, a hosting service for Ruby web applications that sits atop Amazon Web Services. The steps I took to set it up are described here.

My experience with Heroku for this application thus far has been great. They allow you to set up applications with limited resources at no charge. If and when PATSY starts to get some traction then I can scale it up by adding more of what they call 'dynos'. That will incur some cost but there is no commitment or up front payment, plus the process of scaling is incredibly easy.

Friday, February 11, 2011

JavaScript Bookmarklet that can create a New Window

The popup-blocking features of current browsers can be a problem if you are writing a JavaScript Bookmarklet that wants to open a new window. For example, I want to select text in an arbitrary window and then have a remote server operate on the text and return its results to a separate window, or tab. A bookmarklet is a great way to do this.

One approach to writing these is to make the bookmarklet simply call a JavaScript script on a remote server, which does the real work. This results in a simple bookmarklet and lets you perform arbitrary operations in the remote script.

But when the end result is the creation of a new browser window this approach will fail...

Modern browsers view this as a potential exploit and will only allow the creation of new windows as the result of direct user interaction - i.e. the user clicks something.

All is not lost - it just means that you need to put all your code in the bookmarklet itself. This is messy but for many scripts this should not be a problem.

Here is my example. It gets the currently selected text and adds it to the URL of a remote service, then opens that URL in a new browser tab or window, depending on the specific parameters. If no text has been selected, it prompts the user to enter some. This first version will open the new page as a new tab in most browsers.

<a href="javascript:(function(){
// Get the current selection
var s = '';
if (window.getSelection) {
s = String(window.getSelection());
} else if (document.getSelection) {
s = String(document.getSelection());
} else if (document.selection) {
s = document.selection.createRange().text;
}
// Prompt for input if no text selected
if (s == '') {
s = prompt('Enter your text:');
}
// Open the target URL in a new tab, escaping the text for use in a URL
if ((s != '') && (s != null)) {
window.open('http://example.com/yourapp?id=' + encodeURIComponent(s));
}
})();">BOOKMARKLET</a>

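You must remove the comments from the bookmarklet: browsers generally strip the newlines out of a javascript: URL, so a // comment would swallow all the code after it. For the same reason every statement needs its terminating semicolon, but beyond that you don't need to strip the newlines or minify the code yourself.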
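If you keep a readable, commented copy of the bookmarklet, a small script can produce the one-line href for you. This sketch assumes that comments sit on lines of their own starting with //, and that every statement ends with a semicolon so joining the lines is safe. It HTML-escapes the result for the href attribute, but it does not handle % characters in the code, which a browser would treat as URL escapes.

```ruby
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;' }.freeze

def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Drop blank lines and whole-line // comments, then join the rest into one
# line. Assumes every statement ends with a semicolon.
def bookmarklet_href(source)
  code = source.lines
               .map(&:strip)
               .reject { |line| line.empty? || line.start_with?('//') }
               .join(' ')
  "javascript:#{code}"
end

# Wrap the href in a link, escaped for use inside a double-quoted attribute.
def bookmarklet_link(source, label)
  %(<a href="#{escape_html(bookmarklet_href(source))}">#{escape_html(label)}</a>)
end
```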

The default of most current browsers is to create new tabs instead of new windows. Users can set their preferences to override this but most will not.

You may want to force the creation of a separate window. Think this through carefully - it may annoy some users if you start generating loads of windows. In some cases it is appropriate. It used to be that you could force this by passing '_blank' as the name of the new window but this does not appear to work in all browsers. Instead you need to explicitly specify one or more window properties, like width and height.

This is a messy solution but it works. In my application I just replaced the window.open call with this form:

window.open('http://patsy.craic.com/patsy?id=' + encodeURIComponent(s), '_blank',
'height=600,width=1024,status=1,toolbar=1,directories=1,menubar=1,location=1');

The options string in the third argument specifies what the new window should look like. You may need to experiment with these. With Google Chrome on the Mac these do not give me the expected result - the address is not editable and there is no bookmarks bar. I also found that simply using 'status,toolbar,etc' without the '=1' did not work, although you will see this listed as a valid syntax.
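When experimenting with these options it can help to generate the string rather than edit it by hand. A small sketch that builds it from a Ruby hash, always writing explicit name=value pairs (true becomes 1, false becomes 0), which is the '=1' form described above:

```ruby
# Build a window.open() features string from a Hash, writing every entry as
# an explicit name=value pair (true -> 1, false -> 0).
def window_features(options)
  options.map do |name, value|
    value = 1 if value == true
    value = 0 if value == false
    "#{name}=#{value}"
  end.join(',')
end
```

Calling window_features(height: 600, width: 1024, status: true, toolbar: true, directories: true, menubar: true, location: true) reproduces the options string used in the window.open call above.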