A collection of computer systems and programming tips that you may find useful.
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Monday, October 27, 2014

USPTO Patent data sets are now distributed by Reed Tech, not Google

For the past few years Google has been making US Patent Office datasets available for free on the sites:

and others.

But the sites have not been updated since around Oct 7th 2014 and no information about the delay was posted... I rely on these updates so that was a big problem.

After some poking around I stumbled on the fact that Reed Tech, a division of LexisNexis, is now distributing these datasets at no charge.

Fortunately the format of the zip files remains identical, at least for the ones I work with.

You can find the Patent Grant Bibliographic Text at http://patents.reedtech.com/pgrbbib.php and the Application Bibliographic Text at http://patents.reedtech.com/parbbib.php

At the time of writing these pages are up to date.

This is good news, but it would have been even better if Google had announced the transition a month or so in advance.

Monday, September 22, 2014

Trends in Human Antibody Development - Charts

I operate the TABS Therapeutic Antibody Database, which helps biotechnology companies working in the field of antibody development. TABS represents the most comprehensive resource in this field.

With all that data available, I have compiled summary statistics that show the growth of this area of biotech over the years. I have made charts of those trends and made them freely available on the TABS database site. You can find all the charts HERE.

Here is an example showing the number of active projects per year

You can get a free 30 day trial account at TABS.

TV Eye - a way to view YouTube videos without all the clutter

When you view a video on youtube.com you get not only the video, but a bunch of suggested related videos and often times a lot of comments and other text. All this clutter gets in the way of what you want to do - simply watching the video.

So I wrote TV Eye, a simple service that embeds your desired video in a simple, plain web page, with none of the usual clutter.

To use, go to youtube.com and find the video that you want to watch. Copy the URL for the video and paste it into the form on TV Eye.
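This is not the TV Eye code itself (that lives in the repository mentioned below), but the general idea can be sketched in a few lines of shell. The sketch assumes the usual youtube.com/watch?v=... form of the URL and YouTube's /embed/ URL form; the function name is made up for illustration.

```shell
# Turn a YouTube watch URL into the matching embed URL.
# Assumes the video ID is carried in the "v" query parameter.
youtube_embed_url() {
    # Pull out whatever follows "v=" up to the next "&" or "#"
    video_id=$(printf '%s\n' "$1" | sed -n 's/.*[?&]v=\([^&#]*\).*/\1/p')
    if [ -z "$video_id" ]; then
        echo "no video id found in: $1" >&2
        return 1
    fi
    echo "https://www.youtube.com/embed/$video_id"
}
```

Calling youtube_embed_url with a watch URL prints the embed URL, which a plain page could then place in an iframe. Short youtu.be links are not handled by this sketch.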

The service is really simple and the code is distributed freely under the terms of the MIT license. You can get the code on GitHub at https://github.com/craic/tv_eye.

Friday, September 19, 2014

Installing Ruby in Docker Images

Docker is a great way to package applications with all their dependent libraries etc and then deploy them easily on various hosts. It covers similar ground to tools like Vagrant but uses lightweight Linux containers rather than full virtual machines.

I am interested in using it to package Ruby applications built with Sinatra or Rails.

The preferred way to build a Docker Image is to write a Dockerfile that contains instructions that load an operating system, install system packages, copy user code, etc.

I am building my Images on top of Ubuntu 14.04, the current Ubuntu Linux release. A problem with most of the Linux distributions is that the packages that install Ruby are often one or two releases behind. In this case the packaged Ruby is 1.9.3 and the current release is 2.1.2. In many cases this would not be a problem but if you want the latest version then you have to do a bit more work and, specifically, compile Ruby from source.

Once you have the Dockerfile working then this all happens very smoothly but it took me a while to get all the pieces working together. So I wrote up two versions of a minimal Sinatra application along with the Dockerfiles needed to get them to work.

The code is on GitHub at https://github.com/craic/docker_sinatra_examples

The Docker Images are on DockerHub at

The first example installs the packaged Ruby (1.9.3)
The second compiles and installs Ruby 2.1.2 from source.

I hope these examples help you get up to speed with Docker quickly.

Clean up unused Docker Containers and Images

Docker is a great way to package applications with all their dependent libraries etc and then deploy them easily on various hosts. It covers similar ground to tools like Vagrant but uses lightweight Linux containers rather than full virtual machines.

In Docker you create Images that contain your code, the OS, libraries, etc and Containers which are the instances of the Image which you actually run.

Docker does not do a good job cleaning up old Images and Containers and when you are doing a lot of developing this can become a problem. Various people have proposed ways to handle the issue. The best of the web posts that I have seen is http://blog.stefanxo.com/2014/02/clean-up-after-docker/

Here is a slight restating of those solutions which work for me on Mac OS X.

To clean up non-running containers

$ docker ps -a --no-trunc | grep 'Exit' | awk '{print $1}' | xargs docker rm

This removes any containers where the Status contains the string Exit. I have seen containers with no status which perhaps have crashed or hung. You just have to remove these manually.
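Before piping anything into 'docker rm' it can be worth previewing which IDs would be selected. Here is a minimal sketch of that filter as a shell function. The function name is my own invention, and it applies the same test as the grep above (the string Exit anywhere on the line), so it has the same limitations.

```shell
# Read `docker ps -a` style output on stdin and print the ID
# (first column) of every row that mentions Exit.
exited_container_ids() {
    awk '/Exit/ {print $1}'
}
```

For a dry run use 'docker ps -a --no-trunc | exited_container_ids' and check the list. If it looks right, add '| xargs docker rm' to the end.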

Once you have removed unused Containers you can then remove unused Images. If you try this the other way round you will get errors that the Images are still in use.

Unused Images look like this in the output of 'docker images' - they are the ones with <none> as the repository and tag

craic/sinatra_example            v1                  f9d2702eb2f7        2 days ago          481.3 MB
<none>                           <none>              196afed3dded        2 days ago          378.5 MB
<none>                           <none>              56f86f0a985e        2 days ago          378.5 MB
<none>                           <none>              011f588e88db        2 days ago          586.8 MB

To remove them use this command

$ docker images -f dangling=true -q | xargs docker rmi
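To see which Images would be affected without removing anything, the <none> rows can be picked out of the regular image listing. This is a rough sketch rather than a replacement for the command above. The function name is hypothetical and it assumes the column layout shown in the listing, with the image ID in the third column.

```shell
# Read `docker images` style output on stdin and print the image ID
# of every row whose repository and tag are both <none>.
dangling_image_ids() {
    awk '$1 == "<none>" && $2 == "<none>" {print $3}'
}
```

Running 'docker images | dangling_image_ids' on the listing above would print the three <none> IDs and leave out craic/sinatra_example.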

If you are using Dockerfiles to build your Images (recommended) then you can always rebuild an image should anything get removed accidentally.

Monday, March 10, 2014

Private Wi-Fi Networks, Raspberry Pi and Mac OS X Mavericks

I ran into two problems connecting to my Wi-Fi network recently - both turned out to be the result of the network being private and not broadcasting its SSID (Service Set Identifier) - the 'name' of the network.

My network runs on an Apple Time Capsule using WPA2 encryption. It has worked fine with other machines - MacBook, Win PC, Chromebook, iPhones...

#1 Getting Wi-Fi running on a Raspberry Pi

I was not able to get Wi-Fi running on a Raspberry Pi Linux machine, despite trying a range of configurations that I found on the web.

When I changed the network to make its SSID public then it connected just fine with the basic configuration.

#2 Setting up a new MacBook Pro

When you start up a brand new Mac it walks you through several setup steps, including connecting to a network. The new MacBook Pro only has Wi-Fi so connecting to the network is essential. But in my case it did not see the private network and so I was unable to get through the regular setup.

The workaround was to skip those steps until I got to the regular desktop. Then I went into Preferences and Network, configured the network manually and was able to get in.

I think that if the SSID were broadcast then I would not have had any problems.

The reason to keep an SSID private is to make it more difficult for someone to break into your network. If they don't know the network exists then they won't try and break in. In reality, however, it is not that simple. A serious hacker will still be able to detect packets on all networks in the area including the 'hidden' ones and then attempt to break in. So a private network is a good idea but it does not offer great protection.

Monday, February 10, 2014

Using Ping to find machines on your network

The UNIX command ping is used to test if specific machines are active on a network.

$ ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=255 time=0.619 ms

I have been using ping for years but last week I found out that you can ping the broadcast address of a network (x.x.x.255) and see all the machines on that network that are configured to respond to ping:

$ ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=0.071 ms
64 bytes from icmp_seq=0 ttl=255 time=0.534 ms
64 bytes from icmp_seq=0 ttl=255 time=0.544 ms

This is really useful if you need the IP address of a machine that was assigned its address by DHCP.
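The x.x.x.255 form assumes a /24 network (netmask 255.255.255.0). On networks with a different netmask the broadcast address is the host address with all of the host bits set to 1. Here is a small sketch that works it out. The function name and the example addresses are hypothetical, and it takes the prefix length (24 for 255.255.255.0) rather than a dotted netmask.

```shell
# Print the IPv4 broadcast address for a host address and prefix length.
# Usage: broadcast_address 192.168.1.10 24
broadcast_address() {
    ip=$1
    prefix=$2
    # Split the dotted address into its four octets
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    # Pack the octets into one number, then set every host bit to 1
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    hostmask=$(( (1 << (32 - prefix)) - 1 ))
    bcast=$(( addr | hostmask ))
    echo "$(( (bcast >> 24) & 255 )).$(( (bcast >> 16) & 255 )).$(( (bcast >> 8) & 255 )).$(( bcast & 255 ))"
}
```

For example 'broadcast_address 192.168.1.10 24' gives 192.168.1.255, while the same kind of host on a /16 network, 'broadcast_address 10.0.5.20 16', gives 10.0.255.255.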

A complementary command is arp -a

This lists every entry in the machine's address resolution (ARP) table and reports the IP address, the MAC address and the network interface for each entry.

$ arp -a
? ( at b8:8d:12:5a:ad:77 on en0 ifscope [ethernet]
? ( at b8:8d:12:5a:ad:77 on en0 ifscope [ethernet]
? ( at 70:56:81:ad:b6:d1 on en0 ifscope [ethernet]
? ( at b8:8d:12:5a:ad:77 on en0 ifscope [ethernet]
? ( at 70:56:81:c5:fb:2d on en0 ifscope [ethernet]

Note that these addresses are not necessarily active. You can see link reachability information using arp -al
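If you want just the addresses from that output, say to compare against a list of known machines, a short awk filter will do it. This is a sketch that assumes the Mac OS X line layout shown above, '? (address) at MAC on interface ...', and that unresolved entries show (incomplete) in place of the MAC address. The function name and the addresses in the usage note are made up.

```shell
# Read `arp -a` output on stdin and print "IP MAC interface" for each
# resolved entry, skipping entries with no MAC address yet.
arp_entries() {
    awk '$3 == "at" && $4 != "(incomplete)" {
        gsub(/[()]/, "", $2)   # strip the parentheses around the IP
        print $2, $4, $6
    }'
}
```

With this, 'arp -a | arp_entries' would turn a line like '? (192.168.1.1) at b8:8d:12:5a:ad:77 on en0 ifscope [ethernet]' into '192.168.1.1 b8:8d:12:5a:ad:77 en0'.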
