A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Monday, February 14, 2011

PATSY - a web service that makes patents easier to read

I've just launched PATSY - a new web service that reformats US patents to make them much easier to read than their original format.

The text of patents is typically very dense and difficult to read.

They are written as legal documents, and this inevitably results in verbose and sometimes arcane text. Every component of an invention has all its possible variants enumerated, which can produce sentences of ridiculous length with the variants delimited by commas. On top of that, the US patent office still prints patents as two narrow columns of text on each page, a format that might work in newspapers but which makes no sense for technical patents.

The underlying problem is that the patent offices have not defined and enforced a modern text format that is both easy to read and easy to parse in software.

But as this is not likely to happen any time soon, I decided to write an application that reformats the text of patents into something more palatable.

You enter a patent number into PATSY and it fetches the corresponding page from the US patent office web site. It scans the text and splits each paragraph into its component sentences, then splits long sentences into clauses at punctuation such as semicolons. Simply adding this spacing makes a big difference.
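PATSY's own parsing code is not shown here, so the following is only a minimal sketch of the splitting step, assuming a simple token-based approach. The hardest part is that patent text is full of periods that do not end a sentence ("U.S. Pat. No.", "FIG."), so this version skips a small, hypothetical list of abbreviations. The names `ABBREVIATIONS`, `split_sentences` and `split_clauses` are illustrative, not taken from PATSY.

```ruby
# Hypothetical list of abbreviations that end in a period but do not end a sentence.
ABBREVIATIONS = ["No.", "Nos.", "Pat.", "FIG.", "Fig.", "U.S.", "e.g.", "i.e.", "al."].freeze

# Split a paragraph into sentences. A token ends a sentence when it ends in
# . ? or !, is not a known abbreviation, and is followed by a token that
# starts with a capital letter, an opening parenthesis or a quote (or by nothing).
def split_sentences(paragraph)
  sentences = []
  current = []
  tokens = paragraph.split(/\s+/).reject(&:empty?)
  tokens.each_with_index do |token, i|
    current << token
    next_token = tokens[i + 1]
    ends_sentence = token.match?(/[.?!]\z/) &&
                    !ABBREVIATIONS.include?(token) &&
                    (next_token.nil? || next_token.match?(/\A[A-Z("]/))
    next unless ends_sentence

    sentences << current.join(" ")
    current = []
  end
  sentences << current.join(" ") unless current.empty?
  sentences
end

# Split a sentence into clauses after each semicolon, keeping the semicolon.
def split_clauses(sentence)
  sentence.split(/(?<=;)\s+/)
end

text = "In one embodiment, the vector is described in U.S. Pat. No. 5,000,000. " \
       "The vector comprises a promoter; a coding region; and a terminator."
split_sentences(text).each do |sentence|
  split_clauses(sentence).each { |clause| puts clause }
  puts
end
```

Printing each clause on its own line is the whole trick: the text is unchanged, but a long enumeration becomes a list the eye can scan. A real implementation would need a much longer abbreviation list and would still make mistakes, which is why the original text stays available.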

But PATSY goes much further. It highlights a series of phrases that are typically of interest - such as 'preferred embodiment' and 'SEQ ID NO'. It recognizes references to other patents and hyperlinks these to either their patent office site or to PATSY directly. In some cases, references to scientific publications can be identified and links are added that will take the user to the NIH PubMed site of abstracts, and from there the original publication can be accessed in most cases.
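As a rough illustration of the highlighting and linking, here is a minimal sketch that wraps the phrases mentioned above in a `<span>` and turns references like "U.S. Pat. No. 5,000,000" into links. The regular expression, the `annotate` method and the `/patent/` link path are all hypothetical; PATSY's actual patterns and URLs are not described here, and the PubMed citation matching is left out because it is much less regular.

```ruby
require "erb"

# Phrases worth highlighting (the two examples given in the post).
HIGHLIGHT_PHRASES = ["preferred embodiment", "SEQ ID NO"].freeze

# Assumed pattern for references such as "U.S. Pat. No. 5,000,000"
# or "U.S. Patent No. 5,000,000".
PATENT_REF = /U\.S\. Pat(?:ent|\.) No\. (\d{1,2},\d{3},\d{3})/

# Hypothetical route that links a cited patent back into the app itself.
PATENT_LINK_BASE = "/patent/"

def annotate(sentence)
  # Escape the raw text first so that only our own tags end up as HTML.
  html = ERB::Util.html_escape(sentence)
  HIGHLIGHT_PHRASES.each do |phrase|
    html = html.gsub(/#{Regexp.escape(phrase)}/i) { |m| %(<span class="highlight">#{m}</span>) }
  end
  html.gsub(PATENT_REF) do |match|
    number = Regexp.last_match(1).delete(",")
    %(<a href="#{PATENT_LINK_BASE}#{number}">#{match}</a>)
  end
end

puts annotate("In a preferred embodiment, the sequence is SEQ ID NO: 1, " \
              "as in U.S. Pat. No. 5,000,000.")
```

Escaping before inserting tags matters here: patent text can contain characters such as `<` in chemical or numeric ranges, and escaping afterwards would mangle the added markup.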

PATSY only works with US patents right now, and some of its features are geared towards biotechnology patents. The text parsing is not perfect, but even at this early stage in its development it can make dense blocks of text much easier to read. Where the result is unclear, you can click the header of each text block to see the original text before any processing.

While it is in this early stage, PATSY is completely free. If it turns out to be useful to a lot of people then I may offer it via subscription to heavy users, while retaining free access to occasional users.

Please try PATSY out and send me feedback at info @ craic.com.

Technical aspects:

PATSY is written in Ruby, using Sinatra as a lightweight web application framework. It runs on Heroku, a hosting service for Ruby web applications that sits atop Amazon Web Services. The steps I took to set it up are described here.

My experience with Heroku for this application thus far has been great. They allow you to set up applications with limited resources at no charge. If and when PATSY starts to get some traction then I can scale it up by adding more of what they call 'dynos'. That will incur some cost but there is no commitment or up front payment, plus the process of scaling is incredibly easy.
