A collection of computer systems and programming tips that you may find useful.
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Thursday, March 21, 2013

SODA - Socrata Open Data API

Socrata is a company that helps organizations make their data available to outside users. They work with people like the World Bank and a number of city governments, including the City of Seattle.

You can access these public data sources via SODA - the Socrata Open Data API which provides a SQL-like query language (SoQL) that you submit via HTTP request to an API endpoint for that resource. The default response is JSON but you can get CSV, XML and RDF.

They provide quite a bit of documentation but it is a bit short on examples and I had to poke around a bit to get the right syntax for the data sources that I am interested in.

A very useful resource is the 'live' query console they provide for a test dataset, which lets you try various queries.

I access their resources from Ruby. I generate the URL, use open-uri to fetch the data and then parse the JSON response. Here is an example fo the process:

The data source I want to query list the Seattle Fire Department 911 calls. That page lets you create custom views and download the data. If you click on 'Export' you can see the column names used in the database as well as the url for the API endpoint.

I want to get all records in the past hour, so I need a query along the lines of 'where datetime > my_timestamp'. Specifically I need to build a URL similar to this:

http://data.seattle.gov/resource/kzjm-xkqj.json?$where datetime > my_time_stamp

This is not a real query just yet but note a couple of things...

The API endpoint is http://data.seattle.gov/resource/kzjm-xkqj.json

This version ends in '.json' and so it will return the response in JSON. If you want CSV then replace '.json' with '.csv' - same goes for XML and RDF.

Also note the dollar sign before the 'where' - it is easy to overlook - that signifies the following word is a SoQL clause and not the name of a column.

A big problem I ran into was figuring out the format for timestamps. The web page showing the table formats them as '03/21/2013 07:35:00 AM -0700' whereas the JSON response uses "datetime" : 1363872960. But neither of these works in a query... and the documentation on data types does not describe what you should use. Trial and error with that console page led me to the correct format of '2013-03-21 07:00:00' which is similar to the ISO 8601 format.

Also note that the time needs to be quoted... again, apparently not in the documentation

So my query should now look something like this:

http://data.seattle.gov/resource/kzjm-xkqj.json?$where datetime > '2013-03-21 07:00:00'

But that URL needs to be escaped before submission. That was the other issue - URI.escape() handles the spaces, quotes, etc but for some reason it does not escape the dollar sign. So I had to do that for myself.

Here is a chunk of Ruby code that gets all the 911 calls in the past hour:

require 'open-uri'
require 'time'
require 'json'

endpoint = 'http://data.seattle.gov/resource/kzjm-xkqj.json'
timestamp = (Time.now - (60 * 60)).strftime("%F %H:%M:%S")

query = "$where=datetime > '#{timestamp}'"

url = "#{endpoint}?#{query}"
url = URI.escape(url)
url.gsub!(/\$/, '%24')

puts "query: #{query}
puts "url: #{url}

json = open(url).read
puts json

result = JSON.parse(json)
puts result.length

That will show you the query, the escaped URL, the JSON returned and the number of records in the parsed JSON - enough to get you started. Different data sources will use different column names and perhaps different formats for date/time (always be aware of time zone issues).

Having a common API to many government/public data sources is great but the Socrata-backed sources that I've looked a so far don't seem to be quite as consistent as one would hope. This is an issue for the different data providers - it's not a Socrata issue.

I would hope that one could query the endpoint in some way and get a listing of all the column names and data types. There is one for SODA v1.0 but this now deprecated... so perhaps there is a newer version but it has not been documented.

Part of the answer lies in two custom HTTP response headers (X-SODA2-Fields and X-SODA2-Types) that are returned with your data. Here they are for the data source used here:

X-SODA2-Fields: [":updated_at","address","longitude",":id","latitude","incident_number","datetime","type",":created_at","report_location"]
X-SODA2-Types: ["meta_data","text","number","meta_data","number","text","date","text","meta_data","location"]
X-SODA2-Legacy-Types: true

You can see the headers if you use 'curl -i "your_url"' on the command line with your escaped url.

Friday, March 8, 2013

HTML5 and web API code examples

Whenever I am learning a new feature of HTML5, JavaScript/Jquery or a new API, such as Google Maps, I always look for example code that I can learn from.

In many cases the examples are great, but in others they can be too clever and too heavily styled, such that it can be hard to understand the core of the feature that they are demonstrating.

So in writing my own example code I try and strip things down to the bare minimum - very little styling and code that tries to do one, and only one, thing.

I've been collecting examples that I think have some substance and that can help others learn about a feature with the minimum of confusion.

Take a look at http://html5-examples.craic.com

This site has working examples of a number of HTML5-related features, including Data Attributes, Geolocation, Web Audio and Speech Recognition.

Look at the Source for each of these pages to see annotated JavaScript, etc. which illustrates the target feature of each page.

All the code is distributed freely under the terms of the MIT license and you are encouraged to build your own applications using it.

Note that some of the examples involve new web technologies, so they may not work in some browsers and what does work now may not work in the future as the APIs mature.

Archive of Tips