A number of areas of data analysis are dominated by sophisticated algorithms, intensive computation and, in some cases, limited access to raw data. Examples include the analysis of satellite imagery, feature extraction from video, protein structure analysis and language processing.
I am interested in how simple, approximate methods can be used to extend the application of these technologies. While simple approaches will clearly not match the accuracy and resolution of complex methods, they can, by nature of their simplicity, be implemented and deployed more easily and hence more widely.
I have just posted the code for one of these projects to the Craic Github site. The application involves image processing of satellite images downloaded from Google Maps in order to estimate the number of temporary dwellings or shacks in the slums of Port-au-Prince in Haiti.
The satellite images of Port-au-Prince show large areas covered with small white, blue and rust colored squares. These are the slums of the city in areas such as St.Martin, Cite Soleil and others.
The blue features are very distinctive and most likely represent the ubiquitous blue plastic tarpaulins the you can find at any hardware store.
Using the Python interface to the wonderful OpenCV image processing library, I wrote a simple application that identifies blue features below a cutoff size and then computes their area. By fetching adjacent squares that tile across the city, and calculating the area of blue features, I can come up with an matrix showing the relative density of the slums.
The approach is undoubtedly simplistic (the code is only about 35 lines of Python) but it demonstrates how simple approaches can be successfully applied to complex and sophisticated types of data. You do not always need access to expensive commercial software and proprietary datasets in order to work in these fields.
The code is made freely available at https://github.com/craic/count_shelters and you can read my write up of the work so far, with example images HERE.
No comments:
Post a Comment