Wednesday, December 9, 2009

OGDI Python tutorial, or why OGDI is awesome.

The Open Government Data Initiative is a program started by Obama. It exists so that the US government can make data that the public normally has the right to see easily available, and in a way that can be accessed programmatically. It is sponsored by Microsoft. At first that sounds bad: like too much information, and like Microsoft has the data. But I gotta say this thing is shaping up to be pretty spiffy. And though it seems more geared toward Microsoft's .NET, it has JSON and XML plugged into it as well, and grabbing the data with Python is easy as pi.

OK, so below is an example using Juvenile Arrest Charges in DC. If you just want to start with the unfiltered data, use this url:
instead of the one ending in $filter=gender%20eq%20'F'

What I have done is add a query of gender eq 'F' to the filter, which automagically filters the results. So now you only see arrested juveniles who are female.
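To make the encoding of that filter concrete, here is a minimal sketch in Python 3 of how such a query string could be assembled. The base URL is a placeholder I made up (not the real service address), `build_filter_url` is a hypothetical helper, and the `?$filter=` prefix assumes the service follows the usual OData query convention.

```python
from urllib.parse import quote


def build_filter_url(base_url, filter_expr):
    """Append a $filter query to base_url (hypothetical helper)."""
    # Spaces become %20; single quotes are left readable via safe="'",
    # which matches the style of the filter shown above.
    return base_url + "?$filter=" + quote(filter_expr, safe="'")


# Placeholder address, an assumption for illustration only.
BASE_URL = "http://example.invalid/JuvenileArrestsCharges"

print(build_filter_url(BASE_URL, "gender eq 'F'"))
```

Swapping the expression for something like gender eq 'M' should work the same way, as long as the field name exists in the dataset.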

import urllib
from xml.dom import minidom
from xml.dom.minidom import parse, parseString

def GetData():
    url = "$filter=gender%20eq%20'F'"
    # Fetch the feed and parse the response as XML.
    xmldoc = minidom.parse(urllib.urlopen(url))
    # Each "content" element holds one record; print its fields in turn.
    contentNodes = xmldoc.getElementsByTagName("content")
    for contentNode in contentNodes:
        partitionKeyNodes = contentNode.getElementsByTagName("d:PartitionKey")
        for node in partitionKeyNodes:
            print node.childNodes[0].nodeValue

        rowKeyNodes = contentNode.getElementsByTagName("d:RowKey")
        for node in rowKeyNodes:
            print node.childNodes[0].nodeValue

        timestampNodes = contentNode.getElementsByTagName("d:TimeStamp")
        for node in timestampNodes:
            print node.childNodes[0].nodeValue

        entityidNodes = contentNode.getElementsByTagName("d:entityid")
        for node in entityidNodes:
            print node.childNodes[0].nodeValue

        nameNodes = contentNode.getElementsByTagName("d:name")
        for node in nameNodes:
            print node.childNodes[0].nodeValue

        addressNodes = contentNode.getElementsByTagName("d:address")
        for node in addressNodes:
            print node.childNodes[0].nodeValue

        weburlNodes = contentNode.getElementsByTagName("d:weburl")
        for node in weburlNodes:
            print node.childNodes[0].nodeValue

        gis_idNodes = contentNode.getElementsByTagName("d:gis_id")
        for node in gis_idNodes:
            print node.childNodes[0].nodeValue

        genderNodes = contentNode.getElementsByTagName("d:gender")
        for node in genderNodes:
            print node.childNodes[0].nodeValue

        offense_descriptionNodes = contentNode.getElementsByTagName("d:offensedescription")
        for node in offense_descriptionNodes:
            print node.childNodes[0].nodeValue

Your output will look something like this:


If you just want the output, you can actually load a link up in Excel. The website shows you how to do this, with code samples in various .NET languages, Ruby, Python, PHP, etc.
The video and sample code on the site are very useful.
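In the same spirit, here is a small offline sketch (Python 3) of the parsing approach GetData uses, so you can poke at it without hitting the service. The sample XML is made up by me and only imitates the element names used above; the real feed may carry different namespaces and fields. `extract_records` and `FIELDS` are hypothetical names.

```python
from xml.dom import minidom

# Made-up sample shaped like the feed described above (an assumption).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:gender>F</d:gender>
        <d:offensedescription>Sample offense</d:offensedescription>
      </m:properties>
    </content>
  </entry>
</feed>"""

# Field names to pull out of each record, without the "d:" prefix.
FIELDS = ["gender", "offensedescription"]


def extract_records(xml_text, fields=FIELDS):
    """Return one dict per "content" element found in xml_text."""
    doc = minidom.parseString(xml_text)
    records = []
    for content in doc.getElementsByTagName("content"):
        record = {}
        for field in fields:
            nodes = content.getElementsByTagName("d:" + field)
            # Skip missing or empty elements instead of raising IndexError.
            if nodes and nodes[0].firstChild is not None:
                record[field] = nodes[0].firstChild.nodeValue
        records.append(record)
    return records


for record in extract_records(SAMPLE_FEED):
    print(record)
```

Collecting each record into a dict, rather than printing field by field, makes it easier to hand the data to something else later, and the empty-element check avoids a crash when a field has no value.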

Information is only available for DC right now, it seems. While I was doing research on this there was a map showing Michigan had finished, as well as some other states, though I don't know how to get to that data...hmmm.

For more information there is a Microsoft site here:

Why is this the bee's knees? Why am I so hyped about this? Everyone has those horror stories of how they didn't get a job because of drunk photos on Facebook or something. I honestly believe that with public information sources like these, there will be so much crap on everybody, so many of the bad things that people do (usually in the comfort of their own home), that people are going to forget this silly nonsense of "you did something I don't approve of, so I'm not going to hire you." I think with these really open sources of information a day will come when you don't have to be Jesus to get a job. I write a lot of shit down on the intarwebs, and post pictures, etc. I know what posting my opinion or general malfeasentry can cost me. But in the end I think this big ol' goofy world is just going to have to accept imperfect big ol' goofy people.

This does make a bold assumption that things like people's personal arrest records will be available online.

Plus the world needs an app that pairs dates based on similar criminal interests, lulz.


Andy said...

Cool stuff! Access to such data is a good start, but we really need programmers to make better interfaces to the data for the general public.

Here's a version of the code that is a little shorter and easier to understand:

Woody said...

I think the opening up of all the general data will be great for statistics and news producers, and recognising that programmers can make good use of it is an excellent step forward.

Dan Knauss said...

Great lay-accessible explanation of something everyone involved in the public sector needs to understand.

I think you may also be more insightful than you realize with your (joking?) suggestion that a flood of open public data will become a shield of anonymity of the traditional sort--nobody will care about an individual in the crowd unless they're doing something truly worth the notice.

Yes, employers and others won't find social media dominating your publicly accessible profile on the web. They won't look at your drunk pictures; they'll look more appropriately at crime, property records, credit reports. But we'll all become better at this: screening tenants, tenants screening landlords, checking out the prospective babysitter, etc.

We *are* building a surveillance state, in the private sector, by popular consent. It's a lot more than the tracking devices we all carry around and police are allowed by the cell services to use to apprehend people they're after.

But another way to look at it is to see a thicker public sphere, where you have more access to and control over records, yours and others'. It will be empowering to "open source" indie journalism and accountability/oversight. It's good for democracy, but it will be aggressive and scare people when your city council members are able to instantly see ALL court, police, tax, violation, permit, license, etc. records associated with a name and/or an address that appears in the agenda before them.