Parsing JSON and Passing Data Into a Table
30 Dec 2014Last week I talked about querying data from an open data API for use in my project, SaferNYC. In this post, I’ll talk about the results of those queries and how to deal with that data.
The Socrata Open Data API, like many APIs, returns data in the form of JSON. For all better purposes, it’s an array of hashes, with each hash, in this case, corresponding to a collision. Each key in the hash is a column, such as “date” or “latitude.” As an example, click here to run a query for information on pedestrians killed in November.
The incidents/hashes in the array can be accessed using .map
, and the value of each column can be accessed using Ruby hash syntax. Because the API only gives 1000 results (less than a month of collisions) per query, I realized early on that I needed to save the data to a database.
In this class, collisions
is the JSON (DataCollector
was the focus of my last post), and Incident.create
is being passed a hash containing the parameters for a new row in the table. The parse method simply does some minor cleaning of the data to prepare it to enter the table.
More serious work is not needed here because essential columns with null values are weeded out in the query (see my last post), and .to_i
and .to_f
eliminate null values in other important columns (by changing them to 0 or 0.0, which are appropriate values in this case). Incidents like the ones seen in your test query that have no location data will not even make it this far, so time will not be wasted attempting to parse those incidents.
There’s one more step before the data in the table is viable. Many (usually about two thirds) of the incidents that have sufficient location data are still lacking latitude and longitude, and those values are now 0.0 in the database. Fortunately, this problem is very easily solved using the Geocoder gem. Just a few lines of code, and the (hopefully) correct latitude and longitude will be entered into the database.
And finally, we’ll want to be able to easily query for incidents that occurred within a certain number of months and didn’t somehow fail geocoding and end up staying in the Gulf of Guinea.
So I’ve taken JSON returned from a query to the SODA API, parsed it, entered it into a database, and geocoded it, and can run a query to get back relevant, useful data points from my database. Next time, I’ll go over creating functional GeoJSON from these data points.