Making good on my previous promise, I have released the source code for the NCEDC-formatted earthquake CSV file reverse-geocoder, written in Python 3, on GitHub as both a “Gist” and an Eclipse-PyDev project.
Each of the above links has a README file with instructions on its use, arguments, and dependencies.
I dedicate this project and Gist to those about to endure the dubious “benefits” of fracking operations in the United Kingdom.
7 June 2016
For a data analyst, one of the most vexing datasets to make usable is the earthquake dataset available via the NCEDC search site. While the site delivers results quickly enough to an anonymous FTP site, those results do not contain any columns representing the country, state, county, or city. These columns are some of the most useful for analyzing questions such as: “Oklahoma now rivals and even exceeds California for the number of significant earthquakes?”
Believe it or not, the answer to the preceding question is “True,” which is easy to confirm once one can analyze reverse-geocoded earthquakes using relatively simple SQL queries.
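To give a sense of what such a query might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `quakes`, its column names, the magnitude threshold of 3.0 for "significant", and the sample rows are all assumptions made for illustration; the rows are placeholders, not real earthquake records.

```python
import sqlite3

# Placeholder rows for illustration only; these are NOT real earthquake records.
# Columns: event_time, magnitude, country, state (hypothetical schema).
SAMPLE_ROWS = [
    ("2015-01-01T00:00:00", 3.2, "US", "Oklahoma"),
    ("2015-01-02T00:00:00", 4.1, "US", "Oklahoma"),
    ("2015-01-03T00:00:00", 2.1, "US", "Oklahoma"),
    ("2015-01-04T00:00:00", 3.5, "US", "California"),
]


def count_by_state(rows, min_magnitude=3.0):
    """Count earthquakes at or above min_magnitude, grouped by state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quakes "
        "(event_time TEXT, magnitude REAL, country TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO quakes VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT state, COUNT(*) AS n FROM quakes "
        "WHERE country = 'US' AND magnitude >= ? "
        "GROUP BY state ORDER BY n DESC, state",
        (min_magnitude,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for state, n in count_by_state(SAMPLE_ROWS):
        print(state, n)
```

The point is only that once a state column exists, the comparison reduces to a single GROUP BY; without reverse-geocoding there is nothing to group on.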
The difficulty was in the reverse-geocoding of the latitudes and longitudes to their respective countries, states, counties, and cities. Originally, I had authored a Java program that used various ESRI shapefiles to discern the administrative units to which each lat/long belonged. That is, if you wanted to wait 12 hours for it to run.
Given the long run time and inconvenience of obtaining the shape files, I declined to publish it except as source code with little, if any, explanation as to its operation and use. I just didn’t think it was suitable for public consumption yet, as the reverse-geocoding was only tediously repeatable. I knew there was a better way, and as of a few weeks ago, after some research, I authored a better mousetrap:
- a Python 3.4 script
- using the reverse-geocoder package
- which uses K-D trees
- and datasets from GeoNames
- reverse-geocoding 2.8 million rows in approximately 210 seconds
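The idea behind the list above can be sketched in a few lines of pure Python. This is not the reverse-geocoder package's code, only a toy K-D tree under stated assumptions: the five places below are a hand-picked sample with approximate coordinates (a real GeoNames file holds many thousands), and the function names `to_xyz`, `build`, `nearest`, and `reverse_geocode` are hypothetical. Note that this style of lookup returns the nearest known place rather than testing which boundary contains the point.

```python
import math

# Illustrative sample only: (name, admin1, country code, lat, lon).
PLACES = [
    ("Oklahoma City", "Oklahoma", "US", 35.4676, -97.5164),
    ("Tulsa", "Oklahoma", "US", 36.1540, -95.9928),
    ("Los Angeles", "California", "US", 34.0522, -118.2437),
    ("San Francisco", "California", "US", 37.7749, -122.4194),
    ("London", "England", "GB", 51.5074, -0.1278),
]


def to_xyz(lat, lon):
    """Map a lat/long to a point on the unit sphere, so that straight-line
    distance ranks neighbours the same way great-circle distance does."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def build(items, depth=0):
    """Build a K-D tree from (xyz, place) pairs, splitting on x, y, z in turn."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build(items[:mid], depth + 1),
            build(items[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (squared_distance, place) for the place closest to target."""
    if node is None:
        return best
    (xyz, place), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(xyz, target))
    if best is None or d2 < best[0]:
        best = (d2, place)
    diff = target[axis] - xyz[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


TREE = build([(to_xyz(p[3], p[4]), p) for p in PLACES])


def reverse_geocode(lat, lon):
    """Label a coordinate with its nearest sample place."""
    name, admin1, cc, _, _ = nearest(TREE, to_xyz(lat, lon))[1]
    return {"name": name, "admin1": admin1, "cc": cc}


if __name__ == "__main__":
    print(reverse_geocode(35.5, -97.4))
```

Because each lookup discards most of the tree at every level, the cost per row grows roughly with the logarithm of the number of places, which goes a long way toward explaining the difference between hours and seconds.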
So, in the next few weeks, the Python script will be pushed to GitHub and the reverse-geocoded earthquake dataset to http://frackingdata.info/downloads. A posting or postings will be pushed when this is done.
How cool, from 12 hours to 210 seconds.
Finally, some progress…
12 April 2016
Attempting to mash the earthquake and underground data into a cohesive user-interface has proved to be, to put it mildly, daunting. It was much easier to find sources of earthquake data than it was to find any source of well data with any fields relevant to my needs.
The earthquake data was relatively easy to come by; for example, I found the following sources:
I downloaded the entire earthquake dataset from the Advanced National Seismic System (ANSS) beginning in 1898 through the present day and imported the data into an Apache Lucene index. In short order I had a searchable earthquake index lacking but a few location-centric fields:
- The country in which the earthquake occurred.
- The state in which the earthquake occurred.
- The county in which the earthquake occurred.
In order to associate the above needed fields with the earthquake data, I downloaded two ESRI-formatted shapefiles from the National Atlas:
And one shapefile from Mapping Hacks:
I then wrote a Java program that would read each earthquake record and link it to its country, state, and county using the respective shapefile for each. To do this, I used the GeoTools Java library (GeoTools-8.0-M3-bin.zip) from GeoTools.org.
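For readers unfamiliar with how a coordinate gets matched to a boundary, the core test is point-in-polygon. The following is a minimal Python sketch of the classic ray-casting version of that test, not the GeoTools API and not the Java program itself. The two "boundaries" are crude rectangles invented for illustration rather than real shapefile geometry, and `point_in_polygon` and `locate` are hypothetical names.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: cast a ray east from the point and count edge crossings.
    An odd number of crossings means the point is inside the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical, crudely simplified boundaries as (lon, lat) vertices.
# These rough boxes are NOT real state outlines.
REGIONS = {
    "Oklahoma (rough box)": [(-103.0, 33.6), (-94.4, 33.6),
                             (-94.4, 37.0), (-103.0, 37.0)],
    "California (rough box)": [(-124.5, 32.5), (-114.1, 32.5),
                               (-114.1, 42.0), (-124.5, 42.0)],
}


def locate(lat, lon):
    """Return the name of the first region containing the point, or None."""
    for name, ring in REGIONS.items():
        if point_in_polygon(lon, lat, ring):
            return name
    return None


if __name__ == "__main__":
    print(locate(35.5, -97.5))
```

Real boundaries have thousands of vertices, and every earthquake record must be tested against candidate polygons at each administrative level, so the work adds up quickly over a catalog of this size.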
Well data was much more difficult to come by, especially with any fields relevant to my needs, for example:
- The type of well (for example “oil”, “gas”, or “inj” for injection) was available as data, just not as a field upon which one could query. In other words, I could not query for just underground injection wells (“inj”).
- The date each well became active, let alone its filing date, was not available via the web interface.
- The location of each well, in latitude and longitude, was not available either.
- Given the lack of the above information, I didn’t even concern myself with the lack of well depth information.
As an exercise in personal fortitude, I downloaded the wells for each county in the State of Oklahoma from the Oklahoma Corporation Commission’s Well Data System into one Excel spreadsheet per county. I then wrote a Java program that read the well data within each county’s Excel spreadsheet and posted it to an Apache Lucene index. I then zipped the Apache Lucene index and pushed it to a web site so that it could be queried and viewed using Apache Solr’s VelocityResponseWriter browser interface. The results of this effort can be viewed and queried here.
So, in concluding this post, I find the earthquake data adequate for my present needs, but the well data lacks any useful date or location information that would allow me to associate the earthquakes with the wells by either location or time. As I am a persistent researcher, my next post will detail my further attempts at locating and downloading well data.