While FrackingData.org provides fracking and earthquake-centric datasets suitable for almost any citizen-scientist or analyst to consume, it does so with a bent towards the generic. After the latest significant magnitude 5.0 earthquake to hit Oklahoma near the town of Cushing at or about 7:44 PM on November 6, 2016, as well as the magnitude 5.8 earthquake to hit 8 miles northeast of Pawnee on September 3, 2016, I’ve decided to apply my data analysis and mapping skills to making focused datasets concentrating on the State of Oklahoma’s earthquakes and underground injection wells.
When the State of Kansas experienced earthquakes related to fracking operations, it reportedly reduced the volume of wastewater injected into its underground disposal wells, whereas sources have reported that the State of Oklahoma initially changed not the volume of wastewater disposed of, but the depths at which it was injected. Therefore, it would seem beneficial to review the practices of both states, as well as any benefits that have accrued from them.
With the preceding in mind, I propose the following plan:
- Pull all magnitude 0.0 and above earthquakes, from 1898 to present, from the NCEDC website.
- Reverse-geocode the aforementioned earthquakes, adding the country, state, county, and nearest city/village in the process.
- Locate the oil well location datasets for the states of Oklahoma and Kansas, if such are available.
- Locate the underground injection well datasets for the states of Oklahoma and Kansas, if such are available.
- Extract, transform, and load (ETL) the aforementioned oil and injection well datasets into a standardized layout suitable for singular and multiple state analyses.
- Locate the volume of wastewater injection datasets for both states, if such exist.
- ETL the volume of wastewater injection datasets into a standardized layout suitable for singular and multiple state analyses.
- Publish the subsequent datasets, along with the methodology used to ETL them, for use by the fracking data analysis community.
- Produce a step-by-step guide of the subsequent analyses, complete with SQL or source code, as an example of using the datasets for research.
- Submit my study(ies) to the State of Oklahoma as well as various media outlets for their use or commentary, whichever is more appropriate based upon the nature of the receiving entity.
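As a rough illustration of the "standardized layout" steps above, here is a minimal sketch in Python. Every column name, the two-letter state keys, and the sample rows are hypothetical placeholders, since the actual Oklahoma and Kansas well files have not been located yet; only the idea of mapping each state's columns onto one shared layout comes from the plan.

```python
import csv
import io

# Hypothetical shared layout for oil and injection wells from any state.
STANDARD_FIELDS = ["state", "well_id", "well_type", "latitude", "longitude"]

# Hypothetical source-column -> standard-column maps, one per state.
COLUMN_MAPS = {
    "OK": {"API": "well_id", "WellType": "well_type",
           "Lat": "latitude", "Long": "longitude"},
    "KS": {"API_NUMBER": "well_id", "TYPE": "well_type",
           "LATITUDE": "latitude", "LONGITUDE": "longitude"},
}


def standardize(state, csv_text):
    """Transform one state's CSV text into rows in the shared layout."""
    mapping = COLUMN_MAPS[state]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {"state": state}
        for source_col, standard_col in mapping.items():
            row[standard_col] = raw[source_col].strip()
        # Coordinates become numbers so later joins and maps can use them.
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        rows.append(row)
    return rows


# Placeholder inputs, invented purely to exercise the function.
OK_SAMPLE = "API,WellType,Lat,Long\nOK-0001,injection,36.1,-96.9\n"
KS_SAMPLE = "API_NUMBER,TYPE,LATITUDE,LONGITUDE\nKS-0001,oil,37.5,-97.2\n"

combined = standardize("OK", OK_SAMPLE) + standardize("KS", KS_SAMPLE)
```

Because both states end up with identical columns, `combined` can be analyzed one state at a time or across states without special cases.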
Please note that I’ll post my progress on the bullet points listed above, as well as build out a “living document” of my adventures in doing so at FrackingData.org’s sister site: FrackingData.info.
In the meantime, I hope that no loss of life occurs due to the continued practice of wastewater injection into underground wells. That being said, given the State of Oklahoma’s economic dependence upon oil as a means of income and its reluctance to date to rein in those activities, I fear that loss of life will be inevitable. “Loss of life” seems such an abstract phrase, especially when it appears in print, but given that I’ve experienced its direct effects more than once, I can assure all who might read this post that it is deeply personal and most certainly not abstract to those who encounter it first-hand.
7 November 2016
Microsoft Access, while not SQL-92 compliant, is a very popular database program suitable for analytical use by many people that don’t use R, SAS, or Tableau for analysis and reporting purposes.
Concerning FracFocus.org-related data, and back again by popular demand, FrackingData.org is now providing (see link below) a Microsoft Access database in “accdb” format containing various tables as follows:
- FracFocus.org-related tables
- Earthquakes-related tables
- NCEDC_earthquakes_reverse_geocoded (worldwide, 1898 to date, magnitude 0 and up)
- Toxicities-related tables
- Views utilizing the above tables
- Link(s) to Microsoft Access database(s), compressed with the 7-Zip program:
Henceforth, this database will be available on the same schedule as the CSV, SQLite, and PostgreSQL files and a page holding the latest link can be found on the FracFocus Data page of FrackingData.org’s site (link below):
10 June 2016
Making good on my previous promise, I have released the source code for the NCEDC-formatted earthquake CSV file reverse-geocoder, written in Python 3, on GitHub as both a “Gist” and an Eclipse-PyDev project.
Each of the above links has a README file with instructions on its use, arguments, and dependencies.
I dedicate this project and Gist to those about to endure the dubious “benefits” of fracking operations in the United Kingdom.
7 June 2016
As I promised earlier, I’ve downloaded earthquakes from NCEDC’s web site (1898 to date), reverse-geocoded them via GeoNames and K-D Trees (thereby obtaining their country, state, county, and city/village values), archived the resulting files via 7-Zip, and uploaded both the CSV and SQLite datasets to:
I have authored a program in Python 3 that reverse-geocodes (via GeoNames and K-D Trees) the lat/longs into their respective countries, states, counties, and cities/villages. I will post a link to the open-source project shortly once I’ve vetted its license and repository. The program processes nearly 3 million rows in approximately 240 seconds.
One of the most vexing sets of data to make usable for a data analyst is the earthquake dataset available via the NCEDC search site. While the site delivers results quickly enough to an anonymous FTP site, those results do not contain any columns representing the country, state, county, or city. These columns are some of the most useful for analysis of questions such as: “Oklahoma now rivals and even exceeds California in the number of significant earthquakes?”
Believe it or not, the answer to the preceding question is “True,” especially when one can analyze reverse-geocoded earthquakes using relatively simple SQL queries.
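To give a feel for what such a "relatively simple SQL query" might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and `significant_counts` helper are assumptions for illustration and not the layout of the published files. The two Oklahoma rows are the Pawnee and Cushing events mentioned in these posts; the remaining rows are invented placeholders, so the counts say nothing about real seismicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the published dataset's real columns may differ.
conn.execute(
    "CREATE TABLE earthquakes ("
    "event_date TEXT, magnitude REAL, country TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO earthquakes VALUES (?, ?, ?, ?)",
    [
        ("2016-09-03", 5.8, "US", "Oklahoma"),    # Pawnee event
        ("2016-11-06", 5.0, "US", "Oklahoma"),    # Cushing event
        ("2016-01-01", 3.1, "US", "California"),  # invented placeholder
        ("2016-02-01", 2.0, "US", "Oklahoma"),    # invented placeholder
        ("2015-06-01", 4.0, "US", "California"),  # invented placeholder
    ],
)


def significant_counts(connection, min_magnitude, since):
    """Count earthquakes per US state at or above a magnitude since a date."""
    query = (
        "SELECT state, COUNT(*) AS quakes "
        "FROM earthquakes "
        "WHERE country = 'US' AND magnitude >= ? AND event_date >= ? "
        "GROUP BY state "
        "ORDER BY quakes DESC, state"
    )
    return connection.execute(query, (min_magnitude, since)).fetchall()


counts = significant_counts(conn, 3.0, "2016-01-01")
```

The state column is what makes the `GROUP BY` possible at all, which is why the reverse-geocoding step matters so much.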
The difficulty was in the reverse-geocoding of the latitudes and longitudes to their respective countries, states, counties, and cities. Originally, I had authored a Java program that used various ESRI shape files and discerned to which administrative units a lat/long belonged. That is, if you wanted to wait 12 hours for it to run.
Given the long run time and inconvenience of obtaining the shape files, I declined to publish it except as source code with little, if any, explanation as to its operation and use. I just didn’t think it was suitable for public consumption yet, as the reverse-geocoding was only tediously repeatable. I knew there was a better way, and as of a few weeks ago, after some research, I authored a better mousetrap:
- a Python 3.4 script
- using the reverse-geocoder package
- which uses K-D trees
- and datasets from GeoNames
- reverse-geocoding 2.8 million rows in approximately 210 seconds
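To show why a K-D tree makes this so much faster than testing each point against shape files, here is a minimal, self-contained sketch of nearest-place lookup. It is not the script described above and does not use the reverse-geocoder package; the `Place` type, the function names, and the four-entry gazetteer (with approximate coordinates) are assumptions for illustration, standing in for the GeoNames data.

```python
import math
from typing import NamedTuple, Optional


class Place(NamedTuple):
    name: str
    state: str
    country: str
    lat: float
    lon: float


class Node(NamedTuple):
    point: tuple
    place: Place
    left: Optional["Node"]
    right: Optional["Node"]


def to_xyz(lat, lon):
    """Map degrees onto the unit sphere so that straight-line distance
    ranks places in the same order as great-circle distance."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))


def build(items, depth=0):
    """Recursively build a 3-dimensional K-D tree from (xyz, place) pairs."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    return Node(items[mid][0], items[mid][1],
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(node, target, depth=0, best=None):
    """Return (squared_distance, place) for the stored point closest to target."""
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node.point, target))
    if best is None or d2 < best[0]:
        best = (d2, node.place)
    axis = depth % 3
    diff = target[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, depth + 1, best)
    # Only cross the splitting plane if a closer point could lie beyond it.
    if diff ** 2 < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best


def reverse_geocode(tree, lat, lon):
    """Look up the nearest gazetteer entry for a latitude/longitude pair."""
    return nearest(tree, to_xyz(lat, lon))[1]


# Tiny illustrative gazetteer with approximate coordinates.
GAZETTEER = [
    Place("Cushing", "Oklahoma", "US", 35.985, -96.767),
    Place("Pawnee", "Oklahoma", "US", 36.338, -96.804),
    Place("Wichita", "Kansas", "US", 37.687, -97.330),
    Place("Sacramento", "California", "US", 38.582, -121.494),
]
TREE = build([(to_xyz(p.lat, p.lon), p) for p in GAZETTEER])

place = reverse_geocode(TREE, 35.99, -96.80)
print(place.name, place.state, place.country)
```

Each lookup discards whole branches of the tree instead of checking every candidate, which is the kind of saving that lets millions of rows be processed in minutes rather than hours.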
So, in the next few weeks, the Python script will be pushed to GitHub and the reverse-geocoded earthquake dataset to http://frackingdata.info/downloads. A posting or postings will be pushed when this is done.
How cool, from 12 hours to 210 seconds.
Finally, some progress…
12 April 2016