Category Archives: Studies

Fracking quakes-It’s time for a focused analysis of the recent temblors in Oklahoma

While FrackingData.org provides fracking and earthquake-centric datasets suitable for most any citizen-scientist or analyst to consume, it does so with a bend towards the generic.  After the latest significant magnitude 5.0 earthquake to hit Oklahoma near the town of Cushing at or about 7:44 PM on November 6, 2016, as well as the magnitude 5.8 earthquake to hit 8 miles northeast of Pawnee on September 3, 2016, I’ve decided to apply my data analysis and mapping skills to making focused datasets concentrating on the State of Oklahoma’s earthquakes and underground injection wells.

When the State of Kansas experienced fracking-operation related earthquakes, reports were that they reduced the volume of wastewater injected into their underground disposal wells, whereas sources have reported that the State of Oklahoma initially changed not the volume of the wastewater disposed, but the depths at which it was injected.  Therefore, it would seem beneficial if a review of the practices of both states were undertaken, as well as any accrued benefits.

With the preceding in mind, I posit the following:

  • Pull all magnitude 0.0 and above earthquakes,from 1898 to present, from the NCEDC website.
  • Reverse-geocode the aforementioned earthquakes, adding the country, state, county, and nearest city/village in the process.
  • Locate the oil well location datasets for the states of Oklahoma and Kansas, if such are available.
  • Locate the underground injection well datasets for the states of Oklahoma and Kansas, if such are available.
  • Extract, transform, and load (ETL) the aforementioned oil and injection well datasets into a standardized layout suitable for singular and multiple state analyses.
  • Locate the volume of wastewater injection datasets for both states, if such exist.
  • ETL the volume of wastewater injection datasets into a standardized layout suitable for singular and multiple state analyses.
  • Publish the subsequent datasets, along with the methodology used to ETL them, for use by the fracking data analysis community.
  • Produce a step-by-step guide of the subsequent analyses, complete with SQL or source code, as an example of using the datasets for research.
  • Submit my study(ies) to the State of Oklahoma as well as various media outlets for their use or commentary, whichever is more appropriate based upon the nature of the receiving entity.

Please note that I’ll post my progress on the bullet points listed above, as well as build out a “living document” of my adventures in doing so at FrackingData.org’s sister site: FrackingData.info.

In the meantime, I hope that no loss of life occurs due to the continued practice of wastewater injection into underground wells.  That being said, given the State of Oklahoma’s economic dependence upon oil as a means of income and its reluctance to date in reining its activities, I fear that loss of life will be inevitable.  “Loss of life” seems such an abstract phrase, especially when it appears in print, but given that I’ve experienced its direct effects more than once, I can assure all that might read this post that it is deeply personal and most certainly not abstract to those that encounter it on a first-hand basis.

Khepry Quixote
7 November 2016

Preliminary Analysis of FracFocus.org Data

Abstract

This post covers the oil/gas well and chemical data extracted from the FracFocus.org website by the staff of SkyTruth.org.  The FracFocus.org website is gradually becoming the default repository for general hydro-fracturing well information as well as the chemicals used on each well during the fracturing process.  The “findings” I present will be sample  Structured Query Language (SQL) queries of a substantial subset of the extracted FracFocus.org data (800,000+ records).

FracFocus.org

When I first saw FracFocus.org’s website, I thought I had found an answer to my oil and gas well location information difficulties, as well of chemicals used in their operations.  Then I tried to use it and encountered another “fracking wall.”  As objective as I can be now and in my professional opinion based upon my experience cited in my biography below, FracFocus.org’s website at that time was intentionally constructed to make information extraction difficult.

I found a Frequently Asked Questions webpage upon which I found the following question:

“Are the records from FracFocus available in a digital format such as Excel?”

It was answered as follows:

“No. FracFocus was originally designed to serve records one at a time in Adobe pdf format in order to ensure accurate, unaltered and uncompromised data. Consequently the chemical information gathered does not currently reside in a database or spreadsheet format.”

Another question on the same page was as follows:

“Why can’t the system show me the information on more than one well at a time?”

It was was answered as follows:

“The purpose of the FracFocus records presentation system was to provide those who may live near a well that has been fractured with information concerning the materials used to fracture the well.  All information other than the information used in the search form is available only in an Adobe pdf format.  As such, information such as Ingredients, Trade Names and CAS numbers is not available for search or data aggregation purposes.”

It was this answer that surprised me the most.  Once again, I was stymied.  At the time this was only a “night job” self-education project, so I moved on to looking for well information from more easily-obtained sources.

SkyTruth.org Extracts Data from FracFocus.org

Several months later, while surfing the web, I noted that SkyTruth.org had extracted data from the FracFocus.org website.  I went to the article and attempted to obtain the data from the links they had posted on the http://frack.skytruth.org/fracking-chemical-database/frack-chemical-data-download web page.  I found it to be a little bit difficult to use for analytical reasons, so I applied some of my programming skills and pulled all of the extracted data into two data sets: one of “reports” that contained information about each well, and another of “chemicals” citing which chemicals were used in each reported well.  I then joined these two data sets into a “blended” data set and pushed all three of the resulting data sets to the “Download” section of an open-source repository at http://code.google.com/p/fracking-analysis/.

Database Preparation

As I wished to do some Structured Query Language (SQL) analysis of the extracted data and take a swipe at creating a high-speed, easily-searchable website using Apache Solr 4 as the back-end search engine, I imported the data into two tables within a MySQL database on an Amazon Web Services (AWS) Relational Database Service (RDS) instance.  I used an AWS RDS instance so that I could expand or contract its capacity and capability at will and as needed.  Having been pointed to a source of Chemical Abstract Number  information coupled with recognized and suspected toxicities, I consolidated this data into a table suitable for import into the MySQL database as well.

Once these three tables, as well as a “view” that joined the chemicals to their respective well reports, had been created in the MySQL database, I backed it up and posted it to the open-source repository at http://code.google.com/p/fracking-analysis/downloads/detail?name=frackanalysis_db_backup.zip&can=2&q=.

The tables presently contained within the database are as follows:

  • “chemicals” – a table of approximately 800,000 records citing information about each chemical documented as having been used on the reported well.
  • “reports” – a table of approximately 27,000 records citing information about each well reported as of September 21, 2012.
  • “uvw-chemicals-reports” – a view joining the each record in the “chemicals” table with its respective “reports” table.
  • “cas_chemicals” – a table of approximately 8,000 rows citing information about each chemical with its noted toxicity (e.g. cancer, respiratory, blood) and its category (e.g. recognized or suspected).

The above database and the tables within it are the basis of the queries cited below.

Simple Analytical Queries

Let’s start off with some simple analytical queries of the database.  These will be prefaced with an explanation of their purpose in the format of a natural language query.

“What is the maximum number of chemicals entered for a reported well?”

SELECT MAX(`Count`) Max_Count FROM (SELECT COUNT(*) `Count`    FROM frackanalysis.`chemicals` GROUP BY pdf_seqid) subqry1;

Which gives one an answer of 310.

“What is the minimum number of chemicals entered for a reported well?”

SELECT MIN(`Count`) Min_Count FROM (SELECT COUNT(*) `Count`    FROM frackanalysis.`chemicals` GROUP BY pdf_seqid) subqry1;

Which gives one an answer of 1.

“What is the average number of chemicals entered for a reported well?”

SELECT AVG(`Count`) Avg_Count FROM (SELECT COUNT(*) `Count`    FROM frackanalysis.`chemicals` GROUP BY pdf_seqid) subqry1;

Which gives one an answer of 29.0704.

More Complex Analytical Queries

Let’s continue with some more complex analytical queries of the database.  These will be prefaced with an explanation of their purpose in the format of a natural language query.

“What is the maximum number of chemicals entered for a reported well that are recognized carcinogens?”

SELECT MAX(`Count`) Max_Count FROM (SELECT COUNT(*) `Count` FROM frackanalysis.`chemicals` c INNER JOIN frackanalysis.`cas_chemicals` cc ON c.`cas_number` = cc.`CAS_EDF_ID` WHERE c.`cas_type` = ‘valid’ AND cc.`Toxicity` = ‘cancer’ AND cc.`Category` = ‘recognized’ GROUP BY pdf_seqid) subqry1;

Which gives one an answer of 6.

“What is the average number of chemicals entered for a reported well that are recognized carcinogens?”

SELECT AVG(`Count`) Avg_Count FROM (SELECT COUNT(*) `Count` FROM frackanalysis.`chemicals` c INNER JOIN frackanalysis.`cas_chemicals` cc ON c.`cas_number` = cc.`CAS_EDF_ID` WHERE c.`cas_type` = ‘valid’ AND cc.`Toxicity` = ‘cancer’ AND cc.`Category` = ‘recognized’ GROUP BY pdf_seqid) subqry1;

Which gives one an answer of 1.2445.

A More Interesting Analytical Query

Let’s continue with some more interesting analytical queries of the database.  These will be prefaced with an explanation of their purpose in the format of a natural language query.

“What are the most reported chemicals that are recognized carcinogens in descending order?”

SELECT * FROM (SELECT cc.`Chemical_Name`, COUNT(*) `Count` FROM   frackanalysis.`chemicals` c INNER JOIN frackanalysis.`cas_chemicals` cc ON c.`cas_number` = cc.`CAS_EDF_ID` WHERE c.`cas_type` = ‘valid’ AND cc.`Toxicity` = ‘cancer’ AND cc.`Category` = ‘recognized’ GROUP BY cc.`Chemical_Name`) subqry1 ORDER BY `Count` DESC;

Which gives one the resultset below:

NAPHTHALENE 6691

BENZYL CHLORIDE   1636

FORMALDEHYDE      1563

EPICHLOROHYDRIN   378

ATTAPULGITE 334

NITRILOTRIACETIC ACID, TRISODIUM SALT MONOHYDRATE     259

ACRYLAMIDE  233

BIS(2-CHLOROETHYL) ETHER      204

ETHYLBENZENE      134

THIOUREA    96

ACETALDEHYDE      29

1,4-DIOXANE 11

ETHYLENE OXIDE    11

ANTIMONY TRIOXIDE 5

NITRILOTRIACETIC ACID   1

propylene glycol butyl ether  1

Summary

The queries above are just a few of the many that could be made against the cited database and its tables.  A more thorough set of queries seem to be in order by professionals more versed in health studies.  Now that a database suitable for analysis is available, such may yet be possible.    As the records in the reports table contain latitude and longitude values, queries relevant to location seem to be in order as well.  A list of recommendations concerning further development of this database appears below.

Recommendations

  • A table of CAS chemicals along with their toxicity on a relative scale be found and imported into the database.  Perhaps information from the Material Safety Data Sheets (MSDS) can be used to obtain the relative toxicity as the MSDS ranks chemicals on a scale from 0 to 4.
  • A table of cities is obtained from a source like the National Atlas Data Download and the reported wells plotted in relation to said cities.
  • A table of wells drilled be imported into the database so that a query of wells drilled versus wells reported to the FracFocus.org site can be made to assess the level of reporting compliance in jurisdictions in which reporting is mandatory.
  • A table of schools is imported into the database so that proximity-oriented queries can be made against both wells and chemicals.
  • A table of earthquakes is imported into the database so that proximity-oriented and time-sequenced queries can be made against reported wells.
  • An open-source repository is created for storing and retrieval of both the data and analytical queries.
  • An open-source, high-speed search engine web site is created as an aid to the data analysis community as well as a demonstration site for presentations to policy makers as to what is possible with a relative minimum of effort.

Biography

David Darling is a U.S. Army veteran (tactical nuclear weapons) with a B.S. in Zoology and over thirty years of professional experience in Information Technology as a software developer, corporate-level database developer/administrator and search engine developer/administrator. He presently authors software in Java, Java Server Faces, PrimeFaces, coupled with the Apache open-source technologies of Velocity, Solr, and Hadoop.  Additional authorship in databases is done in SQL Server, MySQL, PostgreSQL/PostGIS, and Oracle. He especially enjoys any applications dealing with mapping, spatial data, and Geographical Information Services (GIS). He has extensive experience in petrochemical, law enforcement, local/county/regional governments, voter registration, taxation and property tax appraisal, telecommunications, real-time tracking (spatial/status), and health-care information.