Breaking the “Fracking Wall”

This post describes why I’ve resolved to break the “fracking wall” surrounding the data sources of oil well locations, fracking chemical disclosures, and earthquake sources.

BACKGROUND

I am a software/database/systems developer/designer/analyst with over thirty (30) years of IT experience in a variety of domains: petrochemical plant applications, tax appraisal, county-level governmental agencies, law enforcement applications, point-of-sale systems, data warehousing and analysis, insurance, near-realtime aircraft/vessel dispatch and tracking, mapping applications, search engines, desktop and web applications, and health care extraction, transformation, loading (ETL) and analysis. In short, there’s not a lot I haven’t done over my career.

SELF-EDUCATION

One of my continuing challenges is to self-educate on emerging languages, databases, and software on a frequent basis. This I do as a “night job” a few nights each week, every week, every month of every year. Having enjoyed applications involving mapping the most, in the spring of 2012 I decided on a course of self-education with a variety of mapping packages, but covering a single domain with free information: earthquakes. I chose this domain for no other reason than that the data was freely available and of modest size, the publicly-available sources amounting to just a few million records.

EARTHQUAKES

And so I merrily went about my self-education on various mapping packages using the free source of earthquake data, enjoying positive results and pretty graphics along the way. Then, quite to my surprise, “swarms” of earthquakes began to materialize on the various maps I was creating. Interestingly enough, some of those swarms were in Oklahoma, a state of the union in which I had the privilege of living from 1979 through 1982. What struck me as interesting was that I recalled relatively few earthquakes during the three years I lived in that state. Needless to say, my interest was piqued.

SWARMS EMERGE

So, I began to plot out the earthquakes for Oklahoma on a wider scale, on a year-by-year basis, and I could discern that there were “swarms” of earthquakes materializing in places where there had been very few in the preceding decades. As I have a B.S. in Zoology, a scientific bent to my mind and an absolute passion for the discernment of emerging patterns, mental alarm bells went off that I was seeing an emerging pattern that might have a more anthropogenic than natural origin. Casually, as this was a “night job,” I began searching the Internet for possible causes and ran across the hypothesis that hydraulic fracturing (a.k.a. “fracking”) operations, specifically the injection of water and chemicals into “fracking” and underground “disposal” wells, were causing the emerging swarms of earthquakes.

EARTHQUAKE SWARMS vs. OIL WELL LOCATIONS

At a magnitude of 5.6, the largest earthquake ever recorded in Oklahoma up to that time struck on November 5, 2011, being preceded by 4.7 through 5.0 foreshocks earlier in the day. It was this earthquake and its foreshocks that really raised my interest as I was mapping not only the locations of the earthquakes on the map but also their intensity via color, the more reddish the stronger the quake. To me, where there’s smoke there’s fire, and that being said I resolved to start mapping out well locations as well. It was my quest for well locations on a state-by-state basis that turned self-education into an avocation of sorts, and introduced me to the “fracking wall.”

HITTING THE “FRACKING WALL”

In an effort to obtain the locations of oil wells, I contacted various state agencies of the State of Oklahoma with virtually no results. It wasn’t that the data wasn’t available; it was that the data that was available was not easily downloaded and, most importantly, did not contain the latitudes and longitudes of the wells. In other words, I could roll the cigar between my fingers but I could neither light nor puff upon it. I was told by one state official, emphatically, that such location data was not available. Agency-by-agency, I wrote and/or called the appropriate personnel, and although most of the employees were polite, they were also equally unhelpful. It took me several months to find out where the data sets containing oil well location data had been posted. There was one, I repeat one, mention of a link to Oklahoma’s oil well location datasets in an obscure forum in a backwater of the Internet. This was the clue I needed, and finally I was able to plot the oil well locations against the occurrence of earthquakes and confirm that “where there’s smoke, there’s fire.” The refusal of the State of Oklahoma to point me to the location(s) of oil well location data was my first experience with the “fracking wall,” and it wouldn’t be my last, although the “fracking wall” would be manifested by different states and agencies in different ways.

BREAKING THE “FRACKING WALL”

Because of the State of Oklahoma’s behavior and lack of cooperation, I resolved to break the “fracking wall” for both myself and all others needing access to the same type of data. In an effort to collect all of the oil well location and fracking chemical disclosure hyperlinks in one place, as well as offer curated datasets of the aforementioned data, I created the frackingdata.org website with curated FracFocus data extracts, chemical toxicities and their datasets, state-by-state sources of well location data, and the source code used to extract the datasets into more usable forms. In short, I created frackingdata.org to be a one-stop shop for anyone wishing to conduct analysis of fracking-related data.

FUTURE INITIATIVES

+ Link earthquake data to oil well locations in a manner convenient to anyone wishing to analyze such data (In progress)

+ Automate the download, extraction, transformation, and loading of FracFocus.org data into datasets more suitable for use by analysts or citizen-scientists. (Done)

+ Transform the FracFocus.org GUID keys into more user-friendly integer keys that also reduce storage by over 25% (Done)

+ Push the curated data sets to OData repositories, e.g. Google Fusion Tables, so that analysts can more easily access the data via packages like Tableau, SAS, or R. (In progress)

+ Push more of the source code used to do this extraction, transformation, and loading to repositories like GitHub so that all may share in its presence and perhaps even contribute to its maintenance. (Partially done)
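
The GUID-to-integer step in the list above can be illustrated with a minimal sketch. This is not the code published on frackingdata.org; it is a Python illustration under stated assumptions. The column name `disclosure_id`, the sample GUIDs, and the helper names `build_key_map` and `apply_key_map` are all hypothetical. It assigns each distinct GUID a small integer in first-seen order and rewrites the same key consistently in a parent table and a child table.

```python
import uuid


def build_key_map(guids):
    """Map each distinct GUID to an integer surrogate key (1, 2, 3, ...).

    GUIDs are normalized through uuid.UUID first, so the same key written
    in upper or lower case receives the same integer.
    """
    key_map = {}
    for guid in guids:
        canonical = str(uuid.UUID(guid))  # lowercase, hyphenated form
        key_map.setdefault(canonical, len(key_map) + 1)
    return key_map


def apply_key_map(rows, column, key_map):
    """Return copies of the rows with the GUID column replaced by its integer key."""
    return [
        {**row, column: key_map[str(uuid.UUID(row[column]))]}
        for row in rows
    ]


# Hypothetical sample rows; real extracts will have different columns.
disclosures = [
    {"disclosure_id": "6F9619FF-8B86-D011-B42D-00C04FC964FF", "state": "Oklahoma"},
    {"disclosure_id": "0e984725-c51c-4bf4-9960-e1c80e27aba0", "state": "Texas"},
]
ingredients = [
    {"disclosure_id": "6f9619ff-8b86-d011-b42d-00c04fc964ff", "ingredient": "Water"},
    {"disclosure_id": "0E984725-C51C-4BF4-9960-E1C80E27ABA0", "ingredient": "Sand"},
    {"disclosure_id": "6F9619FF-8B86-D011-B42D-00C04FC964FF", "ingredient": "Guar gum"},
]

key_map = build_key_map(row["disclosure_id"] for row in disclosures)
disclosures_int = apply_key_map(disclosures, "disclosure_id", key_map)
ingredients_int = apply_key_map(ingredients, "disclosure_id", key_map)
```

A GUID stored as text takes 36 characters per occurrence, and it is repeated in every child row that references it, which is why integer keys are both smaller and easier to read. How much storage is actually saved depends on the dataset and on how the keys are stored.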

As I have a “day job,” progress is painful but the results are worth it.

Khepry Quixote
11 March 2016

Bill of Rights for Fracking Information

  1. That all of the data and its documentation:
    1. Should be
      1. In a machine-readable form
      2. Suitable for aggregation
      3. And downloadable in a compact form (e.g. ZIP, 7z).
    2. Should be suitable for its nominal purposes of research and reporting by
      1. Reporters
      2. Data Analysts
      3. Citizen Scientists
      4. Regulators
    3. Should be released
      1. In a frequent and timely manner.
      2. With “delta” datasets available, with “delta” being differences between the current and previous releases.
        1. The “delta” datasets should contain the following machine-readable “images”:
          1. “Previous” image.
          2. “Current” image.
          3. “Changed” image with only the values that are different being reported.
    4. Should NOT reside:
      1. Behind a pay-wall.
      2. Behind a registration-wall.
    5. Should be accessible:
      1. Interactively.
      2. ReST-fully via an API.
    6. Should be curated in a manner consistent with:
      1. The norms of professional, responsible data-warehousing.
        1. For example, the elimination of extraneous TAB, LINEFEED, or DIACRITIC characters that should NOT appear within a column.
        2. The resolution of disparate geodetic datums (e.g. NAD27, NAD83) into a unified datum (WGS84) suitable for mapping via geographic information systems or platforms such as Google Maps.
      2. The needs of others to reliably export the data to alternative formats (e.g. CSV, XML, JSON).
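
To make the “delta” item above concrete, here is a minimal sketch, in Python, of what the three “images” could look like for a single release. It is an illustration under stated assumptions, not a format that any agency or FracFocus publishes. The function name `build_delta`, the key column `well_id`, and the sample rows are hypothetical.

```python
def build_delta(previous, current, key):
    """Compare two releases and return one delta record per changed key.

    Each record carries the three images from the list above:
      previous: the row as it was (None if the row is new)
      current:  the row as it is now (None if the row was removed)
      changed:  only the columns whose values differ, with their new values
    """
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    deltas = []
    for k in sorted(prev_by_key.keys() | curr_by_key.keys()):
        before = prev_by_key.get(k)
        after = curr_by_key.get(k)
        if before == after:
            continue  # unchanged rows are left out of the delta
        old = before or {}
        new = after or {}
        changed = {
            column: new.get(column)
            for column in sorted(set(old) | set(new))
            if old.get(column) != new.get(column)
        }
        deltas.append(
            {"key": k, "previous": before, "current": after, "changed": changed}
        )
    return deltas


# Hypothetical releases: well 1 is corrected, well 2 is removed, well 3 is added.
previous_release = [
    {"well_id": 1, "operator": "Acme Oil", "latitude": 35.5},
    {"well_id": 2, "operator": "Basin Energy", "latitude": 36.1},
]
current_release = [
    {"well_id": 1, "operator": "Acme Oil", "latitude": 35.6},
    {"well_id": 3, "operator": "Prairie Gas", "latitude": 34.9},
]

deltas = build_delta(previous_release, current_release, "well_id")
```

With delta records of this shape, an analyst who already holds the previous release can apply only the changes instead of downloading and reloading the entire dataset.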