Recently, a commenter on my “About” page asked “who/what are you?” in order to assess my “credibility.” As a quick response, I will say that, unlike more politicized sites, I have no reason to bend the truth or distort the facts: after more than thirty years of professional experience, I’ve found that letting the data do the talking is the most effective way of changing minds. The important part is that the data must be easily accessible and reliably accurate, which is why any “data cleansing” done by my programs or processes follows normal extract, transform, and load (ETL) conventions. It’s also why I’ve already open-sourced my code for others to review and/or use in their own ways.
That being said, if my “credibility” is still in question, perhaps more detail will help in your evaluation, to wit:
- I hold a B.S. in Zoology with a background in organic chemistry, augmented by an additional 50+ semester hours in Data Processing and Computer Science.
- I’m an honorably discharged U.S. Army Field Artillery officer with tactical nuclear weapons and multiple-launch rocketry background.
- I’m a software and database developer and designer with over thirty (30) years of experience, as well as an IT manager or project lead for approximately twelve (12) of those years.
- My domains of knowledge are as follows:
  - Petrochemical plant operations and software author, developer, and implementer: 8.5 years
  - Tax appraisal operations and software developer, MIS manager: 1 year
  - County-level operations and software, MIS manager: 7.5 years
  - Law enforcement operations and software: 9 years
  - Commercial software product design, development, and deployment: 2 years
  - Cellular provider operations and software, as well as corporate-level database administrator (DBA), data warehouse and decision support system (DSS) author, developer, and implementer: 2 years
  - Insurance software author, developer, and implementer, as well as Project Lead: 3 years
  - Helicopter dispatch/tracking/mapping author, developer, and implementer, as well as Project Lead: 3 years
  - State-level Prison Corruption Detection system author, developer, and implementer: 2 years
  - Health Information DBA and software/hardware author, developer, and implementer: 1.5 years
  - Healthcare All-Payers Claims Database (APCD) software author, developer, and implementer: 1.5 years
  - Encryption software developer and implementer (GnuPG and Gpg4Win): 1.5 years
  - Fracking data analyst (FracFocus, SkyTruth, chemical toxicities, earthquakes): 5 years
- I am proficient in the following programming languages:
  - Python: 2 years
  - Java: 7 years
  - C#: 5 years
  - Visual Basic: 7 years
- I am proficient in using the following databases:
  - SQL Server: 10 years
  - PostgreSQL: 2 years
  - MySQL: 3 years
  - SQLite: 2 years
  - MS Access: 5 years
How I Ended Up Doing Fracking Data Analysis
Simply put, companies hardly pay for employee training anymore; in my case, I received formal training in my IT profession only twice, and eighteen (18) years apart at that. In between, to keep pace with my profession, I had to educate myself a few times a week, week after week, year after year. To maintain my enthusiasm for doing so, I chose topics that interested me, e.g., mapping, earthquakes, chemicals, tactical warfare systems, encryption, and search engines.
It was during one of my “self-education” efforts, attempting to “blend” maps of earthquakes with the locations of oil wells and, eventually, the chemicals injected into them, that I encountered what I now call the “fracking wall.” I noticed that government officials would not answer my questions, return my calls, or respond to my e-mails about data sources, especially once they discerned that I was trying to “blend” data relevant to fracking. In effect, I was met with a wall of silence.
I don’t respond well to stonewalling by any party, and I react reflexively to being told either “no” for no good reason or that the data doesn’t exist when I know full well that it does… somewhere… because other parties are obviously receiving it. For a more detailed explanation of how this website came into existence, please continue on to the following section.
A Brief “Blow-by-Blow” of How FrackingData.org Came Into Being
Back in 2011, as part of my self-education in software and database development, I chose the mapping of earthquakes because it would involve:
- Obtaining data that interested me (and I really like mapping functionality);
- Extracting, transforming, and loading that data into a variety of databases;
- Querying the data and outputting datasets suitable for use by Google Maps;
- Displaying the results via Google Maps using the programming language of my choosing.
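The transform step above can involve more than reformatting: when an earthquake catalog carries only latitude and longitude, attaching a place name means reverse-geocoding each epicenter. Below is a minimal sketch of one no-cost approach, not the code this site actually uses. It assumes a locally downloaded gazetteer of named places (for example, an extract of the free GeoNames cities data); the `Place` class, the `nearest_place` function, and the sample coordinates are hypothetical illustrations. It measures great-circle distance with the haversine formula and picks the closest entry.

```python
import math
from dataclasses import dataclass

# Mean Earth radius in kilometres; adequate for nearest-place lookups.
EARTH_RADIUS_KM = 6371.0088


@dataclass(frozen=True)
class Place:
    """A hypothetical gazetteer entry: a named place and its coordinates."""
    name: str
    state: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point rounding can never push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_place(lat: float, lon: float, gazetteer: list[Place]) -> tuple[Place, float]:
    """Reverse-geocode by linear scan: return the closest place and its distance in km."""
    best = min(gazetteer, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))
    return best, haversine_km(lat, lon, best.lat, best.lon)


# Hypothetical sample gazetteer with approximate town-centre coordinates.
GAZETTEER = [
    Place("Oklahoma City", "OK", 35.4676, -97.5164),
    Place("Tulsa", "OK", 36.1540, -95.9928),
    Place("Stillwater", "OK", 36.1156, -97.0584),
    Place("Pawnee", "OK", 36.3381, -96.8031),
]

if __name__ == "__main__":
    # Look up a hypothetical epicenter and print the nearest named place.
    place, dist = nearest_place(36.43, -96.93, GAZETTEER)
    print(f"{place.name}, {place.state} ({dist:.1f} km)")
```

Looking up a hypothetical epicenter at 36.43° N, 96.93° W against the sample list returns Pawnee, roughly 15 km away. A linear scan is fine for a few thousand places; for a worldwide gazetteer, a spatial index such as a k-d tree would be the usual choice.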
I chose earthquakes because, at that time, Oklahoma seemed to be experiencing significant “swarms” of earthquakes, which piqued my interest. I had read that earthquake “swarms” were sometimes co-located with fracking wells or underground injection control well operations, so I figured that once I had the earthquake maps working, I’d try plotting oil and injection well locations as an additional exercise, just for the fun of it.
Simply put, I had NO idea how frustrating obtaining the necessary data to do so would become!
Ponder the following, with “frustrations” highlighted in italics:
- I was able to obtain historical earthquake data for the entire world from 1898 to the present with relative ease. However, because that data did not include the country, state, county, or nearest city for each earthquake, I had to figure out how to reverse-geocode the latitude and longitude values to obtain them. This task alone ate up weeks of research into how to do it at no cost to the developer.
- Much to my pleasure, presenting the earthquakes via Google Maps (using Java) and Google Earth (using KML) went off without a hitch. Emboldened by my success, I decided to plot oil well and underground injection control well locations on the map to see whether the earthquake “swarms” were co-located with them, as some analysts had alleged. But as I scoured the Internet for oil well locations, I hit what I now call the “fracking wall.” States such as Oklahoma derive a significant share of their employment and revenue from fracking operations (in Oklahoma’s case, approximately one-sixth) and therefore have virtually no interest in risking that activity by disclosing oil well data in a convenient manner.
- State after state, I found “inconvenient” or nonexistent interfaces to oil well information containing location data (i.e., latitudes and longitudes). Frustrated by the process of locating such information, and vowing that no one else should have to suffer through the pain of finding it in the future, I compiled a comprehensive list of state-level oil well information hyperlinks. As far as I know, this was the first comprehensive, publicly available list of such information.
- Some states, cycling back to Oklahoma once again, denied the existence of oil well information files containing latitudes and longitudes. I was incredulous and told the officials in question that I found that highly unlikely. After months, yes months, of searching for Oklahoma’s oil well location data, I found an obscure hyperlink on an equally obscure forum that pointed to a “deep web” location not indexed by any search engine (e.g., Google, Yahoo). Once I had this data, I was able to plot both earthquakes and well locations, and lo and behold, there did appear to be a correlation between oil well operations and induced earthquakes. Years later, after more significant earthquakes had occurred in Oklahoma, the previously “hidden” oil well location web page appeared on the Corporation Commission’s website for all to see and use. I take credit for this, as the State of Oklahoma likely saw no point in hiding the data from the public once it was already available through FrackingData.org.
- Fracking well chemical disclosures were another matter altogether. Angered by the “fracking wall” I kept encountering, and so that other analysts could reproduce what I was doing, I vowed to channel my efforts into rendering whatever data I could find about fracking well chemical disclosures, chemical toxicities, and earthquakes into various open-source databases. However, when I sought data on fracking well chemical disclosures through FracFocus.org, I ran into another “fracking wall.” Long story short, the fracking industry, in a nominal effort to be “transparent,” had decided to fund a voluntary fracking well chemical disclosure registry that provided no mechanism for the general public to download its data in a machine-readable bulk format (e.g., CSV, TSV, XML, JSON, or Access files). This resulted in several more months of delay until the activist website SkyTruth.org scraped FracFocus.org’s data and published it for all to see. For a while, this resource remained available, but version 2.0 of FracFocus.org’s website purposely crippled this capability, and once again there was no data to be had.
- At or about the time that SkyTruth scraped FracFocus.org’s data, Kate Konschnik, the Policy Director of Harvard Law School’s Environmental Law and Policy Program, obtained fracking well disclosure data derived from FracFocus.org’s website and published a scathing study of its flaws and shortcomings. Basically, all hell broke loose, and there were cries for the data to be made machine-readable and bulk-downloadable. Eventually, under pressure from the Bureau of Land Management (BLM), this was done in exchange for the BLM allowing companies fracking on its land to use FracFocus.org as their means of public disclosure.
- I thought the fight was over and that angels would descend from on high telling me that my analytical efforts would only get easier once FracFocus.org released its machine-readable, bulk-downloadable file for public consumption. Then I saw the format of that file (an SQL Server database backup file) and realized that the only folks capable of using it would be software and database developers such as me. Initially, it was posted to FracFocus.org’s sites with NO indication of its format. Experienced developers such as myself could infer that it was an SQL Server database backup file, but others would be left out in the cold. In addition, there was initially no guidance on how to import the file into a more commonly used database such as Microsoft Access. Seeing this, I pitched an absolute fit to Kate Konschnik, who was apparently on friendlier terms with FracFocus.org than I was, and within hours FracFocus.org not only identified the nature of the downloadable file (an SQL Server 2012 database backup) but also posted a PDF document explaining how to import the data into Microsoft Access.
- Once again, I thought the fight was over and that angels would finally descend from on high telling me that my analytical efforts would only get easier now that the downloadable file was available and importable. Then I saw the data contained within FracFocus.org’s database tables and was stunned once again to find it riddled with characters unsuitable for export to formats such as comma-separated-value (CSV) or tab-separated-value (TSV) files. Column after column, row after row contained these characters, leaving me to conclude once again that the only folks who would be able to use the database for analysis or export were software developers such as me. At that point, I resolved to automate the restoration, extraction, cleansing, transformation, loading, and exporting of the downloadable file’s data so that others could use the data for analysis more conveniently.
- Initially, I exported CSV files, which I then imported into SQLite database files, making both readily available on a monthly basis on FrackingData.org. Then, in an effort to jump-start other analysts, I started adding “stock” views and queries to the SQLite database file as both building blocks and examples of how to handle the data. It was at this point that I realized that I’m but one person, and that the files, databases, views, queries, and source code need to be open data and open source, with a lifespan beyond my own.
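The cleanse, export, and load sequence described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the actual FrackingData.org code: the table layout (`disclosure` with `api_number`, `state`, `ingredient`, `cas_number`), the `ingredient_counts` “stock” view, and the sample rows are hypothetical, and it treats ASCII control characters (tabs, carriage returns, line feeds, NULs, and so on) as the “unsuitable characters,” which may not cover everything found in the real FracFocus tables. It uses only Python’s standard `csv`, `re`, and `sqlite3` modules.

```python
import csv
import io
import re
import sqlite3

# Hypothetical layout for illustration only; not FracFocus's actual schema.
HEADER = ["api_number", "state", "ingredient", "cas_number"]

# ASCII control characters (NUL, TAB, CR, LF, etc.) plus DEL: any of these can
# break naive CSV/TSV consumers or shift values into the wrong column.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
_SPACE_RUNS = re.compile(r" {2,}")


def cleanse(value):
    """Replace control characters with spaces, collapse repeated spaces, and trim."""
    if value is None:
        return ""
    text = _CONTROL_CHARS.sub(" ", str(value))
    return _SPACE_RUNS.sub(" ", text).strip()


def rows_to_tsv(rows):
    """Cleanse every field and serialize the rows as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow([cleanse(v) for v in row])
    return buf.getvalue()


def load_into_sqlite(conn, tsv_text):
    """Load TSV text into a table and add a 'stock' view as a query building block."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    conn.execute(
        "CREATE TABLE disclosure "
        "(api_number TEXT, state TEXT, ingredient TEXT, cas_number TEXT)"
    )
    conn.executemany("INSERT INTO disclosure VALUES (?, ?, ?, ?)", reader)
    conn.execute(
        "CREATE VIEW ingredient_counts AS "
        "SELECT ingredient, cas_number, COUNT(*) AS n "
        "FROM disclosure GROUP BY ingredient, cas_number"
    )
    conn.commit()


# Hypothetical raw rows containing embedded CR/LF and tab characters.
RAW_ROWS = [
    ("00-000-00001", "Texas", "Hydrochloric\r\nAcid", "7647-01-0"),
    ("00-000-00002", "Oklahoma", "Guar\tgum ", "9000-30-0"),
    ("00-000-00003", "Oklahoma", "Hydrochloric Acid", "7647-01-0"),
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(conn, rows_to_tsv(RAW_ROWS))
    for row in conn.execute("SELECT * FROM ingredient_counts ORDER BY n DESC, ingredient"):
        print(row)
```

Replacing control characters with spaces, rather than deleting them, keeps words such as “Hydrochloric” and “Acid” from being glued together; collapsing repeated spaces afterward tidies the result. In a real pipeline, the raw rows would come from the restored SQL Server database rather than from a hard-coded list.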
So, by now you should have a good idea of why FrackingData.org came into existence and why its existence needs to continue, albeit in a more robust form. The tug-of-war between the fracking industry and the various parties seeking transparency into its activities and data is likely to continue for many years, if not decades. At this time, as far as I know, it’s the only site providing comprehensive open data and open-source code relevant to fracking, and I suspect it’ll have to do so for many more moons to come.