Category Archives: data

MS Access: Database posted to downloads site

Microsoft Access, while not fully SQL-92 compliant, is a very popular database program, and many people who don’t use R, SAS, or Tableau rely on it for analysis and reporting.

Concerning FracFocus.org-related data, and back again by popular demand, FrackingData.org is now providing (see link below) a Microsoft Access database in “accdb” format containing various tables as follows:

  • FracFocus.org-related tables
    • dbo_RegistryUpload
    • dbo_RegistryUploadPurpose
    • dbo_RegistryUploadIngredients
  • Earthquakes-related tables
    • NCEDC_earthquakes_reverse_geocoded (worldwide, 1898 to date, magnitude 0 and up)
  • Toxicities-related tables
    • Chemical_Toxicities_Blended_Sorted
    • Chemical_Toxicities_Blended_Grouped
    • Chemical_Toxicities_Blended_Flattened_Boolean
  • Views utilizing the above tables
    • vue_Registry_Upload_Purpose_Ingredients
    • vue_Registry_Upload_Purpose_Ingredients_Toxicities
  • Link(s) to Microsoft Access database(s), compressed with the 7-Zip program:

Henceforth, this database will be available on the same schedule as the CSV, SQLite, and PostgreSQL files, and a page holding the latest link can be found on the FracFocus Data page of FrackingData.org’s site (link below):

Khepry Quixote
10 June 2016


Earthquakes: Reverse-geocoder published on GitHub

Making good on my previous promise, I have released the source code for the NCEDC-formatted earthquake CSV file reverse-geocoder, written in Python 3, on GitHub, both as a “Gist” and as an Eclipse-PyDev project.

Each of the above links has a README file with instructions on its use, arguments, and dependencies.

I dedicate this project and Gist to those about to endure the dubious “benefits” of fracking operations in the United Kingdom.

Khepry Quixote
7 June 2016

Status Update 2016-06-01: FrackingData_FracFocusRegistry 2016-05 Files Uploaded

As of 01 June 2016, various files (e.g. SQLite, CSV, and PgSQL) derived from FracFocus.org’s April 2016 FracFocusRegistry have been downloaded, extracted, transformed, loaded, archived, and uploaded to the frackingdata.info/downloads site, and their respective links have been posted to FrackingData’s FracFocus Data Page.

This time, FracFocus posted its SQL Server backup on 23 May 2016, almost a month after its previous posting of 26 April 2016.

Once again, and significantly, the download of the files from the FracFocus.org website and their subsequent extraction, transformation, loading, archiving, and export to CSV, SQLite, and PostgreSQL files were performed by a Windows batch script without human intervention. This automation shaved hours from the extract, transform, load, archive, and export process. In addition, the batch script now uses WinSCP to upload the resulting files automatically to the http://frackingdata.info/downloads page.
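The batch script itself is not shown here, and it works against SQL Server. As a rough illustration of just one of its steps, exporting a table to CSV, here is a minimal Python sketch that assumes the data already sits in an SQLite file. The function names are hypothetical, and so are the pKey/APINumber columns used in the usage note; this is not the author's script.

```python
import csv
import sqlite3
from contextlib import closing
from typing import TextIO


def write_table_csv(conn: sqlite3.Connection, table: str, out: TextIO) -> int:
    """Stream every row of `table` to `out` as CSV (header first); return the row count."""
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # column names as header
    count = 0
    for row in cursor:  # iterate lazily so large tables are not loaded into memory at once
        writer.writerow(row)
        count += 1
    return count


def export_table(db_path: str, table: str, csv_path: str) -> int:
    """Hypothetical wrapper: open an SQLite file and write one table to a CSV file."""
    with closing(sqlite3.connect(db_path)) as conn, \
            open(csv_path, "w", newline="", encoding="utf-8") as fh:
        return write_table_csv(conn, table, fh)
```

For example, `export_table("FracFocusRegistry.sqlite", "dbo_RegistryUpload", "dbo_RegistryUpload.csv")` would write one CSV per call (the file names are placeholders). Streaming through the cursor rather than calling `fetchall()` keeps memory use flat even for multi-million-row tables.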

When this Windows batch file is sufficiently stable and I’ve soft-coded the data-cleansing views into the script itself, I’ll post a link to it in the Source Code section of this blog. Soft-coding the data-cleansing views is the last hurdle to publishing this script.

Khepry Quixote 2016-06-01

Earthquakes: Reverse-Geocoded Files Posted to frackingdata.info/downloads

As I promised earlier, I’ve downloaded earthquake records (1898 to date) from NCEDC’s web site, reverse-geocoded them via GeoNames and K-D trees (thereby obtaining their country, state, county, and city/village values), archived the resulting files via 7-Zip, and uploaded both the CSV and SQLite datasets to:

I have written a Python 3 program that reverse-geocodes the latitudes/longitudes (via GeoNames and K-D trees) into their respective countries, states, counties, and cities/villages. I will post a link to the open-source project shortly, once I’ve vetted its license and repository. The program processes nearly 3 million rows in approximately 240 seconds.
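The program itself has not been published yet, so the following is only a sketch of the surrounding plumbing: read an earthquake CSV, look up each row's coordinates, and write the row back out with four extra columns. It assumes the input has `Latitude` and `Longitude` columns, and `lookup` is a hypothetical callable that returns a dict with `country`, `state`, `county`, and `city` keys; it could wrap the reverse-geocoder package or any nearest-neighbour index.

```python
import csv
from typing import Callable, Dict, TextIO

# Hypothetical lookup signature: (lat, lon) -> {"country": ..., "state": ..., "county": ..., "city": ...}
Lookup = Callable[[float, float], Dict[str, str]]
GEO_COLUMNS = ["country", "state", "county", "city"]


def enrich_csv(src: TextIO, dst: TextIO, lookup: Lookup,
               lat_col: str = "Latitude", lon_col: str = "Longitude") -> int:
    """Copy CSV rows from src to dst, appending reverse-geocoded columns; return rows written."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + GEO_COLUMNS  # keep original columns first
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        geo = lookup(float(row[lat_col]), float(row[lon_col]))
        row.update({col: geo.get(col, "") for col in GEO_COLUMNS})  # blank if lookup lacks a key
        writer.writerow(row)
        count += 1
    return count
```

This version calls `lookup` once per row to stay simple. Collecting coordinates into batches and resolving each batch in one call, as the reverse-geocoder package allows, is likely to be faster at the scale of millions of rows.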

Status Update 2016-05-02: FrackingData_FracFocusRegistry 2016-04 Files Uploaded

As of 02 May 2016, various files (e.g. SQLite, CSV, and PgSQL) derived from FracFocus.org’s March 2016 FracFocusRegistry have been downloaded, extracted, transformed, loaded, archived, and uploaded to the frackingdata.info/downloads site, and their respective links have been posted to FrackingData’s FracFocus Data Page.

The substantial delay between the last posting of the transformed FracFocusRegistry download, in early March, and this one in May was mostly due to FracFocus NOT posting anything until 26 April 2016. This tardiness on FracFocus’s part is becoming a pattern.

Once again, and significantly, the download of the files from the FracFocus.org website and their subsequent extraction, transformation, loading, archiving, and export to CSV, SQLite, and PostgreSQL files were performed by a Windows batch script without human intervention. This automation shaved hours from the extract, transform, load, archive, and export process. In addition, the batch script now uses WinSCP to upload the resulting files automatically to the http://frackingdata.info/downloads page.

When this Windows batch file is sufficiently stable and I’ve soft-coded the data-cleansing views into the script itself, I’ll post a link to it in the Source Code section of this blog. Soft-coding the data-cleansing views is the last hurdle to publishing this script.

Khepry Quixote 2016-05-02

Earthquakes – Reverse Geocoding Coming Soon

One of the most vexing datasets for a data analyst to make usable is the earthquake dataset available via the NCEDC search site. While the site delivers results to an anonymous FTP site quickly enough, those results do not contain any columns representing the country, state, county, or city. These columns are among the most useful for analyzing questions such as: “Does Oklahoma now rival, and even exceed, California in the number of significant earthquakes?”

Believe it or not, the answer to the preceding question is “Yes,” and it is easy to confirm once the earthquakes have been reverse-geocoded, using relatively simple SQL queries.
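To illustrate the kind of "relatively simple SQL query" meant here, the sketch below counts earthquakes per state at or above a magnitude threshold, optionally for one year, using Python's built-in sqlite3 module. The table name `earthquakes` and its `year`, `magnitude`, and `admin1` (state) columns are assumptions, not the published dataset's schema, and the rows are made up purely to exercise the query; they are not real counts. The M3.0 default for "significant" is likewise an assumption.

```python
import sqlite3
from typing import List, Optional, Tuple


def count_by_state(conn: sqlite3.Connection, min_magnitude: float = 3.0,
                   year: Optional[int] = None) -> List[Tuple[str, int]]:
    """Count quakes at or above min_magnitude per state; restrict to one year if given."""
    sql = """
        SELECT admin1 AS state, COUNT(*) AS quakes
        FROM earthquakes
        WHERE magnitude >= ?
          AND (? IS NULL OR year = ?)
        GROUP BY admin1
        ORDER BY quakes DESC, state
    """
    return conn.execute(sql, (min_magnitude, year, year)).fetchall()


# Hypothetical schema and made-up rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earthquakes (year INTEGER, magnitude REAL, admin1 TEXT)")
conn.executemany("INSERT INTO earthquakes VALUES (?, ?, ?)", [
    (2015, 3.2, "Oklahoma"), (2015, 4.1, "Oklahoma"), (2015, 3.5, "Oklahoma"),
    (2015, 3.0, "California"), (2015, 2.4, "California"), (2015, 3.8, "California"),
    (2014, 3.1, "Oklahoma"),
])
print(count_by_state(conn, 3.0, 2015))  # [('Oklahoma', 3), ('California', 2)]
```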

The difficulty was in the reverse-geocoding of the latitudes and longitudes to their respective countries, states, counties, and cities.  Originally, I had authored a Java program that used various ESRI shape files and discerned to which administrative units a lat/long belonged.  That is, if you wanted to wait 12 hours for it to run.
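The Java program is not shown, so the following is only a generic illustration of the shape-file style of lookup: an even-odd ray-casting test that decides whether a point lies inside a polygon outline, applied to every administrative unit in turn. The square "counties" are hypothetical, and real shape files add complications (multi-part polygons, holes) that this sketch ignores. Without a spatial index, each earthquake may be tested against every vertex of every outline, which is one plausible reason such an approach can run for hours.

```python
from typing import Dict, Optional, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]  # (lon, lat) vertices; repeating the first vertex is optional


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Even-odd ray casting: count crossings of a ray running east from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can cross the ray
        # (this also guarantees yj != yi, so the division is safe).
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def locate(lon: float, lat: float, units: Dict[str, Ring]) -> Optional[str]:
    """Brute force: return the first unit whose outline contains the point, else None."""
    for name, ring in units.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None


# Two hypothetical, adjacent square "counties".
UNITS = {
    "County A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "County B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(locate(0.5, 0.5, UNITS))  # County A
print(locate(1.5, 0.5, UNITS))  # County B
```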

Given the long run time and inconvenience of obtaining the shape files, I declined to publish it except as source code with little, if any, explanation as to its operation and use.  I just didn’t think it was suitable for public consumption yet, as the reverse-geocoding was only tediously repeatable.  I knew there was a better way, and as of a few weeks ago, after some research, I authored a better mousetrap:

  • a Python 3.4 script
  • using the reverse-geocoder package
  • which uses K-D trees
  • and datasets from GeoNames
  • reverse-geocoding 2.8 million rows in approximately 210 seconds
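The list above names K-D trees and GeoNames. To show what a K-D tree contributes, here is a small, self-contained nearest-neighbour sketch in plain Python (3.8+ for `math.dist`). It is not the reverse-geocoder package's code, whose internals may differ. The three places and their coordinates are approximate, illustrative values rather than GeoNames records, and the `Place` fields are assumptions. Each latitude/longitude is first projected onto the unit sphere, so that straight-line distance ranks candidates the same way great-circle distance would.

```python
import math
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, float, float]


class Place(NamedTuple):
    name: str
    admin1: str  # state / province
    country: str
    lat: float
    lon: float


class Node(NamedTuple):
    point: Point
    place: Place
    axis: int
    left: Optional["Node"]
    right: Optional["Node"]


def to_xyz(lat: float, lon: float) -> Point:
    """Project degrees onto the unit sphere; chord length grows with great-circle distance."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def build(items: List[Tuple[Point, Place]], depth: int = 0) -> Optional[Node]:
    """Split on x, y, z in turn, using the median point as each node's pivot."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, place = items[mid]
    return Node(point, place, axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def nearest(node: Optional[Node], target: Point,
            best: Optional[Tuple[float, Place]] = None) -> Optional[Tuple[float, Place]]:
    """Depth-first search that skips subtrees the splitting plane proves are too far away."""
    if node is None:
        return best
    dist = math.dist(node.point, target)
    if best is None or dist < best[0]:
        best = (dist, node.place)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    if abs(diff) < best[0]:  # the far side could still hold something closer
        best = nearest(far, target, best)
    return best


def reverse_geocode(tree: Node, lat: float, lon: float) -> Place:
    return nearest(tree, to_xyz(lat, lon))[1]


# Approximate coordinates, for illustration only (not GeoNames data).
PLACES = [
    Place("Oklahoma City", "Oklahoma", "US", 35.47, -97.52),
    Place("Tulsa", "Oklahoma", "US", 36.15, -95.99),
    Place("Pawnee", "Oklahoma", "US", 36.34, -96.80),
]
TREE = build([(to_xyz(p.lat, p.lon), p) for p in PLACES])
print(reverse_geocode(TREE, 36.30, -96.90).name)  # Pawnee
```

The result is simply the nearest known place, so a point inherits that place's state and country. With a dense gazetteer such as GeoNames this is usually, though not always, the administrative unit that actually contains the point; the payoff is that each lookup visits only a few tree nodes instead of every polygon.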

So, in the next few weeks, I will push the Python script to GitHub and the reverse-geocoded earthquake dataset to http://frackingdata.info/downloads, and I will publish a post when this is done.

How cool, from 12 hours to 210 seconds.
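For a sense of scale, the figures quoted in this post work out as follows. This assumes the 12-hour run and the 210-second run covered roughly the same ~2.8 million rows, which the post does not state explicitly.

```python
def speedup(old_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


def rows_per_second(rows: int, seconds: float) -> float:
    return rows / seconds


OLD_RUNTIME_S = 12 * 60 * 60  # Java / shape-file approach: about 12 hours
NEW_RUNTIME_S = 210           # Python / K-D tree approach
ROWS = 2_800_000

print(f"speed-up: about {speedup(OLD_RUNTIME_S, NEW_RUNTIME_S):.0f}x")          # about 206x
print(f"throughput: about {rows_per_second(ROWS, NEW_RUNTIME_S):,.0f} rows/s")  # about 13,333 rows/s
```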

Finally, some progress…

Khepry Quixote
12 April 2016

Breaking the “Fracking Wall”

This post describes why I’ve resolved to break the “fracking wall” surrounding the data sources of oil well locations, fracking chemical disclosures, and earthquake sources.

BACKGROUND

I am a software/database/systems developer/designer/analyst with over thirty (30) years of IT experience in a variety of domains: petrochemical plant applications, tax appraisal, county-level governmental agencies, law enforcement applications, point-of-sale systems, data warehousing and analysis, insurance, near-realtime aircraft/vessel dispatch and tracking, mapping applications, search engines, desktop and web applications, and health care data extraction, transformation, and loading (ETL) and analysis. In short, there’s not a lot I haven’t done over my career.

SELF-EDUCATION

One of my continuing challenges is to self-educate frequently on emerging languages, databases, and software. This I do as a “night job” a few nights each week, every week, every month of every year. Having enjoyed applications involving mapping the most, in the Spring of 2012 I decided on a course of self-education with a variety of mapping packages, but covering a single domain with free information: earthquakes. I chose this domain for no other reason than that the data was freely available and of modest size, the publicly available sources amounting to just a few million records.

EARTHQUAKES

And so I merrily went about my self-education on various mapping packages using the free source of earthquake data, enjoying positive results and pretty graphics along the way. Then, quite to my surprise, “swarms” of earthquakes began to materialize on the various maps I was creating. Interestingly enough, some of those swarms were in Oklahoma, a state of the union in which I had the privilege of living from 1979 through 1982. What struck me was that I recalled relatively few earthquakes during the three years I lived in that state. Needless to say, my interest was piqued.

SWARMS EMERGE

So, I began to plot the earthquakes for Oklahoma on a wider scale, on a year-by-year basis, and I could discern that “swarms” of earthquakes were materializing in places where there had been very few in the preceding decades. Given my B.S. in Zoology, a scientific bent of mind, and an absolute passion for discerning emerging patterns, mental alarm bells went off: I was seeing an emerging pattern that might have a more anthropogenic than natural origin. Casually, as this was a “night job,” I began searching the Internet for possible causes and ran across the hypothesis that hydro-fracturing a.k.a. “fracking” operations, specifically the injection of water and chemicals into “fracking” and underground “disposal” wells, was causing the emerging swarms of earthquakes.

EARTHQUAKE SWARMS vs. OIL WELL LOCATIONS

At a magnitude of 5.6, the largest earthquake ever recorded in Oklahoma up to that time struck on November 5, 2011, preceded by foreshocks of magnitude 4.7 through 5.0 earlier in the day. It was this earthquake and its foreshocks that really raised my interest, as I was mapping not only the locations of the earthquakes but also their intensity via color: the redder, the stronger the quake. To me, where there’s smoke there’s fire, and so I resolved to start mapping well locations as well. It was my quest for well locations on a state-by-state basis that turned self-education into an avocation of sorts and introduced me to the “fracking wall.”

HITTING THE “FRACKING WALL”

In an effort to obtain the locations of oil wells, I contacted various agencies of the State of Oklahoma, with virtually no results. It wasn’t that the data was unavailable; rather, the data that was available was not easily downloaded and, most importantly, did not contain the latitudes and longitudes of the wells. In other words, I could roll the cigar between my fingers, but I could neither light nor puff upon it. One state official told me, emphatically, that such location data was not available. Agency by agency, I wrote to and/or called the appropriate personnel, and although most of the employees were polite, they were equally unhelpful. It took me several months to find out where the datasets containing oil well locations had been posted. There was one, I repeat one, mention of a link to Oklahoma’s oil well location datasets, in an obscure forum in a backwater of the Internet. This was the clue I needed, and I was finally able to plot the oil well locations against the occurrence of earthquakes and confirm that “where there’s smoke, there’s fire.” The refusal of the State of Oklahoma to point me to the oil well location data was my first experience with the “fracking wall,” and it wouldn’t be my last, although the “fracking wall” would manifest itself in different ways in different states and agencies.

BREAKING THE “FRACKING WALL”

Because of the State of Oklahoma’s behavior and lack of cooperation, I resolved to break the “fracking wall” for both myself and all others needing access to the same type of data. In an effort to collect all of the oil well location and fracking chemical disclosure hyperlinks in one place, as well as offer curated datasets of the aforementioned data, I created the frackingdata.org website with curated FracFocus data extracts, chemical toxicities and their datasets, state-by-state sources of well location data, and the source code used to extract the datasets into more usable forms. In short, I created frackingdata.org to be a one-stop shop for anyone wishing to conduct analysis of fracking-related data.

FUTURE INITIATIVES

+ Link earthquake data to oil well locations in a manner convenient to anyone wishing to analyze such data (In progress)

+ Automate the download, extraction, transformation, and loading of FracFocus.org data into datasets more suitable for use by analysts or citizen-scientists. (Done)

+ Transform the FracFocus.org GUID keys into more user-friendly integer keys that also reduce storage by over 25% (Done)

+ Push the curated data sets to ODATA repositories, e.g. Google Fusion Tables, so that analysts can more easily access the data via packages like Tableau, SAS, or R. (In progress)

+ Push more of the source code used to do this extraction, transformation, and loading to repositories like GitHub so that all may benefit from it and perhaps even contribute to its maintenance. (Partially done)
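The GUID-to-integer item in the list above can be sketched in a few lines of Python. The idea: hand out sequential integers in first-seen order, always map the same GUID to the same integer, and rewrite child rows' foreign keys through the same mapping so joins still line up. In SQL Server a `uniqueidentifier` takes 16 bytes and an `int` takes 4, so each key column shrinks by 12 bytes per row; the overall saving depends on the other columns. The class name, the GUIDs, and the upload/ingredient rows are all hypothetical, and this is not the code FrackingData.org actually uses.

```python
import uuid
from typing import Dict


class SurrogateKeys:
    """Assign sequential integers to GUIDs in first-seen order; a GUID always gets the same key."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._by_guid: Dict[uuid.UUID, int] = {}

    def key_for(self, guid: str) -> int:
        g = uuid.UUID(guid)  # normalises case, hyphens and braces, so spellings of one GUID agree
        if g not in self._by_guid:
            self._by_guid[g] = self._next
            self._next += 1
        return self._by_guid[g]


# Hypothetical parent rows (guid, API number) and child rows (parent guid, ingredient).
uploads = [("0f8fad5b-d9cb-469f-a165-70867728950e", "35-001"),
           ("7c9e6679-7425-40de-944b-e07fc1f90ae7", "35-002")]
ingredients = [("7C9E6679-7425-40DE-944B-E07FC1F90AE7", "Water"),
               ("0f8fad5b-d9cb-469f-a165-70867728950e", "Silica")]

keys = SurrogateKeys()
uploads_int = [(keys.key_for(g), api) for g, api in uploads]
ingredients_int = [(keys.key_for(g), name) for g, name in ingredients]
print(uploads_int)      # [(1, '35-001'), (2, '35-002')]
print(ingredients_int)  # [(2, 'Water'), (1, 'Silica')]
```

Sharing one mapping between the parent and child tables is what keeps the foreign keys consistent. In a database the same effect is typically achieved with a lookup table that pairs each GUID with an identity column.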

As I have a “day job,” progress is painful but the results are worth it.

Khepry Quixote
11 March 2016