There are multiple sources of FracFocus.org data, some of them more “official” than others. To get an idea of the extract-transform-load (ETL) tasks involved in manifesting some of the files below, please refer to the ETL Task List page.
- FracFocus data sources of FracFocus.org “version 1” through “version 3” data (most “official” to date, see the explanation below)
- FracFocus.org’s CSV data has been imported, transformed, and augmented with chemical toxicities by the author of this blog resulting in the file available at the link below:
- Comma-delimited CSV files augmented with chemical toxicities (zipped with 7-Zip)
- As the analysis of FracFocus.org’s database will be an ongoing and evolving endeavor, a GitHub repository has been created to hold any SQL or program code used to facilitate data extraction, transformation, loading, and querying.
- A Microsoft Access database in “accdb” format containing various tables as follows:
- FracFocus.org-related tables
- Earthquakes-related tables
- NCEDC_earthquakes_reverse_geocoded (worldwide, 1898 to date, magnitude 0 and up)
- Toxicities-related tables
- Views utilizing the above tables
- Link(s) to Microsoft Access database(s), compressed with the 7-Zip program:
- EPA data sources of FracFocus.org “version 1” data (somewhat “official” in that FracFocus.org provided the EPA with a subset of the “version 1” data, see the explanation below)
- FracFocus.org-related tables
- SkyTruth data sources of extracted FracFocus.org “version 1” data (likely considered “least official,” but appears to be accurate to date, see the explanation below)
An example of a “less-than-official” source is SkyTruth.org’s data, extracted from the FracFocus.org website when it was still considered “version 1”. Once FracFocus.org upgraded their website to “version 2”, the ability for SkyTruth.org or the general public to obtain data in bulk was effectively disabled via throttling. As of April 2015, it is rumored that FracFocus.org’s soon-to-appear “version 3” will allow the download of data in bulk.
An example of a “more official” source is when FracFocus.org provided a substantial subset of their “version 1” data for analysis by the U.S. Environmental Protection Agency (EPA). It’s not as voluminous as the data extracted by SkyTruth.org, but it’s been refined somewhat to compensate for the significant lack of quality found in FracFocus.org’s “version 1” original source data.
For the most “official” source of data, as of 7 May 2015 FracFocus.org added a web page with links to archived SQL backup files of their “version 1” through “version 3” data. This download-centric web page can be found at http://fracfocus.org/data-download. Please note that these files appear to be backup files of a Microsoft SQL Server 2012 database (as of 9 May 2015; on 7 May 2015 the database type was still unspecified), and as such are not immediately usable by researchers without a database background or access to database-savvy staff. That being said, FrackingData.org will endeavor to translate those SQL database backup images to files more suitable for intake by other databases and analytical tools (e.g. CSV, tab-delimited, SQLite, PostgreSQL, R). Please be patient with the translation process as the author has a “day job” in addition to his “fracking data” job.
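To give a rough idea of what “translating” to a friendlier format can look like, here is a minimal sketch in Python that loads a CSV file into a SQLite table using only the standard library. It is not the process used to build the files on this page; the table name `registry_sample` and the column names in the sample data are made up for illustration, and every column is simply stored as text.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table_name, csv_file):
    """Load a CSV stream (with a header row) into a new SQLite table.

    Returns the number of rows inserted. Every column is created as TEXT;
    rows whose field count differs from the header are skipped.
    """
    reader = csv.reader(csv_file)
    header = next(reader)

    # Quote identifiers so names containing spaces or punctuation survive.
    quoted_table = '"{}"'.format(table_name.replace('"', '""'))
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))

    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample data; the real column names will differ.
    sample = io.StringIO(
        "api_number,state,job_start_date\n"
        "42-001-00001,Texas,2013-05-01\n"
        "05-123-00002,Colorado,2014-02-15\n"
    )
    connection = sqlite3.connect(":memory:")
    inserted = load_csv_into_sqlite(connection, "registry_sample", sample)
    print("rows inserted:", inserted)
```

When reading from a real file rather than an in-memory string, open it with `newline=""` as the `csv` module documentation recommends, and pass a file path instead of `":memory:"` to `sqlite3.connect` to keep the resulting database on disk.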
A post will be forthcoming shortly detailing what program(s) are needed to import SQL Server backup (BAK) files into a SQL Server database, as well as the steps to do so. Update: As of 9 May 2015, FracFocus.org provided a PDF document detailing how to import the database backup files into SQL Server 2012 and then, using the SQL Server database as an ODBC data source, import the “RegistryUpload*” tables into a Microsoft Access database.
Update: On or about 23 June 2016, FracFocus.org appears to have removed the PDF document detailing how to import the database backup files into SQL Server 2012 and then, using the SQL Server database as an ODBC data source, import the “RegistryUpload*” tables into a Microsoft Access database. This, in this author’s humble opinion, is not “a good thing”, because it makes it even more difficult for the average analyst/citizen to restore the SQL Server database BAK file into a SQL Server database and then subsequently import the data into a Microsoft Access database. Once again, this does not serve the interests of transparency for anyone, pro or con.
Update: Sometime in the week or so after 23 June 2016, FracFocus.org appears to have restored the aforementioned PDF document, whose removal is noted in the update above, detailing how to import the database backup files into SQL Server 2012 and then, using the SQL Server database as an ODBC data source, import the “RegistryUpload*” tables into a Microsoft Access database.
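For readers wondering what restoring a BAK file involves, the sketch below uses Python to build the two T-SQL statements typically needed: `RESTORE FILELISTONLY` to see which logical files the backup contains, and `RESTORE DATABASE ... WITH MOVE` to restore them to folders on your own machine. This is not taken from the FracFocus.org PDF; the database name, the file paths, and the logical file names shown are all hypothetical placeholders, and the real logical names have to be read from the `RESTORE FILELISTONLY` output. The script only prints the statements; running them is left to a SQL Server tool such as SQL Server Management Studio.

```python
def tsql_string(value):
    """Render a Python string as a T-SQL Unicode literal, doubling single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def build_filelist_statement(bak_path):
    """Build T-SQL that lists the logical files stored inside a backup file."""
    return "RESTORE FILELISTONLY FROM DISK = {};".format(tsql_string(bak_path))


def build_restore_statement(database, bak_path, moves):
    """Build T-SQL that restores a backup into the named database.

    `moves` maps each logical file name (as reported by RESTORE FILELISTONLY)
    to the path where that file should be written on the local machine.
    """
    quoted_db = "[" + database.replace("]", "]]") + "]"
    statement = "RESTORE DATABASE {} FROM DISK = {}".format(
        quoted_db, tsql_string(bak_path)
    )
    if moves:
        move_clauses = ",\n    ".join(
            "MOVE {} TO {}".format(tsql_string(logical), tsql_string(physical))
            for logical, physical in moves.items()
        )
        statement += "\nWITH " + move_clauses
    return statement + ";"


if __name__ == "__main__":
    # Hypothetical paths and names, for illustration only.
    bak = r"C:\data\FracFocusRegistry.bak"
    print(build_filelist_statement(bak))
    print(
        build_restore_statement(
            "FracFocusRegistry",
            bak,
            {
                "Registry_Data": r"C:\sqldata\FracFocusRegistry.mdf",
                "Registry_Log": r"C:\sqldata\FracFocusRegistry_log.ldf",
            },
        )
    )
```

The `MOVE` clauses matter because a backup remembers the file locations of the server it was made on, and those folders are unlikely to exist on your machine.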