Initiative: FracFocusCSV data blending with chemical toxicities

The new FracFocusCSV files, found within FracFocus.org’s FracFocusCSV.zip archive, can be blended with chemical toxicities with relative ease by using the CASNumber column of each row as the join key.  In the near future, I’ll be releasing the combined FracFocusCSV.tsv file with chemical toxicities blended in.  As I already possess the chemical toxicity data, the remaining work is modifying the PyZip2Src2Tgt program to blend the FracFocusCSV chemical declaration data with the chemical toxicity data on the “CASNumber” key column.
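A minimal sketch of what such a blend could look like, assuming both data sets are tab-delimited text with a header row. Only the CASNumber column name comes from the FracFocus data; the IngredientName and ToxicityNote columns, the sample rows, and the function names are hypothetical placeholders, and this is not the PyZip2Src2Tgt code.

```python
import csv
import io

# Hypothetical sample data; only the CASNumber column name is from the post.
DISCLOSURES_TSV = (
    "IngredientName\tCASNumber\n"
    "Water\t7732-18-5\n"
    "Methanol\t 67-56-1 \n"
    "Unknown\t\n"
)
TOXICITIES_TSV = "CASNumber\tToxicityNote\n67-56-1\tplaceholder note\n"


def load_toxicities(text, key="CASNumber"):
    """Return ({cas: other columns}, [other column names]) from TSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    lookup = {}
    for row in reader:
        cas = row.pop(key).strip()
        lookup[cas] = row
    columns = [name for name in reader.fieldnames if name != key]
    return lookup, columns


def blend(disclosure_text, toxicity_text, key="CASNumber"):
    """Left-join: every disclosure row is kept, matched or not."""
    lookup, tox_columns = load_toxicities(toxicity_text, key)
    empty = dict.fromkeys(tox_columns, "")
    blended = []
    for row in csv.DictReader(io.StringIO(disclosure_text), delimiter="\t"):
        # Trim stray whitespace so " 67-56-1 " still matches "67-56-1".
        cas = (row.get(key) or "").strip()
        blended.append({**row, key: cas, **lookup.get(cas, empty)})
    return blended


blended_rows = blend(DISCLOSURES_TSV, TOXICITIES_TSV)
```

Rows without a matching CASNumber keep empty toxicity columns rather than being dropped, so the blended file would have the same row count as the disclosure file.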

Khepry Quixote
2017-07-28

Status Update 2017-07-27: FracFocusCSV 2017-07 Files Combined and Uploaded

Using the PyZip2Src2Tgt Python 3 program, the 12 FracFocus CSV files contained within FracFocus.org’s FracFocusCSV.zip archive were combined into one file, written as tab-delimited values (.tsv), and then compressed into the following archive:

https://s3.amazonaws.com/frackinganalysis-s3-01/downloads/FracFocusCSV_Combined_201707.7z

Please note that this compressed archive does not contain any chemical toxicities at the present time.  I’ll push a new compressed archive that includes them in the near future.

Presently, I’m working on an SQLite database with the new FracFocusCSV file as its source of fracking chemical disclosure information.  It, too, will be available for download in the near future.

Khepry Quixote
2017-07-27

FracFocusCSV_Combined_201707.7z file pushed to S3 bucket

Using the PyZip2Src2Tgt Python 3 program, the 12 FracFocus CSV files contained within the FracFocusCSV.zip archive were combined into one file, written as tab-delimited values (i.e. a TSV file), and then compressed into the following archive:

https://s3.amazonaws.com/frackinganalysis-s3-01/downloads/FracFocusCSV_Combined_201707.7z

Please note that this compressed archive does not contain any chemical toxicities at the present time.  I’ll push a new compressed archive that includes them in the near future.

Presently, I’m working on an SQLite database with the new FracFocusCSV file as its source of fracking chemical disclosure information.  It, too, will be available for download in the near future.

Khepry Quixote
2017-07-26

FracFocus.org’s New CSV Files: Tweaked program that combines multiple CSVs into one CSV

Tweaked the PyZip2Src2Tgt project to handle common character transformations, for example:

# character transformation tuples list used for
# transforming characters from one character to another
# as some analytical tools are unable to handle mixed
# characters, e.g. Unicode and ASCII, during importation

char_xform_tuples_list = []

char_xform_tuples_list.append(('\r\n', ' ')) # carriage-return, line-feed to single space
char_xform_tuples_list.append(('\n', ' ')) # line-feed to single space
char_xform_tuples_list.append(('\t', ' ')) # tab to single space
char_xform_tuples_list.append((u'\x91', "'")) # curly (Windows-1252) left single quote to single quote
char_xform_tuples_list.append((u'\x92', "'")) # curly (Windows-1252) right single quote to single quote
char_xform_tuples_list.append((u'\x93', '"')) # curly (Windows-1252) left double quote to double quote
char_xform_tuples_list.append((u'\x94', '"')) # curly (Windows-1252) right double quote to double quote
char_xform_tuples_list.append((u'\xa0', ' ')) # non-breaking space to single space

# make sure this character transformation is always the last one added!
char_xform_tuples_list.append(('  ', ' ')) # double-space to single space
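
A minimal sketch of how a list like the one above might be applied to a column value. The apply_char_xforms helper is a hypothetical name for illustration and is not taken from the PyZip2Src2Tgt source; the list is repeated so the example stands on its own.

```python
# Same (old, new) pairs as the list above, written with plain ASCII quotes.
char_xform_tuples_list = [
    ('\r\n', ' '),   # carriage-return, line-feed to single space
    ('\n', ' '),     # line-feed to single space
    ('\t', ' '),     # tab to single space
    ('\x91', "'"),   # curly left single quote to single quote
    ('\x92', "'"),   # curly right single quote to single quote
    ('\x93', '"'),   # curly left double quote to double quote
    ('\x94', '"'),   # curly right double quote to double quote
    ('\xa0', ' '),   # non-breaking space to single space
    ('  ', ' '),     # double-space to single space, kept last
]


def apply_char_xforms(value, xforms=char_xform_tuples_list):
    """Hypothetical helper: apply each (old, new) replacement in list order."""
    for old, new in xforms:
        value = value.replace(old, new)
    return value


cleaned = apply_char_xforms('Acme\x92s\r\nadditive\tblend')
```

Order matters here: '\r\n' has to be replaced before '\n', and the double-space rule goes last so it can tidy up spaces produced by the earlier rules. Note that a single pass only halves a run of spaces (four spaces become two), so longer runs would need the last rule applied repeatedly.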

FracFocus.org’s New CSV Files: Authored program to combine the various CSVs into one CSV

I have authored a Python 3.6 program that extracts the various CSV files from within FracFocus.org’s new CSV compressed archive and then combines them into one CSV file in the order, based on date-time, that they were generated by FracFocus.

The GitHub project for this can be found at: https://github.com/Frackalyzer/PyZip2Src2Tgt.
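
A minimal sketch of the extract-and-combine idea, not the actual PyZip2Src2Tgt code. It assumes every CSV in the archive shares the same header row, that the files are UTF-8 encoded, and that the date-time stored on each archive member reflects the order of generation. The demo archive, the JobID column, and the function names are hypothetical.

```python
import csv
import io
import zipfile


def combine_zip_csvs(zip_bytes, out_stream):
    """Write every CSV in the archive to out_stream as one TSV, oldest first."""
    writer = csv.writer(out_stream, delimiter="\t", lineterminator="\n")
    header_written = False
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = [m for m in archive.infolist()
                   if m.filename.lower().endswith(".csv")]
        # Order by the member's stored date-time, not by file name.
        members.sort(key=lambda m: m.date_time)
        for member in members:
            with archive.open(member) as raw, \
                    io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                reader = csv.reader(text)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)  # keep only the first header
                    header_written = True
                writer.writerows(reader)


def build_demo_zip():
    """Build a small in-memory archive, deliberately added out of order."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, stamp, body in [
            ("FracFocusRegistry_10.csv", (2017, 7, 3, 0, 0, 0),
             "JobID,CASNumber\n3,67-56-1\n"),
            ("FracFocusRegistry_1.csv", (2017, 7, 1, 0, 0, 0),
             "JobID,CASNumber\n1,7732-18-5\n"),
            ("FracFocusRegistry_2.csv", (2017, 7, 2, 0, 0, 0),
             "JobID,CASNumber\n2,7732-18-5\n"),
        ]:
            archive.writestr(zipfile.ZipInfo(name, date_time=stamp), body)
    return buffer.getvalue()


combined = io.StringIO()
combine_zip_csvs(build_demo_zip(), combined)
```

Sorting on the stored date-time sidesteps the file-name ordering problem (_10 sorting before _2), at the cost of trusting the timestamps inside the archive.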

I will be pushing the resulting combined CSV file up to the FrackingData.org site shortly, but it’ll likely be tomorrow before that happens.

Khepry Quixote
2017-07-24

FracFocusRegistry CSV Files Naming Commentary

I have just noticed that FracFocus is now outputting the FracFocusRegistry data in the more import-friendly CSV format.  That being said, it would be nice if FracFocus named the files with a leading zero where applicable, for example:

  • FracFocusRegistry_1.csv would be better named as FracFocusRegistry_01.csv

This would allow any file-combining programs to combine the files in the order in which they were generated, as right now the file reading order is as follows:

  1. FracFocusRegistry_1.csv
  2. FracFocusRegistry_10.csv
  3. FracFocusRegistry_11.csv
  4. FracFocusRegistry_12.csv
  5. FracFocusRegistry_2.csv
  6. FracFocusRegistry_3.csv

and so on.

Renaming every file that has a single-digit suffix to use a leading zero, e.g. _1.csv to _01.csv, would make the default alphabetical file order match the order in which the files were generated.
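
Until the files are renamed, a file-combining program can work around the problem with a natural sort. A minimal sketch, assuming the file names follow the FracFocusRegistry_N.csv pattern listed above; the natural_key helper is a hypothetical name for illustration.

```python
import re


def natural_key(file_name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", file_name)]


names = ["FracFocusRegistry_%d.csv" % n for n in range(1, 13)]

default_order = sorted(names)                    # _1, _10, _11, _12, _2, ...
natural_order = sorted(names, key=natural_key)   # _1, _2, _3, ..., _12
```

The key compares the digit runs as integers instead of as text, which is why _2 lands before _10 without any renaming.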

Please note that I’m working on a program now that will combine all of these CSVs into one file, in the order in which they were likely generated.  In addition, some data cleansing will be undertaken, as some rows contain carriage returns and other stray characters within their columns.

Khepry Quixote
2017-07-16