FracFocus.org’s New CSV Files: Tweaked program that combines multiple CSVs into one CSV

I tweaked the PyZip2Src2Tgt project to handle common character transformations. For example:

# character transformation tuples list used for
# transforming characters from one character to another
# as some analytical tools are unable to handle mixed
# characters, e.g. Unicode and ASCII, during importation

char_xform_tuples_list = []

char_xform_tuples_list.append(('\r\n', ' ')) # carriage-return, line-feed to single space
char_xform_tuples_list.append(('\n', ' ')) # line-feed to single space
char_xform_tuples_list.append(('\t', ' ')) # tab to single space
char_xform_tuples_list.append((u'\x91', "'")) # curly left single quote to single quote
char_xform_tuples_list.append((u'\x92', "'")) # curly right single quote to single quote
char_xform_tuples_list.append((u'\x93', '"')) # curly left double quote to double quote
char_xform_tuples_list.append((u'\x94', '"')) # curly right double quote to double quote
char_xform_tuples_list.append((u'\xa0', ' ')) # non-breaking space to single space

# make sure this character transformation is always the last one added!
char_xform_tuples_list.append(('  ', ' ')) # double-space to single space
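
The post does not show how the list is consumed, so here is a minimal sketch of one way it could be applied, assuming Python 3 and plain in-order `str.replace` calls. The function name `apply_char_xforms` and the sample list are hypothetical, not taken from PyZip2Src2Tgt. It also illustrates why the double-space tuple has to come last: earlier replacements can create new double spaces.

```python
def apply_char_xforms(text, xform_tuples):
    """Apply each (old, new) replacement to text, in list order."""
    for old, new in xform_tuples:
        text = text.replace(old, new)
    return text


# Hypothetical subset of the transformations, for illustration only
sample_xforms = [
    ('\r\n', ' '),   # carriage-return, line-feed to single space
    ('\n', ' '),     # line-feed to single space
    ('\t', ' '),     # tab to single space
    ('\xa0', ' '),   # non-breaking space to single space
    ('  ', ' '),     # double-space to single space (kept last)
]

# Made-up field value containing a tab, a CRLF and a non-breaking space
cleaned = apply_char_xforms('Acme\tChemical\r\nCo.\xa0Inc.', sample_xforms)
print(cleaned)
```

One caveat with this approach: a single `str.replace('  ', ' ')` pass only shortens longer runs of spaces rather than collapsing them fully, so a field with many consecutive spaces may need the last step repeated until no double space remains.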
