[Chicago-talk] Removing Characters

Steven Lembark lembark at wrkhors.com
Thu Oct 25 09:27:58 PDT 2007


> And this one keeps a double quote that is escaped by a backslash (\").
> But if the backslash is itself escaped by a backslash, the double quote
> that follows is not escaped, yet it will still be kept.
>
> Phew  ;-&
>
> 's/(?<![,\\])(?:(^")|")(?![,;]|$)/$1?$1:""/ge?print:print'

That's what you get when you use data with
embedded delimiters. It's also why Tab-separated
format has persisted for so many years: it's
fairly easy to avoid literal tabs in the data.
CSV is a mess due to (a) the lack of any real
standard and (b) including both commas and
quotes in the data.
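
A minimal sketch of the problem, in Python rather than Perl for
brevity. The sample record is made up for illustration. Splitting
on commas cuts the quoted field in two, while the same record in
tab-separated form splits cleanly:

```python
# Hypothetical record: three fields, the second one contains a comma.
csv_line = '1,"Doe, Jane",NY'
tsv_line = '1\tDoe, Jane\tNY'

# A naive comma split yields four pieces instead of three.
csv_fields = csv_line.split(',')

# The tab split works because no field contains a literal tab.
tsv_fields = tsv_line.split('\t')

print(len(csv_fields), csv_fields)
print(len(tsv_fields), tsv_fields)
```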

One suggestion:

Replace the delimiting quotes with a non-data separator
(e.g., ASCII FS "\x1c", or "\t"), then do the same for any
remaining commas. At this point you can strip the
quotes as necessary (or leave them) and not care:
the delimiters do not appear in the data and you
are safe in how you process it.
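
One way this could look, again sketched in Python. It is not a
drop-in tool: it walks each record character by character instead
of using regex substitutions, and it assumes comma-separated
fields, optional double quotes around a field, and a backslash
escaping the next character inside quotes. The name csv_to_fs and
the sample record are hypothetical.

```python
FS = "\x1c"  # ASCII file separator; assumed never to occur in the data


def csv_to_fs(line, sep=FS):
    """Rewrite one CSV record so its fields are joined by `sep`.

    Assumptions (not from the original post): fields are separated by
    commas, may be wrapped in double quotes, and a backslash escapes
    the next character inside a quoted field.
    """
    if sep in line:
        raise ValueError("separator already appears in the data")

    out = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes and ch == "\\" and i + 1 < len(line):
            # Escaped character: copy the backslash and what follows as data.
            out.append(line[i:i + 2])
            i += 2
            continue
        if ch == '"':
            # Delimiting quote: toggle the state and drop the quote.
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # Delimiting comma: swap in the non-data separator.
            out.append(sep)
        else:
            # Ordinary data, including commas inside quotes.
            out.append(ch)
        i += 1
    return "".join(out)


record = r'1,"Doe, Jane","said \"hi\"",NY'
fields = csv_to_fs(record).split(FS)
print(fields)
```

After the rewrite a plain split on the separator is enough. The
escaped quotes are left in the data as-is, so unescaping them is
a separate step if you need it.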

enjoi

-- 
Steven Lembark                                         85-09 90th Street
Workhorse Computing                                  Woodhaven, NY 11421
lembark at wrkhors.com                                      +1 888 359 3508

