[Chicago-talk] Removing Characters
Steven Lembark
lembark at wrkhors.com
Thu Oct 25 09:27:58 PDT 2007
> And this one will keep the double quote escape by backslash \". But if a
> backslash is escaped by a backslash, the double quote following the
> escaped backslash, this double quote is not escaped, will still be kept.
>
> Phew ;-&
>
> 's/(?<![,\\])(?:(^")|")(?![,;]|$)/$1?$1:""/ge?print:print'
That's what you get when you use data with
embedded delimiters. It's also why tab-separated
format has persisted for so many years: it's
fairly easy to avoid literal tabs in the data.
CSV is a mess due to (a) the lack of any real
standard and (b) data that can itself contain
both commas and quotes.
One suggestion:
Replace the delimiting quotes with a non-data separator
(e.g., ASCII FS or "\t"), then do the same for any
remaining commas. At this point you can strip the
quotes as necessary (or leave them) and not care:
the delimiters do not appear in the data and you
are safe in how you process it.
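
A minimal sketch of that idea, under a few assumptions: one
record per line, quotes inside a quoted field are escaped with
a backslash (as in the one-liner quoted above), and ASCII FS
(0x1C) never occurs in the data. The sub name csv_to_fs is made
up for illustration. Rather than doing two literal replacement
passes, it walks the line field by field with a regex, which
avoids confusing commas inside quoted data with delimiting ones:

```perl
use strict;
use warnings;

# ASCII FS (0x1C); assumed never to appear in the data itself.
my $FS = "\x1C";

# Hypothetical helper: turn one CSV line into FS-separated fields.
# Delimiting quotes are dropped; backslash escapes inside quoted
# fields are left untouched.
sub csv_to_fs {
    my ($line) = @_;
    chomp $line;
    my @fields;

    # A field is either a double-quoted string (which may contain
    # backslash-escaped characters) or a run of non-comma
    # characters, followed by a comma or the end of the line.
    while ( $line =~ /\G(?:"((?:[^"\\]|\\.)*)"|([^,]*))(,|$)/g ) {
        push @fields, defined $1 ? $1 : $2;
        last if $3 eq '';    # no trailing comma: that was the last field
    }

    return join $FS, @fields;
}
```

After that, the commas and quotes left in the record are plain
data, and splitting is trivial:

    my @cols = split /\x1C/, csv_to_fs($line), -1;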
enjoi
--
Steven Lembark                          85-09 90th Street
Workhorse Computing                     Woodhaven, NY 11421
lembark at wrkhors.com                  +1 888 359 3508