[Chicago-talk] Removing Characters

Steven Lembark lembark at wrkhors.com
Wed Oct 24 12:14:48 PDT 2007


> There must be a better way to remove the double quotes in a CSV file
> whose fields are optionally quoted with double quotes.
> What I did below is ugly and unreliable. Could anyone provide a
> more elegant one-liner?
>
>   $delimiter=chr(0227);
>   s/^"/$delimiter/g;
>   s/,"/,$delimiter/g;
>   s/"$/$delimiter/g;
>   s/",/$delimiter,/g;
>   s/"//g;
>   s/$delimiter/"/g;

You don't seem to want all of the quotes removed,
only the embedded ones. If the data is well-formatted,
with embedded quotes escaped as \", then the operations
above will leave you with a bunch of naked backslashes
in the text:

  "this is a \"double quoted\" text line"

becomes

  "this is a \double quoted\ text line"

and you probably don't want the stray "\d" or "\ "
(backslash-space) in your result.
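To see the problem concretely, here is a minimal sketch that wraps the six substitutions from the question, unchanged, in a sub and runs them on the sample line. It assumes one CSV record per line; `strip_unprotected_quotes` is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# The six substitutions from the question, wrapped in a sub.
# chr(0227) is just an unlikely placeholder character that
# temporarily protects the field-delimiting quotes.
sub strip_unprotected_quotes {
    my ($line) = @_;
    my $delimiter = chr(0227);

    for ($line) {                  # alias $_ to $line
        s/^"/$delimiter/g;         # opening quote of the first field
        s/,"/,$delimiter/g;        # opening quote after a comma
        s/"$/$delimiter/g;         # closing quote of the last field
        s/",/$delimiter,/g;        # closing quote before a comma
        s/"//g;                    # every remaining quote counts as embedded
        s/$delimiter/"/g;          # restore the protected quotes
    }
    return $line;
}

my $in = q{"this is a \"double quoted\" text line"};
print strip_unprotected_quotes($in), "\n";
```

Running it prints `"this is a \double quoted\ text line"`: the quotes after each backslash are gone, but the backslashes themselves survive.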

If the real problem is that fate has handed you some
CSV data with embedded, unescaped quotes, then your
approach makes the most sense, but you'll have to
remove the escaped quotes as well:

  s{ \\" }{}gx;

will strip the \" char's. You might prefer to replace
them with non-delimiting quotes, e.g.,


  s{ \\" }{'}gx;

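Putting the two steps together might look like the sketch below. It assumes one record per line, a comma separator, and backslash-escaped quotes; `clean_csv_line` and its `$escaped_to` argument are hypothetical names. For the embedded-quote step it swaps the six substitutions from the question for a single substitution with lookaround assertions. That is a swapped-in technique, not something from the thread, and it is intended to keep the same quotes as the six-step version on a single line: a quote survives only if it starts or ends the line or touches a comma.

```perl
use strict;
use warnings;

# Sketch: clean one CSV line whose fields may be wrapped in double
# quotes and may also contain quotes inside the fields.
#
# $escaped_to (hypothetical parameter): '' deletes each \" pair,
# q{'} turns it into a non-delimiting single quote.
sub clean_csv_line {
    my ($line, $escaped_to) = @_;
    $escaped_to = '' unless defined $escaped_to;

    # Pass 1: backslash-escaped quotes. This has to run first; after
    # pass 2 the \" pairs are broken up and only backslashes remain.
    $line =~ s{ \\" }{$escaped_to}gx;

    # Pass 2: drop any quote with a non-comma character on both sides.
    # Quotes at the start/end of the line or next to a comma are taken
    # to be field delimiters and kept; "\n" is excluded from the
    # lookahead so a quote before a trailing newline is also kept.
    $line =~ s/(?<=[^,])"(?=[^,\n])//g;

    return $line;
}

my $in = q{"this is a \"double quoted\" text line"};
print clean_csv_line($in), "\n";
print clean_csv_line($in, q{'}), "\n";
```

With the default it prints `"this is a double quoted text line"`, and passing `q{'}` gives `"this is a 'double quoted' text line"`. A line with bare embedded quotes such as `"he said "no" twice",x` comes back as `"he said no twice",x`.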
All of the CSV parsing modules assume "clean" CSV
source (an oxymoron?), so if you need to clean up
botched data, an iterative approach, fixing one kind
of damage per substitution pass as you did, is likely
to be what you need.

enjoi

-- 
Steven Lembark                                         85-09 90th Street
Workhorse Computing                                  Woodhaven, NY 11421
lembark at wrkhors.com                                      +1 888 359 3508

