[Chicago-talk] Removing Characters

tiger peng tigerpeng2001 at yahoo.com
Wed Oct 24 12:54:38 PDT 2007


You are right if it is a well-formatted CSV file.
But I don't know whether that is guaranteed; there is no documentation about it. I am just rewriting an old, uncommented and undocumented, working script.

The segment was only rewritten to make it a little easier to read, by replacing the embedded 'DEL' characters with a variable populated by the chr function.

Below is the best I can do. It looks better to me and can run twice as fast as the old one. I still cannot turn it into a one-liner. Can anyone get rid of the first line?

  my $leadingQ = /^"/ ? '"' : '';  # save the leading quote, if there is one
  s/(?<!,)"(?!,|$)//g;  # remove every double quote that is not preceded by a comma and not followed by a comma or the end of the line
  print OUTF $leadingQ, $_;  # put the saved leading quote back in front of the cleaned line
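
One possible way to drop the first line, offered only as a sketch: swap the negative lookbehind (?<!,) for a positive lookbehind (?<=[^,]), which requires some character other than a comma in front of the quote. A quote at the very start of the line has nothing in front of it, so it is never matched and does not need to be saved. The sub name strip_embedded_quotes and the sample line are made up for illustration, and this assumes the data is processed one line at a time.

```perl
use strict;
use warnings;

# Hypothetical helper: remove embedded double quotes from one CSV line,
# keeping the quotes that open or close a field.
sub strip_embedded_quotes {
    my ($line) = @_;
    # (?<=[^,]) : the quote must follow a character that is not a comma,
    #             so a quote at the start of the line is left alone.
    # (?!,|$)   : the quote must not be followed by a comma or end of line.
    $line =~ s/(?<=[^,])"(?!,|$)//g;
    return $line;
}

print strip_embedded_quotes(qq{"a "b" c",plain,"x"y"\n});
# prints: "a b c",plain,"xy"
```

Inside the original loop this would reduce to the single substitution followed by print OUTF $_;.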


----- Original Message ----
From: Steven Lembark <lembark at wrkhors.com>
To: Chicago.pm chatter <chicago-talk at pm.org>
Sent: Wednesday, October 24, 2007 2:14:48 PM
Subject: Re: [Chicago-talk] Removing Characters


> There must be a better way of removing the double quotes in a CSV file
> whose fields are optionally quoted with double quotes.
> What I did below is ugly and not reliable. Could anyone provide one
> beautiful line?
>
>   $delimiter=chr(0227);
>   s/^"/$delimiter/g;
>   s/,"/,$delimiter/g;
>   s/"$/$delimiter/g;
>   s/",/$delimiter,/g;
>   s/"//g;
>   s/$delimiter/"/g;

You don't seem to want all of the quotes removed,
only the embedded ones. If the data is well-formatted
then the operation above will leave you with a bunch
of naked backslashes in the text:

  "this is a \"double quoted\" text line"

becomes

  "this is a \double quoted\ text line"

and you probably don't want the \d or \ in your
result.
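
The effect can be reproduced with a short script. This is only a sketch: the sub name original_cleanup is made up here, and it simply wraps the six substitutions quoted above so they can be run on the sample line.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the six substitutions from the quoted message.
sub original_cleanup {
    my ($line) = @_;
    my $delimiter = chr(0227);      # placeholder character, as in the original
    for ($line) {                   # alias $_ to $line for the substitutions
        s/^"/$delimiter/g;          # protect a quote at the start of the line
        s/,"/,$delimiter/g;         # protect a quote that opens a field
        s/"$/$delimiter/g;          # protect a quote at the end of the line
        s/",/$delimiter,/g;         # protect a quote that closes a field
        s/"//g;                     # drop every remaining (embedded) quote
        s/$delimiter/"/g;           # turn the placeholders back into quotes
    }
    return $line;
}

print original_cleanup('"this is a \"double quoted\" text line"'), "\n";
# prints: "this is a \double quoted\ text line"
```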

If the real problem is that fate has handed you some
CSV data with embedded, un-escaped quotes then your
approach makes the most sense, but you'll have to
remove escaped quotes also:

  s{ \\" }{}gx;

will strip the \" characters. You might prefer to replace
them with non-delimiting quotes, e.g.,


  s{ \\" }{'}gx;
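
As a quick check of that second form on the sample line from earlier (a sketch only; the sub name soften_escaped_quotes is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: turn backslash-escaped double quotes into single quotes.
sub soften_escaped_quotes {
    my ($line) = @_;
    $line =~ s{ \\" }{'}gx;   # /x makes the spaces inside the pattern insignificant
    return $line;
}

print soften_escaped_quotes('"this is a \"double quoted\" text line"'), "\n";
# prints: "this is a 'double quoted' text line"
```

The quotes that delimit the field are untouched because they are not preceded by a backslash.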

All of the CSV parsing modules assume "clean" CSV
source (oxymoron?) so if you need to clean up botched
data then some iterative approach is likely to be
what you need.
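
One reading of "iterative" is a list of cleanup passes applied in order to each line, so that new passes can be added as more kinds of damage turn up. The sketch below assumes that reading; the names @passes and clean_line are made up, and the second pass is a variant of the embedded-quote substitution from this thread, not something any CSV module provides.

```perl
use strict;
use warnings;

# Hypothetical list of cleanup passes, applied in order to each line.
# Each pass edits its argument in place through the @_ alias.
my @passes = (
    sub { $_[0] =~ s{ \\" }{'}gx },            # escaped quotes become single quotes
    sub { $_[0] =~ s/(?<=[^,])"(?!,|$)//g },   # drop quotes that neither open nor close a field
);

sub clean_line {
    my ($line) = @_;
    $_->($line) for @passes;    # run every pass over the same line
    return $line;
}

print clean_line('"say \"hi\" to "Bob"",next'), "\n";
# prints: "say 'hi' to Bob",next
```

The order matters: escaped quotes have to be handled before the pass that deletes bare embedded quotes, otherwise the backslashes are left behind.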

enjoi

-- 
Steven Lembark                                       85-09 90th Street
Workhorse Computing                                Woodhaven, NY 11421
lembark at wrkhors.com                                 +1 888 359 3508