[Chicago-talk] Removing Characters

Steven Lembark lembark at wrkhors.com
Wed Oct 24 17:03:05 PDT 2007


> You are right if it is a well-formatted CSV file.
> But I don't know if that is guaranteed; there is no documentation for it. I
> am just rewriting an old, uncommented/undocumented working script.
>
> The segment is just rewritten to make it a little easier to read by
> replacing the embedded 'DEL' characters with a variable populated by
> the function chr.
>
> Below is the best I can do. It looks better to me and runs about twice
> as fast as the old one. I still can't turn it into a one-liner, though.
> Can anyone get rid of the first line?
>
>   my $leadingQ=""; $leadingQ='"' if /^"/;  # save the leading quote if it is there
>   s/(?<!,)"(?!(,|$))//g;  # remove all double quotes not next to a comma or at the end of the line
>   print OUTF $leadingQ, $_;

That's what I was going to suggest.
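If you do want to fold the first line into the substitution, one option (a sketch, not something tested against your data) is to swap the negative lookbehind (?<!,) for a positive one, (?<=[^,]), which demands a preceding character that is not a comma. A quote at the very start of the line has no preceding character, so it can never match, and the $leadingQ bookkeeping goes away. The function name strip_stray_quotes is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): removes every double
# quote that is not at the start of the line, not right after a comma, and
# not right before a comma or the end of the line.
sub strip_stray_quotes {
    my ($line) = @_;
    # (?<=[^,]) needs a preceding non-comma character, so a quote at
    # position 0 never matches and is kept. $ in the lookahead also
    # matches just before a trailing newline.
    $line =~ s/(?<=[^,])"(?!,|$)//g;
    return $line;
}
```

As a command-line filter the same substitution is just perl -pe 's/(?<=[^,])"(?!,|$)//g' old.csv > new.csv (file names are placeholders). It also sidesteps a corner case in the two-statement version: if the leading quote is itself followed by a comma, that version leaves it in place and prints a second one in front.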

You might want to eyeball Text::Balanced (Conway)
and Regexp::Common::balanced (Abigail). T::B allows
you to validate the content, which might be the
best approach: work your way in from the edges
and discard what's in the middle of balanced
quotes.
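For the validation route, here is a rough sketch of "working in from the edges" with Text::Balanced's extract_delimited. The validator quotes_balanced is hypothetical (not part of either module or the original script): it peels one field at a time off the front of the line, requiring each quoted field to be a complete pair of double quotes followed by a comma or end of line, and each unquoted field to contain no quotes at all. It assumes simple CSV without doubled "" escapes; note that extract_delimited treats a backslash as its escape character by default, so \" inside a quoted field is accepted.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Hypothetical validator: true if every comma-separated field is either
# unquoted (no double quotes anywhere) or one balanced "..." pair.
sub quotes_balanced {
    my ($line) = @_;
    chomp $line;
    my $rest = $line;
    while (1) {
        if ($rest =~ /^"/) {
            # List context: (extracted, remainder, prefix); on failure
            # the extracted text is undef and $@ is set.
            my ($field, $remainder) = extract_delimited($rest, '"');
            return 0 unless defined $field && length $field;
            $rest = $remainder;
        }
        else {
            # Unquoted field: everything up to the next comma.
            $rest =~ s/^([^,]*)//;
            my $plain = $1;
            return 0 if $plain =~ /"/;
        }
        return 1 if $rest eq '';
        # Anything after a field other than a comma means a stray quote
        # or other junk in the middle of the line.
        return 0 unless $rest =~ s/^,//;
    }
}
```

Lines that pass could be copied through untouched, and only the failures handed to the quote-stripping substitution, so well-formed records are never rewritten.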


-- 
Steven Lembark                                         85-09 90th Street
Workhorse Computing                                  Woodhaven, NY 11421
lembark at wrkhors.com                                      +1 888 359 3508
