[Chicago-talk] Removing Characters
Steven Lembark
lembark at wrkhors.com
Wed Oct 24 17:03:05 PDT 2007
> You are right if it is a well-formatted CSV file.
> But I don't know whether that is guaranteed. There is no documentation
> for it. I am just rewriting an old, uncommented, undocumented working script.
>
> The segment was rewritten just to make it a little easier to read, by
> replacing the embedded 'DEL' characters with a variable populated by
> the chr function.
>
> Below is the best I can do. It looks better to me and runs twice as
> fast as the old one. I still cannot make a one-liner out of it.
> Can anyone get rid of the first line?
>
> my $leadingQ = ""; $leadingQ = '"' if /^"/; # save the leading quote, if any
> s/(?<!,)"(?!(,|$))//g; # remove each double quote that is neither preceded
>                        # by a comma nor followed by a comma or end of line
> print OUTF $leadingQ, $_;
That's what I was going to suggest.
You might want to eyeball Text::Balanced (Conway)
and Regexp::Common::balanced (Abigail). T::B allows
you to validate the content, which might be the
best approach: work your way in from the edges
and discard what's in the middle of balanced
quotes.
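
If you still want to lose the $leadingQ line, one possibility is to
fold the start-of-line case into the lookbehind. This is only a sketch:
it assumes one record per line and the same keep/remove rules as your
regex, and strip_inner_quotes is a made-up name for illustration, not
anything from your script.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script.
# A quote is removed only when some character other than a comma
# precedes it and it is not followed by a comma or the end of the line.
# Requiring a preceding character means a quote at the very start of
# the line never matches, so it does not have to be saved and restored.
sub strip_inner_quotes {
    my ($line) = @_;
    $line =~ s/(?<=[^,])"(?!,|$)//g;
    return $line;
}

print strip_inner_quotes(qq{"ab"c","d"e"\n});   # prints "abc","de"
```

The positive lookbehind (?<=[^,]) replaces (?<!,); it fails at the
start of the line because there is nothing there to look back at.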
--
Steven Lembark 85-09 90th Street
Workhorse Computing Woodhaven, NY 11421
lembark at wrkhors.com +1 888 359 3508