[Chicago-talk] Simple csv text parsing question

Mike Ferrari mikeferrari8 at yahoo.com
Thu Apr 10 11:51:15 PDT 2008


Hi,
So I played with Text::CSV, Text::CSV_XS and Text::xSV last night and got
thoroughly lost :-). Each one has its own confusing points, and each one
says it supports embedded newlines, but none explains how in a way that I
understand. If anyone can point me to an example I would be in your
debt. I have been beating my head against the wall for 30 or more hours on
this problem, and it's making me question basic simple little things in
Perl... which isn't good.

Nothing like a simple problem to humble you :-)

I started thinking about it this morning. Essentially, what I need is for
the file to be read in without the newline characters breaking it up into
separate records.

I read up on the $/ variable (the input record separator). Undefining it
puts reads into "slurp" mode, which seems to do what I need: the data comes
in as one string instead of being split at the newlines.
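
For anyone following along, slurp mode can be sketched roughly like this. The
helper name slurp() and the three-line sample string are made up for
illustration; only $/ itself comes from the discussion above.

```perl
use strict;
use warnings;

# slurp() is a hypothetical helper name, not something from a module.
sub slurp {
    my ($fh) = @_;
    local $/;               # undef for this sub only: no record separator
    return scalar <$fh>;    # a single read now returns the rest of the file
}

# Invented sample text, read through an in-memory filehandle.
my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $all = slurp($fh);
close $fh;

# Count the newlines that survived inside the single string.
my $newlines = () = $all =~ /\n/g;
print "read ", length($all), " characters containing $newlines newlines\n";
```

Keeping the "local $/" inside a sub means the separator goes back to its
normal value as soon as the sub returns, so later line-by-line reads elsewhere
in the script are unaffected.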

I am having a tough time coming up with a regex (I seriously need to sit
down for a week and work through/understand Perl regexes) that will match
every run of characters that is surrounded by quotes and has a newline
embedded in it.

Essentially, in the substitution I have sketched below, (FIND OPEN QUOTE)
could be something like /"(.*)/: match a quote, followed by any character
except newline, multiple times ... then a \n and then ????? find any
character except newline, multiple times, ending with a quote.
This won't take into account things that are not wrapped in quotes, or things
in quotes with multiple newlines, which makes me want to lean towards using
a module and not reinventing the wheel. I am going to sit down and reread
the perldocs for Text::xSV.
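
For comparison, here is a rough sketch of the module route using Text::CSV
(this assumes the module is installed from CPAN). The sample data is invented
so that one quoted field really does contain a newline. The post does not show
how the modules were actually called, so this is only one plausible way to do
it: let the parser read from the filehandle itself with getline() instead of
handing it pre-split lines.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, assumed to be installed

# Invented sample: the second field of the first record has an embedded
# newline inside its quotes.
my $data = qq{qwert,"asdf12\n34jkl","zxcvbnm"\n"mnbv","vcxzlk",jhgfdsa\n};

# binary => 1 lets quoted fields contain newlines and other control bytes.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

my @rows;
# getline() pulls from the handle, so it can keep reading past a newline
# that falls inside an open quote and still return one complete record.
while (my $row = $csv->getline($fh)) {
    s/\r?\n/ /g for @$row;    # flatten embedded line breaks to a space
    push @rows, $row;
}
close $fh;

print join('|', @$_), "\n" for @rows;
```

The difference from a "while (<FH>) { $csv->parse($_) }" loop is that parse()
only ever sees the text it is given, so splitting the file on newlines first
cuts a quoted field in half before the parser gets a chance to look at it.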

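#!/usr/bin/perl
use strict;
use warnings;

# Undefine the input record separator so <DATA> returns everything at once.
local $/;
my $words = <DATA>;

# Placeholder idea was: s/ (FIND OPEN QUOTE) \n (FIND CLOSE QUOTE) / $1 $2 /
# Working version: for each "..." span, replace embedded CR/LF with a space.
# Assumes the double quotes in the data are balanced.
$words =~ s{("[^"]*")}{ (my $field = $1) =~ s/\r?\n/ /g; $field }ge;
print $words;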

__DATA__
qwert,"yuiop","asdf12 34jkl","zxcvbnm"
"mnbv","vcxzlk",jhgfdsa,"poiuy"
"poiuyt","trewq",kjhtfdrseaw,kikujy 78886 htgrfed

Thanks
Mike


On Wed, Apr 9, 2008 at 3:47 PM, Mike Ferrari <mikeferrari8 at yahoo.com> wrote:

> I was playing with Text::CSV_XS and got lost, seriously. It did claim it
> could handle "embedded newlines", which I now know is the term for the
> thing that's killing me, but I couldn't get it to handle embedded
> newlines even in binary mode.
> I will play with Text::xSV tonight to see what I come up with.
>
> With all the different "csv" parsing modules out there, you would think
> there would be a push for one big/good one with good docs and
> examples, instead of 10 different ones you load for special cases, each
> with mediocre docs and examples.
>
>
> > On Wed, Apr 9, 2008 at 3:37 PM, Andrew Rodland <arodland at comcast.net>
> > wrote:
> >
> > > On Wednesday 09 April 2008 01:44:46 pm Mike Ferrari wrote:
> > > > It just so happens that some of the data has a CRLF smack in the
> > > > middle.
> > > >
> > > > I am using Text::CSV for the first time and am liking the power to
> > > > deal with the commas and quotes and stuff, but I am having problems
> > > > with the CRLF in the middle of the data.
> > > >
> > > > The perldoc page says if you run into non-ASCII characters use
> > > > binary mode, but the CRLF isn't non-ASCII.
> > > >
> > > > What's the most normal/easiest way to deal with CRLF in the data
> > > > without screwing up the normal end of line? I need to sanitize the
> > > > data somehow and am drawing a blank.
> > > >
> > > >
> > >
> > >
> > > Try Text::xSV, it's less blatantly line-based than Text::CSV, and it
> > > copes with that sort of thing better. Besides, I think the interface
> > > is more sane.
> > >
> > > Andrew
> > > _______________________________________________
> > > Chicago-talk mailing list
> > > Chicago-talk at pm.org
> > > http://mail.pm.org/mailman/listinfo/chicago-talk
> > >
> >
> >
> >
> > --
> > /dev/mike0
> >
> > http://www.mikeferrari.com
> >
>
>
>
> --
> /dev/mike0
>
> http://www.mikeferrari.com
>



-- 
/dev/mike0

http://www.mikeferrari.com