[Chicago-talk] Regex and the whitespace before it.

Mike Ferrari mikeferrari8 at yahoo.com
Wed Mar 26 08:02:57 PDT 2008


Hi Everyone

I want to thank everyone for the good discussion on this list recently, and
Mike Fragassi for his help last year with XML::Parser.

I have another stumper I am dealing with and need a little help. My regex
skills are weak, but I am reading up on regex like a madman.

I have data like this...

Tree : This is a sentence, and a statement, about how great trees are. Sky :
This, on the other hand, is something cool about the sky, sometimes blue, or
not. AIR : AIR is sometime breathable or not depending on where you are
BlueBirds : Are cool little birds with blue feathers.

And it goes on and on, maybe not with the cool summertime content, but you
get my drift. Essentially it's:

subject : description of a subject. subject : description of a subject.
subject : description of a subject.  etc etc

I need to parse this and break it out into individual strings, one per
subject, so I can print it to a CSV file for easy spreadsheet reading:

subject : description of a subject.
Sky : This, on the other hand, is something cool about the sky, sometimes
blue, or not.
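
Since the descriptions contain commas, each field would need quoting before it goes into a CSV row. A minimal sketch, assuming a hand-rolled helper (csv_field is a hypothetical name, not something from the post; the CPAN module Text::CSV covers the same job more thoroughly):

```perl
use strict;
use warnings;

# Hypothetical helper: quote one field for CSV output. Embedded double
# quotes are doubled and the field is wrapped in quotes, so a comma inside
# a description does not start a new cell.
sub csv_field {
    my ($text) = @_;
    $text =~ s/"/""/g;
    return qq{"$text"};
}

my @row  = ('Sky', 'This, on the other hand, is something cool about the sky, sometimes blue, or not.');
my $line = join ',', map { csv_field($_) } @row;

print "$line\n";
```

Quoting every field, not only the ones with commas, keeps the helper short, and spreadsheets read both forms the same way.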

I can split the data on ":" but that leaves each subject word out of its own
record and puts the next subject word at the end of the previous description.
Not every description ends with a period either, and commas and other
punctuation are scattered haphazardly through the descriptions.
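
The problem with splitting on the colon alone can be reproduced with a shortened sample. This is only an illustration; the string and variable names are made up for the sketch:

```perl
use strict;
use warnings;

# Shortened sample in the "subject : description" layout.
my $data = 'Tree : Trees are great. Sky : Sometimes blue, or not.';

# Splitting on the colon puts each subject at the tail of the previous
# chunk instead of at the head of its own record.
my @chunks = split /\s*:\s*/, $data;

# Prints three chunks: "Tree", "Trees are great. Sky",
# and "Sometimes blue, or not."
print "[$_]\n" for @chunks;
```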

How can I split the string at the whitespace one word before the ":"?
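
One possible approach, offered as a sketch and not a definitive answer, is to split on whitespace using a lookahead, so the split only happens where a single word followed by a colon comes next. It assumes every subject is one word made of \w characters and that no description contains a "word :" pattern of its own (a time such as 10:30 would trigger a false split). The variable names are illustrative.

```perl
use strict;
use warnings;

# The sample data from the post, joined into one string.
my $data = join ' ',
    'Tree : This is a sentence, and a statement, about how great trees are. Sky :',
    'This, on the other hand, is something cool about the sky, sometimes blue, or',
    'not. AIR : AIR is sometime breathable or not depending on where you are',
    'BlueBirds : Are cool little birds with blue feathers.';

# Split on whitespace only where "word :" follows. The lookahead (?=...)
# checks for the subject without consuming it, so the subject stays at
# the front of its own record.
my @records = split /\s+(?=\w+\s*:)/, $data;

my @pairs;
for my $record (@records) {
    # Limit of 2: only the first colon separates subject from description.
    my ($subject, $description) = split /\s*:\s*/, $record, 2;
    push @pairs, [$subject, $description];
    print "$subject | $description\n";
}
```

Because the lookahead does not depend on how a description ends, the missing period after "where you are" does not matter.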

I have been playing with $` but am not getting what I need.

Any ideas?

Thanks
Mike F



-- 
/dev/mike0

http://www.mikeferrari.com