[Chicago-talk] Regex and the whitespace before it.

imran javaid imranjj at gmail.com
Wed Mar 26 08:41:30 PDT 2008


Assuming your subjects are unique single words (a hash keeps only one
value per key), you can do this:

my $data = "Tree : This is a sentence, and a statement, about how
great trees are. Sky : This, on the other hand, is something cool
about the sky, sometimes blue, or not. AIR : AIR is sometime
breathable or not depending on where you are BlueBirds : Are cool
little birds with blue feathers.";

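# /s lets "." match the newlines inside $data; \s* drops the whitespace
# before the next subject. The lookahead stops each description at the
# next "word : " or at the end of the string.
my %hash = $data =~ m/(\w+)\s:\s(.*?)\s*(?=\w+\s:\s|$)/gs;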
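
A hash does not remember the order of the subjects, and a repeated subject
would overwrite the earlier one. If that matters, here is a rough sketch of
the split you originally asked about: it breaks the string at the whitespace
one word before each " : " by using a lookahead, then writes one CSV line per
record. It assumes, like the regex above, that every subject is a single word
and that no description contains "word : " itself. quote_csv() is a made-up
helper for illustration, not a library function; for real data the Text::CSV
module from CPAN would be the safer choice.

```perl
use strict;
use warnings;

# Sample data from the original post, joined onto one line here.
my $data = "Tree : This is a sentence, and a statement, about how "
         . "great trees are. Sky : This, on the other hand, is something cool "
         . "about the sky, sometimes blue, or not. AIR : AIR is sometime "
         . "breathable or not depending on where you are BlueBirds : Are cool "
         . "little birds with blue feathers.";

# Split at whitespace that is followed by "word : ". The lookahead
# consumes nothing, so each subject stays attached to its description.
my @records = split /\s+(?=\w+\s:\s)/, $data;

# quote_csv() is a hypothetical helper (an assumption, not a library call):
# it wraps a field in double quotes and doubles any embedded quotes.
sub quote_csv {
    my ($field) = @_;
    $field =~ s/"/""/g;
    return qq{"$field"};
}

my @rows;
for my $record (@records) {
    # Limit of 2 so a later " : " inside a description is left alone.
    my ($subject, $description) = split /\s:\s/, $record, 2;
    $description =~ s/\s+/ /g;    # collapse wrapped lines into one
    push @rows, join ',', quote_csv($subject), quote_csv($description);
}

print "$_\n" for @rows;
```

Quoting every field keeps the commas inside the descriptions from being
read as column separators when the file is opened in a spreadsheet.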

-imran

On 3/26/08, Mike Ferrari <mikeferrari8 at yahoo.com> wrote:
> Hi Everyone
>
> I want to thank everyone for the good discussion on this list recently, and
> Mike Fragassi for his help last year with XML::Parser.
>
> I have another stumper I am dealing with and need a little help. My regex
> skills are weak, but I am reading up on regex like a madman.
>
> I have data like this...
>
> Tree : This is a sentence, and a statement, about how great trees are. Sky :
> This, on the other hand, is something cool about the sky, sometimes blue, or
> not. AIR : AIR is sometime breathable or not depending on where you are
> BlueBirds : Are cool little birds with blue feathers.
>
> And it goes on and on, maybe not with the cool summertime content, but you
> get my drift. Essentially it's:
>
> subject : description of a subject. subject : description of a subject.
> subject : description of a subject.  etc etc
>
> I need to parse this and break it out into individual strings so I can print
> it to a CSV file for easy spreadsheet reading.
>
> subject : description of a subject.
> Sky : This, on the other hand, is something cool about the sky, sometimes
> blue, or not.
>
> I can split the data on the ":" but that cuts each subject word off from its
> own description and leaves the next subject word stuck on the end of the
> previous one. Not every description ends with a period either, and commas
> and other junk are interspersed haphazardly in the description.
>
> How can I split the string at the whitespace one word before the ":"?
>
> I have been playing with $` but am not getting what I need.
>
> Any ideas?
>
> Thanks
> Mike F
>
>
>
> --
> /dev/mike0
>
> http://www.mikeferrari.com
> _______________________________________________
> Chicago-talk mailing list
> Chicago-talk at pm.org
> http://mail.pm.org/mailman/listinfo/chicago-talk
>

