[Chicago-talk] Regex and the whitespace before it.

tiger peng tigerpeng2001 at yahoo.com
Wed Mar 26 11:02:23 PDT 2008


Short version:
-> perl -072 -pe 'if ($. == 1){s/(\S+)(\s*:\s*)$/$1,"/} else {s/\n/ /g; s/(\S+)(\s*:\s*)$/"\n$1,"/} ' tmp.txt


----- Original Message ----
From: tiger peng <tigerpeng2001 at yahoo.com>
To: Chicago.pm chatter <chicago-talk at pm.org>
Sent: Wednesday, March 26, 2008 12:58:24 PM
Subject: Re: [Chicago-talk] Regex and the whitespace before it.

Add a condition to remove the bad first line:

 perl -072 -ne 'if ($. == 1){s/(\S+)(\s*:\s*)$/$1,"/; print} else {s/\n/ /g; s/(\S+)(\s*:\s*)$/"\n$1,"/; print} ' tmp.txt


----- Original Message ----
From: tiger peng <tigerpeng2001 at yahoo.com>
To: Chicago.pm chatter <chicago-talk at pm.org>
Sent: Wednesday, March 26, 2008 12:51:39 PM
Subject: Re: [Chicago-talk] Regex and the whitespace before it.


I think this is very close to what you want, except for the first line of the output.

-> cat tmp.txt
Tree : This is a sentence, and a statement, about how
great trees are. Sky : This, on the other hand, is something cool
about the sky, sometimes blue, or not. AIR : AIR is sometime
breathable or not depending on where you are BlueBirds : Are cool
little birds with blue feathers.

-> perl -072 -ne 's/\n/ /g; s/(\S+)(\s*:\s*)$/"\n$1,"/; print ' tmp.txt
"
Tree," This is a sentence, and a statement, about how great trees are. "
Sky," This, on the other hand, is something cool about the sky, sometimes blue, or not. "
AIR," AIR is sometime breathable or not depending on where you are "



----- Original Message ----
From: Mike Ferrari <mikeferrari8 at yahoo.com>
To: Chicago.pm chatter <chicago-talk at pm.org>
Sent: Wednesday, March 26, 2008 10:02:57 AM
Subject: [Chicago-talk] Regex and the whitespace before it.

Hi Everyone

I want to thank everyone for the good discussion on this list recently, and Mike Fragassi for his help last year with XML::Parser.

I have another stumper I am dealing with and need a little help. My regex skills are weak, but I am reading up on regex like a madman.

I have data like this...

Tree : This is a sentence, and a statement, about how great trees are. Sky : This, on the other hand, is something cool about the sky, sometimes blue, or not. AIR : AIR is sometime breathable or not depending on where you are BlueBirds : Are cool little birds with blue feathers.

And it goes on and on, maybe not with the cool summertime content, but you get my drift. Essentially it is:

subject : description of a subject. subject : description of a subject. subject : description of a subject.  etc etc

I need to parse this and break it out into individual strings so I can print it to a CSV file for easy spreadsheet reading.

subject : description of a subject. 
Sky : This, on the other hand, is something cool about the sky, sometimes blue, or not.

I can split the data on ":", but that leaves the subject word "out" and the next subject word "in". Not every description ends with a period either, and commas and other junk are interspersed haphazardly in the description.

How can I split the string at the whitespace one word before the ":"?

I have been playing with $` but not getting what I need.

Any ideas?

Thanks
Mike F



-- 
/dev/mike0

http://www.mikeferrari.com









