SPUG: RE / Split Question

Wed Jul 30 20:44:45 CDT 2003

On Wed, Jul 30, 2003 at 04:54:46PM -0700, Orr, Chuck  (NOC) wrote:
> Hello All,
>  
> Please help with the following dilemma:
> 
>      I am being given a glob of data from a web page that I need to fix
> with perl.  It comes in as $blob looking like this:
>  
> 425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t ...
>  
> I need to break this up so the word characters associated with the
> numbers stay with their numbers.  Ideally, I would have an array like
> this:
>  
> 425 501 sttlwa01t
> 425 712 sttlwa01t tacwa02t
> 425 337 tacwa02t
>  
> As you can see, I am not assured of the number of words that will follow
> each set of numbers.  Could you please suggest a split or some other
> tool that will turn the glob into the fix?
> $new_array = [ split /(?=[A-Z]\s\d)/,$scalar ];  
> 
> That is as close as we got, and it does not work.  It keeps the split
> characters, but in a funky way that I cannot deal with.  It also will
> always miss the last chunk of the glob.

This is what I would use:

[breser at titanium breser]$ perl
$blob = '425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t';
print join("\n", split(/(?<!\d{3})\s+(?=\d{3})/, $blob)),"\n";
__END__
425 501 sttlwa01t
425 712 sttlwa01t tacwa02t
425 337 tacwa02t

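It'll fail if a text entry ever ends in three digits (the lookbehind
then suppresses the split before the next record) or begins with three
digits (an extra split appears in front of that entry).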
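
To make that failure mode concrete, here is a minimal sketch.  The entry
"sttlwa012" is a made-up value that ends in three digits, and the second
pattern rests on an assumption that is not stated in the original
question: that every record starts with two standalone three-digit
numbers, as in the sample data.  Treat it as one possible tightening, not
a definitive fix.

```perl
use strict;
use warnings;

# Hypothetical input: the first entry ends in three digits.
my $blob = '425 501 sttlwa012 425 712 sttlwa01t tacwa02t 425 337 tacwa02t';

# The pattern from above, spelled out with /x so each piece can be commented.
my $orig_re = qr{
    (?<!\d{3})   # not directly after three digits, so 425 501 stays together
    \s+          # the whitespace that gets removed by the split
    (?=\d{3})    # only where three digits come next
}x;

# Assumed stricter rule: split only in front of two three-digit numbers
# that are each followed by whitespace.
my $strict_re = qr{
    \s+
    (?= \d{3} \s+ \d{3} \s )
}x;

my @orig   = split /$orig_re/,   $blob;
my @strict = split /$strict_re/, $blob;

print scalar(@orig), " records with the original pattern\n";
print "$_\n" for @strict;
```

With this input the original pattern yields only 2 records, because the
first two run together, while the stricter pattern prints the three
expected lines.  The stricter pattern has its own limit: it expects at
least one entry after the two numbers, so a record consisting of the
numbers alone at the very end of the blob would not be split off.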

Not sure whether this is preferable to the looping solutions other
people have come up with.

-- 
Ben Reser <ben at reser.org>
http://ben.reser.org

"What upsets me is not that you lied to me, but that from now on I can
no longer believe you." -- Nietzsche