SPUG: RE / Split Question

Lee Wilson devnull at devnullsoftware.com
Wed Jul 30 18:53:24 CDT 2003


The problem, I think, is that you're trying to solve a non-deterministic
problem with a deterministic tool.

However, based on the small sample of data, I can see an approach that
will work, albeit a bit brute-force-ish.


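    use strict;
    use warnings;

    my $input = "425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t";
    my @output;

    # The capturing group makes split() return each "NNN NNN " delimiter
    # as well, so @step1 is: '', numbers, words, numbers, words, ...
    my @step1 = split /(\d{3}\s+\d{3}\s+)/, $input;

    # Start at 1 to skip the empty field in front of the first delimiter.
    for ( my $index = 1; $index < @step1; $index += 2 )
    {
        # Both pieces end in whitespace; trim it so each line has
        # single spaces and no trailing blank.
        my ( $numbers, $words ) = @step1[ $index, $index + 1 ];
        s/\s+$// for $numbers, $words;
        push( @output, "$numbers $words" );
    }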

This assumes:
1) Each piece of data starts with two 3-digit numbers
2) Each piece of data has one or more words after those numbers

If either of those is untrue, this algorithm will have problems, but I
think they are solvable.
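One way the second assumption might be relaxed is to walk the
whitespace-separated tokens instead of splitting on a pattern. This is a
minimal sketch, assuming a new group starts wherever two tokens that are
exactly three digits appear in a row; the helper name
`group_by_number_pairs` is hypothetical, and words that show up before the
first number pair are simply dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: start a new group whenever two consecutive tokens
# are both exactly three digits; every other token joins the current group.
# A group may therefore have zero words after its numbers.
sub group_by_number_pairs {
    my ($input) = @_;
    my @tokens = split ' ', $input;    # ' ' splits on runs of whitespace
    my @groups;
    my $i = 0;
    while ( $i < @tokens ) {
        if (   $i + 1 < @tokens
            && $tokens[$i] =~ /^\d{3}$/
            && $tokens[ $i + 1 ] =~ /^\d{3}$/ )
        {
            # Two 3-digit numbers in a row: open a new group.
            push @groups, [ @tokens[ $i, $i + 1 ] ];
            $i += 2;
        }
        else {
            # Assumption: words before the first number pair are discarded.
            push @{ $groups[-1] }, $tokens[$i] if @groups;
            $i++;
        }
    }
    return map { join ' ', @$_ } @groups;
}

my @output = group_by_number_pairs(
    "425 501 425 712 sttlwa01t tacwa02t 425 337 tacwa02t");
print "$_\n" for @output;
```

In this sample the first group, `425 501`, has no words, and the token walk
just emits it on its own line. It is more code than a split, but each rule
(where a group starts, where a stray word goes) is an explicit branch that
can be changed without reworking a regex.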

My only point is that you may not be able to solve this with a one-line 
command.  If someone CAN solve it that way, I'd be interested to see how 
=)
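For the one-line version, a zero-width lookahead may be enough, under the
same two assumptions listed earlier plus one more: the input has no leading
whitespace. This is a sketch checked only against the sample data in this
thread. It splits on whitespace only where that whitespace is immediately
followed by a "NNN NNN" pair.

```perl
use strict;
use warnings;

my $input = "425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t";

# The lookahead (?=...) checks for the number pair without consuming it,
# so the numbers stay at the front of each chunk and only the separating
# whitespace is thrown away. Assumes $input has no leading whitespace.
my @output = split /\s+(?=\d{3}\s+\d{3}\b)/, $input;

print "$_\n" for @output;
```

Because the lookahead consumes nothing, the digits are never lost, and the
last chunk comes back like any other.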



On Wed, 30 Jul 2003, Orr, Chuck  (NOC) wrote:

> Hello All,
>  
> Please help with the following dilemma:
> 
>      I am being given a glob of data from a web page that I need to fix
> with perl.  It comes in as $blob looking like this:
>  
> 425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t ...
>  
> I need to break this up so the word characters associated with the
> numbers stay with their numbers.  Ideally, I would have an array like
> this:
>  
> 425 501 sttlwa01t
> 425 712 sttlwa01t tacwa02t
> 425 337 tacwa02t
>  
> As you can see, I am not assured of the number of words that will follow
> each set of numbers.  Could you please suggest a split or some other
> tool that will turn the glob into the fix?
> $new_array = [ split /(?=[A-Z]\s\d)/,$scalar ];  
> 
> That is as close as we got, and it does not work.  It keeps the split
> characters, but in a funky way that I cannot deal with.  It also will
> always miss the last chunk of the glob.



==============================================================================
Lee Wilson - INTP                               http://www.devnullsoftware.com
Software Developer / RealNetworks                    http://www.realarcade.com
==============================================================================
              There are 10 kinds of people in the world: 
                The people who understand ternary, 
                  The people who don't, but care, 
            and the people who don't understand or care.
==============================================================================



