SPUG: RE / Split Question
Lee Wilson
devnull at devnullsoftware.com
Wed Jul 30 18:53:24 CDT 2003
I think the problem is that you're trying to solve a non-deterministic
problem with a deterministic tool.
However, based on the small sample of data, I can see an approach that
will work, albeit a bit brute-force-ish.
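my $input = "425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t";
my @output = ();

# The capturing parentheses make split return the number pairs too.
# $input starts with a pair, so $step1[0] is an empty string; the pairs
# land at odd indexes and their words at the even index that follows.
my @step1 = split /(\d{3}\s+\d{3}\s+)/, $input;

for ( my $index = 1; $index < @step1; $index += 2 )
{
    # A trailing pair with no words has no following element.
    my $words = defined $step1[$index + 1] ? $step1[$index + 1] : "";
    # The captured pair already ends in whitespace, so just append.
    my $line = $step1[$index] . $words;
    $line =~ s/\s+$//;    # drop the whitespace left at the end
    push( @output, $line );
}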
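To see why the loop starts at index 1 and steps by 2, it can help to dump
what split returns for the sample string. This is only an illustrative
sketch; @fields is a name made up for this example.

```perl
use strict;
use warnings;

my $input = "425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t";

# Capturing parentheses in the pattern make split return the
# delimiters (the number pairs) along with the text between them.
my @fields = split /(\d{3}\s+\d{3}\s+)/, $input;

# Print each field with its index, bracketed so whitespace is visible.
for my $i ( 0 .. $#fields ) {
    print "$i: [$fields[$i]]\n";
}
```

Field 0 comes back empty because the string begins with a delimiter; the
number pairs sit at the odd indexes and the words that belong to each pair
sit at the even index right after it.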
This assumes:
1) Each piece of data starts with two 3-digit numbers
2) Each piece of data has 1 or more words
If either of those is untrue, then this algorithm will have problems, but
I think they are solvable.
My only point is that you may not be able to solve this with a one-line
command. If someone CAN solve it that way, I'd be interested to see how
=)
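One candidate for a single-statement version, offered as a sketch rather
than a definitive answer: split on whitespace, but only where a pair of
3-digit numbers follows. It relies on the same two assumptions, and also
assumes no word ever looks like a pair of 3-digit numbers. @records is a
name made up for this example.

```perl
use strict;
use warnings;

my $input = "425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t";

# Split on whitespace only where two 3-digit numbers come next.
# The lookahead (?=...) checks for the pair without consuming it,
# so each record keeps its own numbers.
my @records = split /\s+(?=\d{3}\s+\d{3}\b)/, $input;

print "$_\n" for @records;
```

Because the lookahead consumes nothing, only the separating whitespace is
thrown away, and the last chunk is kept like any other.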
On Wed, 30 Jul 2003, Orr, Chuck (NOC) wrote:
> Hello All,
>
> Please help with the following dilemma:
>
> I am being given a glob of data from a web page that I need to fix
> with perl. It comes in as $blob looking like this:
>
> 425 501 sttlwa01t 425 712 sttlwa01t tacwa02t 425 337 tacwa02t ...
>
> I need to break this up so the word characters associated with the
> numbers stay with their numbers. Ideally, I would have an array like
> this:
>
> 425 501 sttlwa01t
> 425 712 sttlwa01t tacwa02t
> 425 337 tacwa02t
>
> As you can see, I am not assured of the number of words that will follow
> each set of numbers. Could you please suggest a split or some other
> tool that will turn the glob into the fix?
> $new_array = [ split /(?=[A-Z]\s\d)/,$scalar ];
>
> Which is as close as we got, does not work. It keeps the split
> characters, but in a funky way that I cannot deal with. It also will
> always miss the last chunk of the glob.
==============================================================================
Lee Wilson - INTP http://www.devnullsoftware.com
Software Developer / RealNetworks http://www.realarcade.com
==============================================================================
There are 10 kinds of people in the world:
The people who understand ternary,
The people who don't, but care,
and the people who don't understand or care.
==============================================================================