[San-Diego-pm] Recompiling global substitution RE?
Anthony Foiani
tkil at scrye.com
Sun Mar 21 01:19:00 PDT 2010
Reuben Settergren <ruberad at gmail.com> writes:
> [...] is there any way I can force the *left* side to compile
> multiple times? (Even within the same /g substitution?) I tried a
> few constructs using the \G (start matching from previous position)
> zero-width assertion, but only managed to get myself into infinite
> loops.
It's doable, but it gets ugly to do it flexibly.
First, a simple (hardcoded) proof of concept:
| #!/usr/bin/perl
|
| use warnings;
| use strict;
|
| my $orig = "foo1 foo2 foo2 foo3 foo4 foo2 foo5";
| print "orig: $orig\n";
|
| my $done = '';
| my $n = 1;
| while ( $orig =~ m!\G(.*?)(foo$n)!gc )
| {
| $done .= $1;
| $done .= 'bar' . $n if defined $2;
| ++$n;
| }
|
| print "done: $done\n";
|
| exit 0;
(Online at: http://scrye.com/~tkil/perl/simple-recomp-replace.plx )
Output:
| $ ./simple-recomp-replace.plx
| orig: foo1 foo2 foo2 foo3 foo4 foo2 foo5
| done: bar1 bar2 foo2 bar3 bar4 foo2 bar5
Making that generic... I probably didn't do a great job. It might be
cleaner if it's implemented as a class; I don't know. Here's what I
came up with:
| #!/usr/bin/perl
|
| use warnings;
| use strict;
|
| =over
|
| =item my $done = recomp_replace $orig, $sub_ref;
|
| This function provides a relatively generic way to do a global search
| and replace while allowing the matching regex to vary.
|
| =over
|
| =item $orig
|
| This is the original string.
|
| =item $sub_ref
|
| This is a callback which is fed each matching portion of the $orig
| string. Its return value is a list of two items: first, the text to
| substitute into the final string; and second, the regex to use to
| match the next chunk.
|
| A special case is when this is called without any arguments; this is
| used to obtain the starting regex (and the return text is discarded).
|
| =back
|
| Example:
|
| my $orig = "foo1 foo2 foo2 foo3 foo4 foo2 foo5";
| print "orig: $orig\n";
|
| my $n = 0;
| sub numbered_foo
| {
| my ( $chunk ) = @_;
| return ( "bar" . $n++, "foo$n" );
| }
|
| my $done = recomp_replace $orig, \&numbered_foo;
| print "done: $done\n";
|
| This generates the following output:
|
| orig: foo1 foo2 foo2 foo3 foo4 foo2 foo5 baz
| done: bar1 bar2 foo2 bar3 bar4 foo2 bar5 baz
|
| For a more complicated example, consider a mini-language that reads in
| a mixed list of words and simple directives. This time, the output
| first (spacing has been manually adjusted for clarity):
|
| orig: one=>alpha two one three one two=>beta one two three
| done: alpha two alpha three alpha beta one beta three
|
| And here's the engine behind it:
|
| my $map_re = qr/ ( \w+ ) => ( \w+ ) /x;
| my $last_re = $map_re;
| my ( $src, $dest );
| sub selective_map
| {
| my ( $chunk ) = @_;
| my $out;
| if ( ! defined $chunk )
| {
| $out = '';
| }
| elsif ( defined $src && $chunk eq $src )
| {
| $out = $dest;
| }
| elsif ( $chunk =~ /^$map_re$/ )
| {
| ( $src, $dest ) = ( $1, $2 );
| $last_re = qr/ $map_re | $src /x;
| $out = $dest;
| }
| return ( $out, $last_re );
| }
|
| =back
|
| =cut
|
| sub recomp_replace
| {
| my ( $orig, $block ) = @_;
|
| my $done = '';
| my ( $chunk, $re ) = $block->();
| # print "re='$re'\n";
| my $pos = 0;
| while ( $orig =~ m! \G ( .*? ) ( $re ) !gcx )
| {
| $done .= $1;
| ( $chunk, $re ) = $block->( $2 );
| # print "chunk='$chunk', re='$re'\n";
| $done .= $chunk;
| $pos = $+[0];
| }
| $done .= substr $orig, $pos;
| return $done;
| }
|
| # first example
| {
| my $orig = "foo1 foo2 foo2 foo3 foo4 foo2 foo5 baz";
| print "orig: $orig\n";
|
| my $n = 0;
| sub numbered_foo
| {
| my ( $chunk ) = @_;
| return ( "bar" . $n++, "foo$n" );
| }
|
| my $done = recomp_replace $orig, \&numbered_foo;
| print "done: $done\n";
| }
|
| # second example
| {
| my $orig = "one=>alpha two one three one two=>beta one two three";
| print "orig: $orig\n";
|
| my $map_re = qr/ ( \w+ ) => ( \w+ ) /x;
| my $last_re = $map_re;
| my ( $src, $dest );
| sub selective_map
| {
| my ( $chunk ) = @_;
| my $out;
| if ( ! defined $chunk )
| {
| $out = '';
| }
| elsif ( defined $src && $chunk eq $src )
| {
| $out = $dest;
| }
| elsif ( $chunk =~ /^$map_re$/ )
| {
| ( $src, $dest ) = ( $1, $2 );
| $last_re = qr/ $map_re | $src /x;
| $out = $dest;
| }
| return ( $out, $last_re );
| }
|
| my $done = recomp_replace $orig, \&selective_map;
| print "done: $done\n";
| }
|
| exit 0;
(Online at: http://scrye.com/~tkil/perl/recomp-replace.plx )
Hm. Just realized that the second version will prolly explode if the
regex returned from the callback ever matches zero characters (since
my "get stuff before the RE" is non-greedy...) Oh well. Don't do
that, then.
"Share and Enjoy",
t.
More information about the San-Diego-pm
mailing list