[ABE.pm] perl 5.10 rules! - pluggable regex engines

Ricardo SIGNES rjbs-perl-abe at lists.manxome.org
Tue Jul 10 17:28:48 PDT 2007


Okay, this is probably the last thing I'll write about with regard to perl
5.10, at least as far as I can predict from where I'm sitting now.  It's also
the one I've put off the longest, because it's fairly advanced and esoteric.

Rather than get into the scary esoteric bits (at least right off the bat),
let's just see what I mean and what it can do for you.

Everybody who programs Perl (or even PERL) has probably written something like
this:

  if ($line =~ /your (face|belly) is very round/) {
    my $part = $1;
    punch_in_the($part);
  }

There is nothing mysterious about this, I hope.  We match a string against a
pattern, use the text captured by the group, and that's that.

The semantics of Perl regular expressions are pretty well defined, and are
often considered some of the best (due to both power and relative ease of use)
around.  That doesn't mean, though, that they're the only semantics you ever
need to use.

Perl is great for generating lots of text based on a little text.  Sometimes we
call it templating.  One of the ways I use this is in configuration generation.
I provide just a little data to perl and it generates a big verbose config file
for me.

Let's assume that part of the config you're outputting is regular expressions.

You have code something like this:

  sub mail_from_re {
    my ($sender) = @_;

    my $regex = ...;  # build a regex string for $sender

    return $regex;
  }

That code returns a string that goes into the config file, where it's used as a
regex.  It's easy to write automated tests for this code, right?  You can write
a test that does:

  my $re_string = mail_from_re($sender);

  like($whatever, qr/$re_string/, "the returned regex matches!");

Now you can test that all your regexes are going to work the way you think.
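For concreteness, here's a minimal sketch of what such a generator and its
test might look like.  The body of mail_from_re, the From: line format, and the
example address are all made up for illustration; the real implementation
isn't shown above.  The generator deliberately escapes only POSIX ERE
metacharacters, since the string is headed for a POSIX tool.

```perl
  use strict;
  use warnings;
  use Test::More;

  # Hypothetical generator: returns a regex *string* meant to be valid
  # POSIX ERE, matching a From: line that contains the given address.
  sub mail_from_re {
    my ($sender) = @_;

    # Backslash only the POSIX ERE metacharacters.  Perl's quotemeta would
    # also escape characters such as '@', which POSIX leaves undefined.
    (my $escaped = $sender) =~ s/([.\[\]()*+?{}|^\$\\])/\\$1/g;

    return "^From: .*<?$escaped>?\$";
  }

  # Testing with Perl's own engine, as described above.
  my $re_string = mail_from_re('jane.doe@example.com');
  like("From: Jane <jane.doe\@example.com>", qr/$re_string/, "matches the sender");
  unlike("From: Jane <janeXdoe\@example.com>", qr/$re_string/, "dot is literal");
  done_testing();
```

Note that these tests exercise Perl's engine, not the POSIX one the string is
actually written for.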

There's just one problem... the tool you're generating config for uses POSIX
regex, not Perl regex, and the two dialects don't always agree.  You can't just
test with that tool, because running it does Stuff that you can't do without
consequences.  Anyway, it's a pain.
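One concrete example of the dialects disagreeing: when several alternatives can
match at the same starting position, Perl takes the first alternative that
works, while POSIX specifies the leftmost-longest match.  A minimal sketch,
using only Perl's built-in engine:

```perl
  use strict;
  use warnings;

  my $subject = 'abcd';

  # Perl tries alternatives left to right and keeps the first one that lets
  # the whole match succeed, so "ab" wins even though "abcd" is longer.
  my ($perl_capture) = $subject =~ /(ab|abcd)/;
  print "Perl captured: $perl_capture\n";    # prints "Perl captured: ab"

  # A POSIX ERE engine uses leftmost-longest rules and would report
  # "abcd" for the same pattern and subject.
```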

The pluggable regex engine of 5.10 to the rescue!  Now you can write:

  my $re_string = mail_from_re($sender);

  {
    use re::engine::POSIX;
    like($whatever, qr/$re_string/, "the returned regex matches!");
  }

...and like that, you're using POSIX regex.  Or Plan9.  Or PCRE.  Or
JavaScript.  Or Perl6.  Or whatever!

The reverse works, too.

  my $re_string = read_from_external_config_file();

  my $regex;
  { use re::engine::PCRE; $regex = qr/$re_string/; }

  if ($string =~ $regex) {
    ...
  }

Now you can read regexes from external sources, written for other regex
dialects, and use them correctly in your perl program.
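Patterns from outside your program can also simply be broken, so it may be
worth compiling them defensively.  Here's a minimal sketch of a loader; the
name load_patterns and the one-pattern-per-line file format are assumptions
for illustration.  It uses Perl's engine so it runs on a stock perl; the
comment marks where an engine pragma like re::engine::PCRE could go.

```perl
  use strict;
  use warnings;

  # Hypothetical loader: compiles one pattern per line from a filehandle,
  # skipping blank lines and '#' comments.  A bad pattern makes qr// die,
  # so each compile is wrapped in eval and the error is collected.
  sub load_patterns {
    my ($fh) = @_;

    # To compile these with another dialect, a line such as
    #   use re::engine::PCRE;
    # could go here; engine pragmas are lexically scoped.
    my (@compiled, @errors);
    while (my $line = <$fh>) {
      chomp $line;
      next if $line =~ /^\s*(?:#|\z)/;
      my $re = eval { qr/$line/ };
      if ($re) { push @compiled, $re }
      else     { push @errors, "$line: $@" }
    }
    return (\@compiled, \@errors);
  }

  # Usage with an in-memory "config file":
  my $text = "foo\\d+\n(\n# a comment\n\nbar\n";
  open my $fh, '<', \$text or die "can't open in-memory file: $!";
  my ($ok, $bad) = load_patterns($fh);
  printf "%d compiled, %d rejected\n", scalar @$ok, scalar @$bad;
  # prints "2 compiled, 1 rejected"
```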

If these applications don't excite you, and you want to do something really
fun, why not write your own regex engine?  It's easy with re::engine::Plugin,
which lets you write new regex engines in Perl.  I think we'll see a lot of
applications of this for all manner of shortcuts:

  use re::engine::Subnet;

  if ('10.1.2.15' =~ m{10.1.2.0/18}) {
    # runs if the IP address is in the described netblock
  }
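Treat re::engine::Subnet as hypothetical, so there's no real engine to show.
Here's a minimal sketch, in plain Perl with no engine plumbing, of the
comparison such an engine's match step would have to perform.  The helper
names ip_to_int and in_subnet are made up, and hooking them into
re::engine::Plugin isn't shown.

```perl
  use strict;
  use warnings;

  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  sub ip_to_int {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    die "bad IPv4 address: $ip\n"
      unless @octets == 4 && !grep { !/^\d+\z/ || $_ > 255 } @octets;
    return ($octets[0] << 24) | ($octets[1] << 16) | ($octets[2] << 8) | $octets[3];
  }

  # Returns 1 if $ip falls inside the CIDR block $cidr (e.g. "10.1.2.0/18"),
  # else 0.  Both addresses are masked to the prefix length and compared.
  sub in_subnet {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    die "bad prefix length in $cidr\n"
      unless defined $bits && $bits =~ /^\d+\z/ && $bits <= 32;
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return ((ip_to_int($ip) & $mask) == (ip_to_int($net) & $mask)) ? 1 : 0;
  }

  print in_subnet('10.1.2.15', '10.1.2.0/18'), "\n";   # prints 1
  print in_subnet('10.1.64.1', '10.1.2.0/18'), "\n";   # prints 0
```

Note that a /18 prefix masks 10.1.2.0 down to the network 10.1.0.0, so any
address from 10.1.0.0 through 10.1.63.255 counts as inside the block.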

The API for writing your own engines is still pretty young, and I'm still a
real novice with it, so I won't get into it here.  It's on the CPAN:

  http://search.cpan.org/dist/re-engine-Plugin/

I'm really looking forward to seeing this put to powerful and fun uses,
especially the implementation of Perl 6 Grammars in Perl 5.  More on that
another time!

-- 
rjbs

