[Purdue-pm] Missed Presentation

Mark Senn mark at ecn.purdue.edu
Wed Apr 11 09:41:17 PDT 2007


  > The 54,000% faster lightning talk of Mark's was about using multiple 
  > simple regex's or-ed together instead of a single complex regex.  
  > Personally I can't replicate his results but then I don't have his code 
  > in front of me.  He also may have been using an unneeded capture in his 
  > complex regex.

To get the 54,000% speedup the following was done
    o  Four regexs that were being done near the end of ~2000 regexes
       were moved so they would be at the beginning.
    o  Right after some of these common cases were eliminated from $_
       a "study" command was done to (probably) make the rest of the
       ~2000 regexes run faster.
    o  Regexes of the form /a|b|c/ were changed to /a/ || /b/ || /c/.

I didn't do timings for how much difference was due to the different
factors.

Here's a test program to compare   /a/ || /b/   vs.   /a|b/

===== start with next line
#!/usr/local/bin/perl

$dict = '/usr/dict/words';
@ARGV = $dict;
@word = <>;
print scalar @word, " words read from $dict\n";

use Benchmark;

timethese (
    100 => {
        'expression or'    =>  '@x = grep /ae/  || /ei/  || /io/  || /ou/  || /uy/,   @word',
        'expression or o'  =>  '@x = grep /ae/o || /ei/o || /io/o || /ou/o || /uy/o,  @word',
        'regex or x'       =>  '@x = grep /ae   |   ei   |   io   |   ou   |   uy/x,  @word',
        'regex or ox'      =>  '@x = grep /ae   |   ei   |   io   |   ou   |   uy/ox, @word',
    }
);
===== end with previous line

I get
% ./benchmark.pl
25143 words read from /usr/dict/words
Benchmark: timing 100 iterations of expression or, expression or o, regex or ox, regex or x...
expression or:  5 wallclock secs ( 4.85 usr +  0.00 sys =  4.85 CPU) @ 20.62/s (n=100)
expression or o:  5 wallclock secs ( 4.86 usr +  0.00 sys =  4.86 CPU) @ 20.58/s (n=100)
regex or ox: 12 wallclock secs (11.63 usr +  0.01 sys = 11.64 CPU) @  8.59/s (n=100)
regex or x: 11 wallclock secs (11.60 usr +  0.00 sys = 11.60 CPU) @  8.62/s (n=100)

-mark


More information about the Purdue-pm mailing list