[Purdue-pm] Missed Presentation
Mark Senn
mark at ecn.purdue.edu
Wed Apr 11 09:41:17 PDT 2007
> The 54,000% faster lightning talk of Mark's was about using multiple
> simple regex's or-ed together instead of a single complex regex.
> Personally I can't replicate his results but then I don't have his code
> in front of me. He also may have been using an unneeded capture in his
> complex regex.
To get the 54,000% speedup the following was done
o Four regexs that were being done near the end of ~2000 regexes
were moved so they would be at the beginning.
o Right after some of these common cases were eliminated from $_
a "study" command was done to (probably) make the rest of the
~2000 regexes run faster.
o Regexes of the form /a|b|c/ were changed to /a/ || /b/ || /c/.
I didn't do timings for how much difference was due to the different
factors.
Here's a test program to compare /a/ || /b/ vs. /a|b/
===== start with next line
#!/usr/local/bin/perl
$dict = '/usr/dict/words';
@ARGV = $dict;
@word = <>;
print scalar @word, " words read from $dict\n";
use Benchmark;
timethese (
100 => {
'expression or' => '@x = grep /ae/ || /ei/ || /io/ || /ou/ || /uy/, @word',
'expression or o' => '@x = grep /ae/o || /ei/o || /io/o || /ou/o || /uy/o, @word',
'regex or x' => '@x = grep /ae | ei | io | ou | uy/x, @word',
'regex or ox' => '@x = grep /ae | ei | io | ou | uy/ox, @word',
}
);
===== end with previous line
I get
% ./benchmark.pl
25143 words read from /usr/dict/words
Benchmark: timing 100 iterations of expression or, expression or o, regex or ox, regex or x...
expression or: 5 wallclock secs ( 4.85 usr + 0.00 sys = 4.85 CPU) @ 20.62/s (n=100)
expression or o: 5 wallclock secs ( 4.86 usr + 0.00 sys = 4.86 CPU) @ 20.58/s (n=100)
regex or ox: 12 wallclock secs (11.63 usr + 0.01 sys = 11.64 CPU) @ 8.59/s (n=100)
regex or x: 11 wallclock secs (11.60 usr + 0.00 sys = 11.60 CPU) @ 8.62/s (n=100)
-mark
More information about the Purdue-pm
mailing list