[Melbourne-pm] Regular expression musings
Jacinta Richardson
jarich at perltraining.com.au
Wed Jun 4 22:38:26 PDT 2008
Heh. I've just shown that .* and .*? can fail very quickly. I feel so foolish.
(The original data didn't need /s, but I should have put it in anyway.) The
corrected file is attached, and the better benchmark results are:
Comparing over a string of length: 2159

Whole string quoted
               Rate brackets1 dot_quest1 dot_star1
brackets1   16580/s        --       -84%      -93%
dot_quest1 102920/s      521%         --      -56%
dot_star1  232662/s     1303%       126%        --

First word only quoted
               Rate dot_star2 brackets2 dot_quest2
dot_star2   46541/s        --      -94%       -94%
brackets2  744650/s     1500%        --        -7%
dot_quest2 804305/s     1628%        8%         --

Last word only quoted
              Rate brackets3 dot_quest3 dot_star3
brackets3  65944/s        --        -2%       -2%
dot_quest3 66987/s        2%         --       -1%
dot_star3  67626/s        3%         1%        --

Single starting quote
               Rate dot_star4 dot_quest4 brackets4
dot_star4  192377/s        --        -1%       -2%
dot_quest4 194869/s        1%         --       -0%
brackets4  195491/s        2%         0%        --
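
For readers without the attachment, a comparison like the "whole string quoted" case above can be sketched with the core Benchmark module. This is only an illustration, not the attached benchmarking-res.pl: the test string is made up, and the three patterns are a guess at what the labels brackets, dot_quest and dot_star stand for.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: the real benchmark string is not shown in the
# post, so this only imitates the "whole string quoted" case.
my $text   = 'Lorem ipsum dolor sit amet, ' x 75;
my $quoted = qq{"$text"};

# Assumed meaning of the labels used in the results tables.
my %patterns = (
    brackets  => qr/"([^"]*)"/,
    dot_quest => qr/"(.*?)"/s,
    dot_star  => qr/"(.*)"/s,
);

# Sanity check: all three patterns capture the same text on this input.
for my $name (sort keys %patterns) {
    $quoted =~ $patterns{$name} or die "$name failed to match";
    die "$name captured something unexpected" unless $1 eq $text;
}

# Build one benchmark sub per pattern.
my %tests = map {
    my $re = $patterns{$_};
    ( $_ => sub { my ($m) = $quoted =~ $re; return $m } );
} keys %patterns;

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, \%tests);
```

The absolute rates will differ by machine and Perl version; only the relative ordering is worth comparing against the tables above.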
Paul pointed out that Perl is way smarter than this: it picks out the static
parts of the pattern and uses them as anchors before running the regular
expression. We can see this when we turn on re 'debug':
use re 'debug';
$string1 =~ /"(.*)"/s;
exit;
Compiling REx `"(.*)"'
size 11 Got 92 bytes for offset annotations.
first at 1
1: EXACT <">(3)
3: OPEN1(5)
5: STAR(7)
6: SANY(0)
7: CLOSE1(9)
9: EXACT <">(11)
11: END(0)
anchored """ at 0 floating """ at 1..2147483647 (checking floating) minlen 2
Offsets: [11]
1[1] 0[0] 2[1] 0[0] 4[1] 3[1] 5[1] 0[0] 6[1] 0[0] 7[0]
Guessing start of match, REx ""(.*)"" against ""Lorem ipsum dolor sit amet,
consectetur adipisicing elit, s..."...
Found floating substr """ at offset 2158...
Found anchored substr """ at offset 0...
Guessed: match at offset 0
Matching REx ""(.*)"" against ""Lorem ipsum dolor sit amet, consectetur
adipisicing elit, s..."
Setting an EVAL scope, savestack=14
0 <> <"Lorem ipsum> | 1: EXACT <">
1 <"> <Lorem ipsum> | 3: OPEN1
1 <"> <Lorem ipsum> | 5: STAR
SANY can match 2158 times out of 2147483647...
Setting an EVAL scope, savestack=14
2158 <s repellat.> <"> | 7: CLOSE1
2158 <s repellat.> <"> | 9: EXACT <">
2159 <s repellat."> <> | 11: END
Match successful!
Freeing REx: `"\"(.*)\""'
So Perl jumps straight to the anchors, which is why it is so fast. I'm not sure
how this ties in to the speed differences we see.
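
One way to dig into that question might be to trace all three pattern styles against the same input and compare what the optimiser reports for each. The sketch below assumes a short stand-in for $string1 (the real 2159-character string is in the scrubbed attachment) and assumes the same three patterns as the benchmark labels suggest.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $string1; the real data is not shown in the post.
my $string1 = '"' . ('Lorem ipsum dolor sit amet, ' x 5) . '"';

my @captures;
{
    # Debug output goes to STDERR. On Perl 5.10 and later this pragma is
    # lexically scoped, so only the matches in this block are traced;
    # older perls may trace more than this block.
    use re 'debug';

    push @captures, $1 if $string1 =~ /"([^"]*)"/;
    push @captures, $1 if $string1 =~ /"(.*?)"/s;
    push @captures, $1 if $string1 =~ /"(.*)"/s;
}

print scalar(@captures), " patterns matched\n";
```

Redirecting STDERR to a file (for example `perl script.pl 2> trace.txt`) makes it easier to compare the "anchored" and "floating" lines for each pattern side by side.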
All the best,
J
--
("`-''-/").___..--''"`-._ | Jacinta Richardson |
`6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia |
(_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 |
_..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au |
(il),-'' (li),' ((!.-' | www.perltraining.com.au |
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmarking-res.pl
Type: application/x-perl
Size: 10142 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/melbourne-pm/attachments/20080605/7826dc9a/attachment-0001.bin