[Melbourne-pm] Regular expression musings
Alfie John
alfiejohn at gmail.com
Wed Jun 4 21:47:26 PDT 2008
Hey Jacinta,
If you're talking about optimisations, I think it would also be a good
idea to include the tip "don't use regular expressions". Using the
same samples, I would be interested in seeing the results of an
index()/substr() combo.
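
A minimal sketch of what such an index()/substr() combo might look like,
assuming the goal is the same text that /"[^"]*"/ would match (the first
double-quoted span, quotes included). The helper name first_quoted and the
short sample strings are made up for illustration; they are not the samples
from the benchmark below, and no timings are claimed for this version.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): return the first
# double-quoted span in $str, quotes included, or undef when there is no
# opening quote or no closing quote after it.
sub first_quoted {
    my ($str) = @_;

    # Position of the opening quote, or -1 if there is none.
    my $open = index($str, '"');
    return if $open < 0;

    # Position of the next quote after the opening one.
    my $close = index($str, '"', $open + 1);
    return if $close < 0;

    return substr($str, $open, $close - $open + 1);
}

# Short stand-ins for the four sample shapes described in the quoted message.
my %samples = (
    whole_quoted => '"the quick brown fox"',
    first_word   => '"the" quick brown fox',
    last_word    => 'the quick brown "fox"',
    single_quote => '"the quick brown fox',
);

for my $name (sort keys %samples) {
    my $found = first_quoted($samples{$name});
    printf "%-12s => %s\n", $name, defined $found ? $found : '(no match)';
}
```

To compare it fairly, the same sub could be dropped into the existing
cmpthese() run alongside the three regular expressions. Note that it stops at
the first closing quote, so it lines up with .*? and [^"]* rather than with
the greedy .* when a string contains more than two quotes.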
Alfie
On Thu, Jun 5, 2008 at 2:34 PM, Jacinta Richardson
<jarich at perltraining.com.au> wrote:
> I'm writing a Perl tip ( http://perltraining.com.au/tips/ ) about regular
> expression optimisations - the usual ones, and decided to benchmark some with
> respect to .*, .*? and alternatives.
>
> I considered the case of matching a string inside double quotes. I used one
> very long string copied three times with:
>
> * the whole string in quotes (1)
> * only the first word in quotes (2)
> * only the last word in quotes (3)
> * only one double quote (4)
>
> I figured that this covered all of my options since the regular expression engine would
> halt as soon as the second double quote was found. I also used the following
> expressions:
>
> /".*"/ (dot_star)
> /".*?"/ (dot_quest)
> /"[^"]*"/ (brackets)
>
> What I expected to find was:
>
> * dot_star clearly fastest on 1
> * dot_quest slower in general than brackets
> * brackets fastest on 2 and 4 (followed closely by dot_quest)
> * no significant time difference on 3
> * very significant time difference between dot_quest and the others on 4
>
> What I found instead was:
>
> * dot_star fastest on 1, 4
> * dot_quest fastest on 2, 3
> * brackets never fastest
> * no significant time difference between any of them on 3, 4.
>
> The benchmarking results are:
>
> Comparing over a string of length: 2159
> Whole string quoted
>                Rate brackets1 dot_quest1 dot_star1
>  brackets1  17958/s        --       -75%      -83%
> dot_quest1  71522/s      298%         --      -33%
>  dot_star1 107222/s      497%        50%        --
>
> First word only quoted
>                Rate dot_star2 brackets2 dot_quest2
>  dot_star2 562469/s        --      -31%       -40%
>  brackets2 813620/s       45%        --       -13%
> dot_quest2 936418/s       66%       15%         --
>
> Last word only quoted
>               Rate brackets3 dot_star3 dot_quest3
>  brackets3 68064/s        --       -1%        -2%
>  dot_star3 68713/s        1%        --        -1%
> dot_quest3 69176/s        2%        1%         --
>
> Single starting quote
>                Rate dot_quest4 brackets4 dot_star4
> dot_quest4 203988/s         --       -0%       -1%
>  brackets4 204852/s         0%        --       -1%
>  dot_star4 206853/s         1%        1%        --
>
>
> This surprises me. I expected it to take more time for .*? to consume a
> character, try to match the ", fail and repeat than for [^"]* to compare a
> character against a bitmap, consume it and repeat. I certainly didn't expect
> .*? to be 300% faster than [^"]* over a long string. I'm particularly
> surprised to see .*? be 15% faster than [^"]* over a string of 5 characters.
> This seems even more unusual because it's not that much slower over the 9
> characters at the end of the string.
>
> My benchmarking code is attached. Can anyone spot any issues which might be
> influencing these results?
>
> All the best,
>
> Jacinta
>
> --
> ("`-''-/").___..--''"`-._ | Jacinta Richardson |
> `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia |
> (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 |
> _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au |
> (il),-'' (li),' ((!.-' | www.perltraining.com.au |
>
> _______________________________________________
> Melbourne-pm mailing list
> Melbourne-pm at pm.org
> http://mail.pm.org/mailman/listinfo/melbourne-pm
>