[Melbourne-pm] Regular expression musings

Alfie John alfiejohn at gmail.com
Wed Jun 4 21:47:26 PDT 2008


Hey Jacinta,

If you're talking about optimisations, I think it would also be a good
idea to include the tip - "don't use regular expressions". Using the
same samples, I would be interested in seeing the results of an
index()/substr() combo.
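
As a rough sketch of what such an index()/substr() combo could look like
(the helper name first_quoted is made up for illustration and is not taken
from the attached benchmark), something along these lines should return
the same text that /"[^"]*"/ would match, assuming only the first quoted
run is wanted:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first run of text
# enclosed in double quotes, quotes included, or undef when there is no
# opening quote or no closing quote.
sub first_quoted {
    my ($text) = @_;

    # Position of the opening quote; index() returns -1 when not found.
    my $open = index($text, '"');
    return undef if $open < 0;

    # Look for the closing quote strictly after the opening one.
    my $close = index($text, '"', $open + 1);
    return undef if $close < 0;

    # Slice out the quoted run, including both quote characters.
    return substr($text, $open, $close - $open + 1);
}

print first_quoted('say "hello" and "bye"'), "\n";   # prints "hello" (with the quotes)
```

Unlike /".*"/, this stops at the first closing quote, so it is only a
stand-in for the dot_quest and brackets variants.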

Alfie

On Thu, Jun 5, 2008 at 2:34 PM, Jacinta Richardson
<jarich at perltraining.com.au> wrote:
> I'm writing a Perl tip ( http://perltraining.com.au/tips/ ) about regular
> expression optimisations - the usual ones, and decided to benchmark some with
> respect to .*, .*? and alternatives.
>
> I considered the case of matching a string inside double quotes.  I used one
> very long string copied three times with:
>
>        * the whole string in quotes    (1)
>        * only the first word in quotes (2)
>        * only the last word in quotes  (3)
>        * only one double quote         (4)
>
> I figured that this covered all of my options since the regular expression
> engine would halt as soon as the second double quote was found.  I also used
> the following expressions:
>
>        /".*"/          (dot_star)
>        /".*?"/         (dot_quest)
>        /"[^"]*"/       (brackets)
>
> What I expected to find was:
>
>        * dot_star clearly fastest on 1
>        * dot_quest slower in general than brackets
>        * brackets fastest on 2 and 4 (followed closely by dot_quest)
>        * no significant time difference on 3
>        * very significant time difference between dot_quest and the others on 4
>
> What I found instead was:
>
>        * dot_star fastest on 1, 4
>        * dot_quest fastest on 2, 3
>        * brackets never fastest
>        * no significant time difference between any of them on 3 or 4
>
> The benchmarking results are:
>
> Comparing over a string of length: 2159
> Whole string quoted
>               Rate  brackets1 dot_quest1  dot_star1
> brackets1   17958/s         --       -75%       -83%
> dot_quest1  71522/s       298%         --       -33%
> dot_star1  107222/s       497%        50%         --
>
> First word only quoted
>               Rate  dot_star2  brackets2 dot_quest2
> dot_star2  562469/s         --       -31%       -40%
> brackets2  813620/s        45%         --       -13%
> dot_quest2 936418/s        66%        15%         --
>
> Last word only quoted
>              Rate  brackets3  dot_star3 dot_quest3
> brackets3  68064/s         --        -1%        -2%
> dot_star3  68713/s         1%         --        -1%
> dot_quest3 69176/s         2%         1%         --
>
> Single starting quote
>               Rate dot_quest4  brackets4  dot_star4
> dot_quest4 203988/s         --        -0%        -1%
> brackets4  204852/s         0%         --        -1%
> dot_star4  206853/s         1%         1%         --
>
>
> This surprises me.  I expected it to take more time for .*? to take a
> character, try to match the ", fail and repeat, than to just compare a
> character to a bit map, consume it and repeat.  I certainly didn't expect .*?
> to be about 300% faster than [^"]* over a long string.  I'm particularly
> surprised to see .*? be 15% faster than [^"]* over a string of 5 characters.
> This seems even more unusual because the two are nearly level (within 2%) over
> the 9 characters at the end of the string.
>
> My benchmarking code is attached.  Can anyone spot any issues which might be
> influencing these results?
>
> All the best,
>
>        Jacinta
>
> --
>   ("`-''-/").___..--''"`-._          |  Jacinta Richardson         |
>    `6_ 6  )   `-.  (     ).`-.__.`)  |  Perl Training Australia    |
>    (_Y_.)'  ._   )  `._ `. ``-..-'   |      +61 3 9354 6001        |
>  _..`--'_..-_/  /--'_.' ,'           | contact at perltraining.com.au |
>  (il),-''  (li),'  ((!.-'             |   www.perltraining.com.au   |
>
> _______________________________________________
> Melbourne-pm mailing list
> Melbourne-pm at pm.org
> http://mail.pm.org/mailman/listinfo/melbourne-pm
>
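
The attached benchmark is not reproduced in this message, so the following
is only a minimal sketch of how the comparison described above might be set
up with the core Benchmark module.  The filler text, the hash names %case
and %pattern, and the one CPU second per test are all assumptions rather
than the original settings:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: a long filler string, with quotes placed per case.
my $filler = join ' ', ('lorem ipsum dolor sit amet') x 80;

my %case = (
    whole_string => qq{"$filler"},          # (1) whole string in quotes
    first_word   => qq{"lorem" $filler},    # (2) only the first word quoted
    last_word    => qq{$filler "amet"},     # (3) only the last word quoted
    single_quote => qq{"$filler},           # (4) only one double quote
);

my %pattern = (
    dot_star  => qr/".*"/,
    dot_quest => qr/".*?"/,
    brackets  => qr/"[^"]*"/,
);

for my $name (sort keys %case) {
    my $text = $case{$name};
    print "\n$name (length ", length($text), ")\n";

    # Run each pattern for at least one CPU second and print a
    # comparison table in the same format as the results above.
    cmpthese(-1, {
        map {
            my $re = $pattern{$_};
            ($_ => sub { my $hit = $text =~ $re });
        } keys %pattern
    });
}
```

Matching against precompiled qr// objects adds a little per-call overhead
compared with literal patterns, so absolute rates from a sketch like this
need not line up with the figures quoted above.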

