[Melbourne-pm] Regular expression musings

Ryan, Martin G Martin.G.Ryan at team.telstra.com
Wed Jun 4 23:15:15 PDT 2008


Maybe a bit outside the scope of a tip, but if you're going to run lots of different REs against the same string, the "study" function might help.
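
A minimal sketch of what that looks like in practice. The sample text and the patterns are made up for illustration; the only real API here is the study builtin, which takes a single scalar (or $_ by default). Note that the perl documentation says study does nothing from Perl 5.16 onward, so any benefit applies only to older perls.

```perl
use strict;
use warnings;

# Hypothetical sample text: any long string searched many times would do.
my $text = 'the "quick" brown fox jumps over the "lazy" dog ' x 100;

# On older perls, study() pre-scans the scalar so that later matches
# against it can skip ahead faster. On Perl 5.16+ it is a no-op.
study $text;

# Several different patterns applied to the same studied string.
my @patterns = (qr/"quick"/, qr/"lazy"/, qr/fox\s+jumps/, qr/"[^"]*"/);

my $matched = 0;
for my $re (@patterns) {
    $matched++ if $text =~ $re;
}

print "$matched of ", scalar(@patterns), " patterns matched\n";
```

Only one string can be studied at a time, so this is worth trying only when the same string is matched repeatedly, and only with a benchmark to confirm it helps.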

-----Original Message-----
From: melbourne-pm-bounces+martin.g.ryan=team.telstra.com at pm.org [mailto:melbourne-pm-bounces+martin.g.ryan=team.telstra.com at pm.org] On Behalf Of Alfie John
Sent: Thursday, 5 June 2008 2:47 PM
To: Jacinta Richardson
Cc: Melbourne Perlmongers
Subject: Re: [Melbourne-pm] Regular expression musings

Hey Jacinta,

If you're talking about optimisations, I think it would also be a good idea to include the tip "don't use regular expressions". Using the same samples, I would be interested in seeing the results of an index()/substr() combo.
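
Something like the following is what I have in mind. It is only a sketch: first_quoted is a name made up for this example, and the four sample strings are short stand-ins for Jacinta's cases rather than her actual data. It pulls out the first double-quoted chunk with index() and substr(), then checks that it agrees with the bracket-style regex on each sample.

```perl
use strict;
use warnings;

# Return the first double-quoted substring (quotes included), or
# nothing if the string does not contain two double quotes.
sub first_quoted {
    my ($str) = @_;

    my $start = index($str, '"');              # opening quote
    return if $start < 0;

    my $end = index($str, '"', $start + 1);    # closing quote
    return if $end < 0;

    return substr($str, $start, $end - $start + 1);
}

# Stand-in samples for the four cases in the original post.
my $body   = 'lorem ipsum dolor sit amet ' x 20;
my %sample = (
    whole  => qq{"$body"},           # whole string in quotes
    first  => qq{"lorem" $body},     # only the first word in quotes
    last   => qq{$body "amet"},      # only the last word in quotes
    single => qq{"$body},            # only one double quote
);

my $mismatches = 0;
for my $name (sort keys %sample) {
    my $str = $sample{$name};

    my ($by_regex) = $str =~ /("[^"]*")/;      # undef when there is no match
    my $by_index   = first_quoted($str);

    my $same = defined($by_regex)
        ? (defined($by_index) && $by_regex eq $by_index)
        : !defined($by_index);

    $mismatches++ unless $same;
    printf "%-6s %s\n", $name, $same ? 'agree' : 'DIFFER';
}
```

Once the two approaches agree on the samples, first_quoted can be added as another entry in the same Benchmark comparison. Like [^"]*, and unlike . without /s, this version will happily span newlines.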

Alfie

On Thu, Jun 5, 2008 at 2:34 PM, Jacinta Richardson <jarich at perltraining.com.au> wrote:
> I'm writing a Perl tip ( http://perltraining.com.au/tips/ ) about
> regular expression optimisations - the usual ones, and decided to
> benchmark some with respect to .*, .*? and alternatives.
>
> I considered the case of matching a string inside double quotes.  I
> used one very long string copied three times with:
>
>        * the whole string in quotes    (1)
>        * only the first word in quotes (2)
>        * only the last word in quotes  (3)
>        * only one double quote         (4)
>
> I figured that this covered all of my options since the regular
> expression engine would halt as soon as the second double quote was
> found.  I also used the following expressions:
>
>        /".*"/          (dot_star)
>        /".*?"/         (dot_quest)
>        /"[^"]*"/       (brackets)
>
> What I expected to find was:
>
>        * dot_star clearly fastest on 1
>        * dot_quest slower in general than brackets
>        * brackets fastest on 2 and 4 (followed closely by dot_quest)
>        * no significant time difference on 3
>        * very significant time difference between dot_quest and the
>          others on 4
>
> What I found instead was:
>
>        * dot_star fastest on 1, 4
>        * dot_quest fastest on 2, 3
>        * brackets never fastest
>        * no significant time difference between any of them on 3, 4
>
> The benchmarking results are:
>
> Comparing over a string of length: 2159
>
> Whole string quoted
>               Rate  brackets1 dot_quest1  dot_star1
> brackets1   17958/s         --       -75%       -83%
> dot_quest1  71522/s       298%         --       -33%
> dot_star1  107222/s       497%        50%         --
>
> First word only quoted
>               Rate  dot_star2  brackets2 dot_quest2
> dot_star2  562469/s         --       -31%       -40%
> brackets2  813620/s        45%         --       -13%
> dot_quest2 936418/s        66%        15%         --
>
> Last word only quoted
>              Rate  brackets3  dot_star3 dot_quest3
> brackets3  68064/s         --        -1%        -2%
> dot_star3  68713/s         1%         --        -1%
> dot_quest3 69176/s         2%         1%         --
>
> Single starting quote
>               Rate dot_quest4  brackets4  dot_star4
> dot_quest4 203988/s         --        -0%        -1%
> brackets4  204852/s         0%         --        -1%
> dot_star4  206853/s         1%         1%         --
>
>
> This surprises me.  I expected it to take more time for .*? to take
> something, try to match the ", fail and repeat than to just compare a
> character to a bit map, consume it and repeat.  I certainly didn't
> expect .*? to be 300% faster than [^"]* over a long string.  I'm
> particularly surprised to see .*? be 15% faster than [^"]* over a
> string of 5 characters.  This seems even more unusual because [^"]*
> is not that much slower over the 9 characters at the end of the
> string.
>
> My benchmarking code is attached.  Can anyone spot any issues which
> might be influencing these results?
>
> All the best,
>
>        Jacinta
>
> --
>   ("`-''-/").___..--''"`-._          |  Jacinta Richardson         |
>    `6_ 6  )   `-.  (     ).`-.__.`)  |  Perl Training Australia    |
>    (_Y_.)'  ._   )  `._ `. ``-..-'   |      +61 3 9354 6001        |
>  _..`--'_..-_/  /--'_.' ,'           | contact at perltraining.com.au |
>  (il),-''  (li),'  ((!.-'             |   www.perltraining.com.au   |
>
> _______________________________________________
> Melbourne-pm mailing list
> Melbourne-pm at pm.org
> http://mail.pm.org/mailman/listinfo/melbourne-pm
>