[Melbourne-pm] Regular expression musings
Ryan, Martin G
Martin.G.Ryan at team.telstra.com
Wed Jun 4 23:15:15 PDT 2008
Maybe a bit out of the scope of a tip, but if you're going to run lots of different REs against the same string, the "study" function might help.
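For anyone who hasn't used it, here is a minimal sketch of what that looks like. The sample text and patterns are made up for illustration. Whether study helps at all depends on the perl version: as far as I know it has been a no-op since 5.16, so treat this as something to benchmark, not a guaranteed win.

```perl
use strict;
use warnings;

# Hypothetical sample text and patterns, invented for this illustration.
my $text     = 'He said "hello" and then "goodbye" before leaving.';
my @patterns = (qr/"hello"/, qr/"goodbye"/, qr/leaving/, qr/absent/);

# study() asks perl to pre-analyse $text before it is matched repeatedly.
# On perl 5.16 and later the call does nothing, so it is harmless there
# but cannot speed anything up either.
study $text;

# grep in scalar context counts how many patterns matched.
my $hits = grep { $text =~ $_ } @patterns;
print "$hits of ", scalar(@patterns), " patterns matched\n";
```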
-----Original Message-----
From: melbourne-pm-bounces+martin.g.ryan=team.telstra.com at pm.org [mailto:melbourne-pm-bounces+martin.g.ryan=team.telstra.com at pm.org] On Behalf Of Alfie John
Sent: Thursday, 5 June 2008 2:47 PM
To: Jacinta Richardson
Cc: Melbourne Perlmongers
Subject: Re: [Melbourne-pm] Regular expression musings
Hey Jacinta,
If you're talking about optimisations, I think it would also be a good idea to include the tip: "don't use regular expressions". Using the same samples, I would be interested in seeing the results of an index()/substr() combo.
Alfie
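A rough sketch of the index()/substr() approach for the double-quote case follows. The helper name extract_quoted is hypothetical and the sample string is invented. It should find the same first match as /".*?"/ or /"[^"]*"/, except that "." without /s will not cross a newline while this will.

```perl
use strict;
use warnings;

# Return the first double-quoted substring (quotes included), or nothing
# if there is no complete pair of quotes.
# extract_quoted is a hypothetical helper name, not from the thread.
sub extract_quoted {
    my ($string) = @_;

    my $start = index($string, '"');
    return if $start < 0;                       # no opening quote

    my $end = index($string, '"', $start + 1);
    return if $end < 0;                         # only one quote (case 4)

    return substr($string, $start, $end - $start + 1);
}

my $match = extract_quoted('only the "first" word in quotes');
print defined $match ? "$match\n" : "no match\n";
```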
On Thu, Jun 5, 2008 at 2:34 PM, Jacinta Richardson <jarich at perltraining.com.au> wrote:
> I'm writing a Perl tip ( http://perltraining.com.au/tips/ ) about
> regular expression optimisations - the usual ones, and decided to
> benchmark some with respect to .*, .*? and alternatives.
>
> I considered the case of matching a string inside double quotes. I
> used one very long string copied three times with:
>
> * the whole string in quotes (1)
> * only the first word in quotes (2)
> * only the last word in quotes (3)
> * only one double quote (4)
>
> I figured that this covered all of my options since the regular
> expression engine would halt as soon as the second double quote was
> found. I also used the following expressions:
>
> /".*"/ (dot_star)
> /".*?"/ (dot_quest)
> /"[^"]*"/ (brackets)
>
> What I expected to find was:
>
> * dot_star clearly fastest on 1
> * dot_quest slower in general than brackets
> * brackets fastest on 2 and 4 (followed closely by dot_quest)
> * no significant time difference on 3
> * very significant time difference between dot_quest and the
> others on 4
>
> What I found instead was:
>
> * dot_star fastest on 1, 4
> * dot_quest fastest on 2, 3
> * brackets never fastest
> * no significant time difference between any of them on 3, 4.
>
> The benchmarking results are:
>
> Comparing over a string of length: 2159
>
> Whole string quoted
>                Rate brackets1 dot_quest1 dot_star1
> brackets1   17958/s        --       -75%      -83%
> dot_quest1  71522/s      298%         --      -33%
> dot_star1  107222/s      497%        50%        --
>
> First word only quoted
>                Rate dot_star2 brackets2 dot_quest2
> dot_star2  562469/s        --      -31%       -40%
> brackets2  813620/s       45%        --       -13%
> dot_quest2 936418/s       66%       15%         --
>
> Last word only quoted
>               Rate brackets3 dot_star3 dot_quest3
> brackets3  68064/s        --       -1%        -2%
> dot_star3  68713/s        1%        --        -1%
> dot_quest3 69176/s        2%        1%         --
>
> Single starting quote
>                Rate dot_quest4 brackets4 dot_star4
> dot_quest4 203988/s         --       -0%       -1%
> brackets4  204852/s         0%        --       -1%
> dot_star4  206853/s         1%        1%        --
>
>
> This surprises me. I expected it to take more time for .*? to take
> something, try to match the ", fail and repeat; than to just compare a
> character to a bit map, consume it and repeat. I certainly didn't
> expect .*? to be 300% faster than [^"]* over a long string. I'm
> particularly surprised to see .*? be 15% faster than [^"]* over a
> string of 5 characters. This seems even more unusual because it's not
> that much slower over the 9 characters at the end of the string.
>
> My benchmarking code is attached. Can anyone spot any issues which
> might be influencing these results?
>
> All the best,
>
> Jacinta
>
> --
> ("`-''-/").___..--''"`-._ | Jacinta Richardson |
> `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia |
> (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 |
> _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au |
> (il),-'' (li),' ((!.-' | www.perltraining.com.au |
>
> _______________________________________________
> Melbourne-pm mailing list
> Melbourne-pm at pm.org
> http://mail.pm.org/mailman/listinfo/melbourne-pm
>
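Since the attached benchmarking code isn't reproduced here, a minimal sketch of how such a comparison could be set up with the core Benchmark module is below. It is not Jacinta's script: the filler text is an assumption, only case 1 (whole string quoted) is shown, and the rates will differ with the perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: arbitrary filler text with the whole string
# wrapped in double quotes (case 1 in the thread).
my $body   = 'lorem ipsum dolor sit amet ' x 80;
my $quoted = qq{"$body"};

# A negative count tells Benchmark to run each sub for at least that
# many CPU seconds; cmpthese then prints a comparison table.
cmpthese(-1, {
    dot_star  => sub { my $ok = $quoted =~ /".*"/ },
    dot_quest => sub { my $ok = $quoted =~ /".*?"/ },
    brackets  => sub { my $ok = $quoted =~ /"[^"]*"/ },
});
```

The other three cases only need a different $quoted: quotes around the first word, quotes around the last word, or a single leading quote.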