[Pdx-pm] oh, gross (object method in regex)

Eric Wilhelm scratchcomputing at gmail.com
Sun Mar 12 18:21:58 PST 2006


# from Randall Hansen
# on Sunday 12 March 2006 04:20 pm:

(snippets from your next e-mail for correctness)

Note that what you called temp was actually the one using the scalar 
deref construct.

(renamed "temp" to "scalar reference")
>     sub sref {
...
>         return grep /${ \$Foo->foo }/ => @search;

(renamed "eric correct" to "array ref")
>     sub aref {
...
>         return grep /@{[ $Foo->foo ]}/ => @search;

(renamed "deref" to "real temp")
>     sub rtmp {
...
>         my $foo = $Foo->foo;
>         return grep /$foo/ => @search;

And just the important numbers here:
> rtmp:  2 secs @ 68k/s
> aref:  4 secs @ 34k/s
> sref:  2 secs @ 46k/s

>david's method of assigning to a temporary variable works,  
>and is what i've done before, but seemed ugly and wasteful because i  
>only used it once.

Not only is it important to benchmark, it is really important to 
benchmark correctly :-)

>so the reference/dereference syntax avoids the temporary variable, is
>   faster[1], and explicit enough so that people who understand the
> rest of my code will get it.

Let's be clear what the four forms in your benchmark are.

The original "eric" sub is going to yield incorrect results because of 
the backslash: run deparse on it and it comes out as 
/@{[\($Foo->foo)];}/.  That's a list of one reference to a scalar, 
captured in the [] array reference and then flattened by the @{} 
cyclops, so the regex gets the stringified reference rather than the 
value.  So, best to just throw that away and pretend we never saw it, 
since incorrect behavior is irrelevant whether it is fast or slow.
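
To make that concrete, here is a minimal sketch.  The Foo package and 
its foo method are stand-ins I made up for illustration (not the class 
from the benchmark), and @search is assumed sample data.  It only shows 
what each construct interpolates:

```perl
use strict;
use warnings;

{
    package Foo;    # hypothetical stand-in class, not the benchmark's Foo
    sub new { return bless {}, shift }
    sub foo { return "thing" }
}

my $Foo    = Foo->new;
my @search = ("a thing", "deal", "stuff");    # assumed sample data

# With the backslash, the anonymous array holds a reference to the
# return value, so interpolation yields the stringified reference.
my $broken = "@{[ \$Foo->foo ]}";

# Without the backslash, the array holds the value itself.
my $correct = "@{[ $Foo->foo ]}";

print "with backslash:    $broken\n";
print "without backslash: $correct\n";

# The same thing inside a regex: the first pattern is built from the
# stringified reference, so it has nothing to do with "thing".
my @broken_hits  = grep /@{[ \$Foo->foo ]}/ => @search;
my @correct_hits = grep /@{[ $Foo->foo ]}/  => @search;

print "broken form matched:  ", scalar(@broken_hits),  "\n";
print "correct form matched: ", scalar(@correct_hits), "\n";
```

The exact address in the stringified reference will differ from run to 
run.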

The "eric_correct" sub is an array dereference construct, as hinted at 
by my above renaming to "aref".

The one you called "temp" is actually the scalar dereference construct.  
I would expect that this is faster than the array dereference by at 
least a little because the code is following a "one value" path through 
perl rather than a list path.

Finally, the one you called "deref" is using a temp variable ("rtmp" 
above.)

Note that the temp variable is about 3/2 the speed of the scalar 
dereference and twice the speed of the array dereference.
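
For anyone who wants to rerun the comparison, here is a rough sketch 
using the core Benchmark module.  The three subs follow the snippets 
quoted above; the Foo class, the contents of @search, and the iteration 
count are my assumptions, and it checks that all three forms return the 
same thing before timing them.  Numbers will vary with machine and perl 
version:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

{
    package Foo;    # hypothetical stand-in class, not the benchmark's Foo
    sub new { return bless {}, shift }
    sub foo { return "thing" }
}

my $Foo    = Foo->new;
my @search = ("a thing", "deal", "stuff");    # assumed sample data

# scalar reference / dereference inside the regex
sub sref { return grep /${ \$Foo->foo }/ => @search }

# array reference / dereference inside the regex
sub aref { return grep /@{[ $Foo->foo ]}/ => @search }

# real temp variable
sub rtmp {
    my $foo = $Foo->foo;
    return grep /$foo/ => @search;
}

my %forms = (sref => \&sref, aref => \&aref, rtmp => \&rtmp);

# Benchmark correctly: make sure every form gives the same answer first.
for my $name (sort keys %forms) {
    my @got = $forms{$name}->();
    die "$name returned the wrong matches\n" unless "@got" eq "a thing";
}

# 200_000 iterations per form is an arbitrary choice for a quick run.
cmpthese(200_000, \%forms);
```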

Why is a temp variable faster?  Feel free to play with B::Concise and 
post the pertinent snippets of the optree here when you find them.  The 
lazy approach is to find something to blame and move on (Schwern called 
this the "User Model" if you remember his talk on design.)
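
If you want a starting point, here is one way to dump the optrees of 
two of the forms so they can be read side by side.  This is only a 
sketch: the Foo class and @search are made-up stand-ins, and I am 
assuming a perl built with perlio so that B::Concise's walk_output can 
write into a string.

```perl
use strict;
use warnings;
use B::Concise ();

{
    package Foo;    # hypothetical stand-in class, not the benchmark's Foo
    sub new { return bless {}, shift }
    sub foo { return "thing" }
}

my $Foo    = Foo->new;
my @search = ("a thing", "deal", "stuff");    # assumed sample data

sub sref { return grep /${ \$Foo->foo }/ => @search }

sub rtmp {
    my $foo = $Foo->foo;
    return grep /$foo/ => @search;
}

my %optree;
for my $name (qw(sref rtmp)) {
    my $code = main->can($name);

    # Send the rendering into a string instead of STDOUT.
    B::Concise::walk_output(\my $buf);

    # -exec lists the ops in execution order.
    B::Concise::compile('-exec', $code)->();

    $optree{$name} = $buf;
    print "== $name ==\n$buf\n";
}
```

Look at where the entersub for the method call sits relative to the 
grep loop and the regcomp in each listing.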

I was going to choose the garbage collector as my straw man.  Seems that 
pass-by-value to a nearby lexical would at least be easier to keep 
track of than an anonymous reference inside a regex.

But, hey!

$ perl -e 'my $obj = "main";
  sub foo {warn "hey\n"; "thing"};
  print grep(/${\($obj->foo)}/, "a thing", "deal", "stuff");'
hey
hey
hey
a thing

A temp variable pops you out of the need to call the method for every 
element, so if you increase the size of @search, the numbers for the 
two dereference forms are going to get a lot worse.  Did you guess that 
would happen?  I sure didn't!
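
The same experiment as the one-liner, but counting calls instead of 
warning, and with the temp variable form next to it.  Again a sketch: 
the Foo class, the $calls counter, and @search are names I made up for 
illustration.

```perl
use strict;
use warnings;

my $calls = 0;    # hypothetical counter, bumped on every method call

{
    package Foo;    # hypothetical stand-in class
    sub new { return bless {}, shift }
    sub foo { $calls++; return "thing" }
}

my $Foo    = Foo->new;
my @search = ("a thing", "deal", "stuff");    # assumed sample data

# Inline scalar dereference: the method call sits inside the pattern,
# so it is evaluated again for each element that grep tests.
my @inline       = grep /${ \$Foo->foo }/ => @search;
my $inline_calls = $calls;

# Temp variable: the method is called once, before grep starts.
$calls = 0;
my $foo        = $Foo->foo;
my @temp       = grep /$foo/ => @search;
my $temp_calls = $calls;

print "elements searched:   ", scalar(@search), "\n";
print "inline deref calls:  $inline_calls\n";
print "temp variable calls: $temp_calls\n";
```

Grow @search and the inline count should grow with it, while the temp 
variable count stays at one.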

--Eric
-- 
Turns out the optimal technique is to put it in reverse and gun it.
--Steven Squyres (on challenges in interplanetary robot navigation)
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------

