[Melbourne-pm] An old one but an important one

Fri Jan 18 15:51:18 PST 2008

Hey Dudes,

I was fixing a performance problem with our portal. It was taking 2 or
3 minutes to render an HTML form. I narrowed it down to this particular
form parsing a 500K XML file, and doing it 10 times. As if the size and
the repeated parsing weren't bad enough, there was also a hidden problem.

When I wrote command-line code that did EXACTLY the same thing (or so I
thought), it ran in under 1 second. So what was going on?

After a full day of debugging I narrowed it down to one module,
Filter::Simple. Only it wasn't; it was Text::Balanced.

And after even more work I found the real problem: a use of "$&" after
a regular expression.

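Yep, it is a known killer of regular expression performance. Once Perl
sees $&, $` or $' anywhere in the program (including inside any module
it loads), every successful pattern match has to copy the string it
matched against, not just the match that uses the variable. Here are the
actual differences: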

With $&:

(env53) vmwmi1:~/simple# time perl test_direct.pl
	Fake loop start - size 488604
	Fake loop end - for 21581
	real	0m26.586s
	user	0m9.305s
	sys	0m17.269s

Without $&:

(env53) vmwmi1:~/simple# time perl test_direct.pl
	Fake loop start - size 488604
	Fake loop end - for 21581
	real	0m0.131s
	user	0m0.112s
	sys	0m0.012s

Yep, even including compile time and reading the file from disk, the
real time goes from 0.131s to 26.586s. That is about 200 times slower!

So... take heed!

	Don't use $&, $` or $'
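For code you control, here's a minimal sketch of the usual replacements
(the sample string and variable names are made up for illustration, and
the /p part assumes Perl 5.10 or later). A capture group makes only its
own regex pay for the copy, @- and @+ give you match offsets without any
copy, and /p turns on ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} for just
that one match.

```perl
use strict;
use warnings;

# Hypothetical sample data, just for illustration.
my $text = "foo=1; bar=2; baz=3";

# 1. A capture group instead of $&: the copy cost is paid by this regex
#    only, not by every match in the program.
my ($via_capture) = $text =~ /(bar=\d+)/;

# 2. Instead of $`, $& and $': @- and @+ hold the start and end offsets of
#    the last successful match, so substr() can rebuild all three pieces.
my ($pre, $via_offsets, $post);
if ($text =~ /bar=\d+/) {
    $pre         = substr($text, 0, $-[0]);
    $via_offsets = substr($text, $-[0], $+[0] - $-[0]);
    $post        = substr($text, $+[0]);
}

# 3. Perl 5.10 and later: /p makes ${^MATCH} (and friends) available for
#    this one match, leaving every other regex untouched.
my $via_p;
if ($text =~ /bar=\d+/p) {
    $via_p = ${^MATCH};
}

print "capture=[$via_capture] offsets=[$via_offsets] p=[$via_p]\n";
print "pre=[$pre] post=[$post]\n";
```

Run as-is, it prints "capture=[bar=2] offsets=[bar=2] p=[bar=2]" and
then "pre=[foo=1; ] post=[; baz=3]".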

But more importantly, check the CPAN modules you use.

Scooter
P.S. I have removed all uses from our own code and it still has the
problem, so I suspect yet another CPAN module is using one of them.