[Pdx-pm] string comparison vs hash

Thomas Keller kellert at ohsu.edu
Tue May 29 21:51:08 PDT 2007


I thought I had gotten around that problem by using three different
filehandles, one for each of the three compare subroutines, but it
seemed worth testing. First I commented out everything but the
$fh->open() statement; then I added back the file-read method; and
finally I added the line-processing step. Here are the numbers:
$ perl benchmark_hash_vs_grep
A. Perl Benchmark.pm examples

1. Open 3 filehandles sequentially

                   Rate       with_hash with_string_cmp       with_grep
with_hash       33998/s              --             -1%             -2%
with_string_cmp 34296/s              1%              --             -1%
with_grep       34600/s              2%              1%              --

2. Open and read: slurp into an array (@lines = <$fh>) vs. while (<$fh>) { }

                   Rate with_string_cmp       with_grep       with_hash
with_string_cmp  6140/s              --             -1%            -39%
with_grep        6178/s              1%              --            -39%
with_hash       10049/s             64%             63%              --

3. Open, read, and process lines

                  Rate       with_grep with_string_cmp       with_hash
with_grep        169/s              --            -87%            -90%
with_string_cmp 1297/s            667%              --            -25%
with_hash       1723/s            918%             33%              --
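
The benchmark_read_methods attachment was scrubbed from the archive, so what follows is only a hypothetical sketch of how the step 2 comparison might be set up with Benchmark.pm's cmpthese, not the original script. The temporary input file, its 1,000 lines, and the sub names read_slurp and read_while are assumptions for illustration. Each sub opens its own filehandle on every call, and both return a line count so their results can be checked against each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Hypothetical input: 1,000 short lines in a temporary file
# (deleted automatically when the script exits).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 1000;
close $tmp or die "close: $!";

# Slurp: read every line into an array, then count the elements.
sub read_slurp {
    open my $fh, '<', $path or die "open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return scalar @lines;
}

# Line by line: read one line per iteration and keep nothing in memory.
sub read_while {
    open my $fh, '<', $path or die "open $path: $!";
    my $count = 0;
    while (<$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    slurp_array => \&read_slurp,
    while_loop  => \&read_while,
});
```

Because both subs read the same file and differ only in how they read it, the gap in the resulting table should mostly reflect the read method itself rather than any later processing.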

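1. Using separate filehandles seems to have removed the advantage that
came from running order (warm cache vs. fresh read): in test 1 all
three subs are within 2% of each other.
2. The while (<$fh>) { do nothing } read (the 'with_hash' approach)
beats slurping into an array, the read method used by the other two,
quite handily: about 63-64% faster in test 2.
3. The hash method continues to kick hash: it beats the string compare
method by 33% in test 3, and the grep method is not even close
(with_hash runs about ten times as fast as with_grep).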
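
To see the three lookup strategies without any file IO in the timed code (in the spirit of chromatic's advice quoted below), here is a hypothetical sketch. How the original with_string_cmp and with_grep subs were written is not known. This sketch assumes the string compare is a nested loop with eq that stops at the first match, the grep method is grep { $_ eq $line } @wanted, and the hash method is an exists lookup. The data (100 wanted IDs, 1,000 input lines, half of them matching) is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data, built once in memory so no IO is timed.
# 7 is coprime to 200, so every residue 0..199 appears 5 times
# in @lines; residues 1..100 match, i.e. 500 of the 1,000 lines.
my @wanted    = map { "id$_" } 1 .. 100;
my @lines     = map { "id" . ($_ * 7 % 200) } 1 .. 1000;
my %is_wanted = map { $_ => 1 } @wanted;

# Hash: a single exists lookup per line.
sub count_hash {
    my $n = 0;
    for my $line (@lines) {
        $n++ if exists $is_wanted{$line};
    }
    return $n;
}

# String compare: scan the list with eq, stopping at the first match.
sub count_string_cmp {
    my $n = 0;
    LINE: for my $line (@lines) {
        for my $w (@wanted) {
            if ($line eq $w) {
                $n++;
                next LINE;
            }
        }
    }
    return $n;
}

# grep: always compares against the whole list, even after a match.
sub count_grep {
    my $n = 0;
    for my $line (@lines) {
        $n++ if grep { $_ eq $line } @wanted;
    }
    return $n;
}

cmpthese(-1, {
    with_hash       => \&count_hash,
    with_string_cmp => \&count_string_cmp,
    with_grep       => \&count_grep,
});
```

Under these assumptions, one plausible factor in grep's poor showing is that grep has no way to stop early: it compares each line against the entire list, while the nested loop bails out at the first match and the hash does a single lookup.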

Thanks for your help, Eric and chromatic. This was a really useful
(and fun) exercise for a perennial beginner like myself.

regards,
Tom K

On May 29, 2007, at 3:54 PM, chromatic wrote:

> On Tuesday 29 May 2007 15:35:07 Austin Schutz wrote:
>
>> 	You are using different file reading techniques. That could be
>> _very_ significant. If you are going to slurp all the lines for the
>> string comparison you should do the same for the hash.
>
> Worse than that, the first code to read the file pays the penalty of
> populating file buffers. Subsequent reads probably all come from a
> warm cache.
>
> Mixing IO with benchmarks usually skews the results heavily.
>
> -- c
> _______________________________________________
> Pdx-pm-list mailing list
> Pdx-pm-list at pm.org
> http://mail.pm.org/mailman/listinfo/pdx-pm-list
>
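
Austin's and chromatic's points can be combined into a somewhat fairer harness for cases where the IO has to stay inside the timed code. Every sub reads the file through the same helper, and one untimed read happens before cmpthese so that no sub is the one that pays for populating the file cache. This is a hypothetical sketch: the file contents, the load_lines helper, and these versions of with_hash and with_grep are assumptions, not the scrubbed scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Hypothetical input file (ids 1..500) and wanted list (ids 1..50).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "id$_\n" for 1 .. 500;
close $tmp or die "close: $!";
my @wanted    = map { "id$_" } 1 .. 50;
my %is_wanted = map { $_ => 1 } @wanted;

# One reader shared by every sub, so all of them use the same
# read technique and pay the same IO cost.
sub load_lines {
    open my $fh, '<', $path or die "open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

sub with_hash {
    my $n = 0;
    for my $line (@{ load_lines() }) {
        $n++ if exists $is_wanted{$line};
    }
    return $n;
}

sub with_grep {
    my $n = 0;
    for my $line (@{ load_lines() }) {
        $n++ if grep { $_ eq $line } @wanted;
    }
    return $n;
}

# Untimed warm-up read: for a file not already cached, this is meant
# to keep the first timed sub from paying for filling the OS cache.
load_lines();

cmpthese(-1, { with_hash => \&with_hash, with_grep => \&with_grep });
```

Here the file was just written, so it is almost certainly cached already; the warm-up matters more when benchmarking against an existing large file.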

-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmark_filehandles
Type: application/octet-stream
Size: 3738 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20070529/776b9712/attachment-0003.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmark_read_methods
Type: application/octet-stream
Size: 3735 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20070529/776b9712/attachment-0004.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmark_hash_vs_grep
Type: application/octet-stream
Size: 3493 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/pdx-pm-list/attachments/20070529/776b9712/attachment-0005.obj 