[Pdx-pm] string comparison vs hash

Marvin Humphrey marvin at rectangular.com
Wed May 30 05:52:54 PDT 2007


On May 29, 2007, at 11:58 PM, chromatic wrote:

> Still, if the benchmark's fast enough to say "I didn't run enough iterations
> to get a reliable count", I start to suspect that seek time and transfer
> rates will suddenly start to matter a lot more than the difference between
> indexed and keyed aggregate access.
>
> Accurate benchmarking is Not Easy.

Amen to that.

Below you'll find the output from a benchmarking program I wrote to  
test KinoSearch indexing speed.  I intentionally ran it cold, so that  
the first iteration wouldn't benefit from OS caching.  And indeed, it  
came up considerably slower: 2.46 seconds as opposed to the  
"truncated mean" of 1.41 seconds.

There's another outlier of 1.89 seconds on the 8th iteration.  Even
though I quit almost everything before running this app, OS X is
still a noisy operating system and every once in a while it hiccups.

The use of a "truncated mean"
<http://en.wikipedia.org/wiki/Truncated_mean> protects the stats from
these glitches by discarding the outermost scores.  The same
technique is commonly used in judged
sports: the highest and lowest scores are tossed out, then the rest  
are averaged.
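
As a rough illustration (this is not the code the benchmark script
itself uses), a truncated mean can be sketched in a few lines of
Python.  The function name `truncated_mean` and the choice to drop two
scores from each end are assumptions, picked only because they
reproduce the "6 kept, 4 discarded" line in the output below.

```python
def truncated_mean(values, discard_each_end):
    """Average what is left after dropping the extremes from both ends."""
    if discard_each_end < 0 or 2 * discard_each_end >= len(values):
        raise ValueError("must keep at least one value")
    ordered = sorted(values)
    kept = ordered[discard_each_end:len(ordered) - discard_each_end]
    return sum(kept) / len(kept)


# Per-iteration timings copied from the benchmark output in this message.
timings = [2.46, 1.40, 1.39, 1.41, 1.40, 1.41, 1.41, 1.89, 1.42, 1.39]

# Assumption: 2 scores are dropped from each end (6 kept, 4 discarded).
print(f"Truncated mean: {truncated_mean(timings, 2):.2f} secs")
# prints "Truncated mean: 1.41 secs"
```

Sorting first means the cold-cache run (2.46) and the hiccup (1.89)
both land at the high end and are dropped together, wherever they
occurred in the sequence.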

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


slothbear:~/projects/ks_variants/ks_sortfix/perl marvin$ perl -I../devel/benchmarks/indexers/ -Mblib ../devel/benchmarks/indexers/kinosearch_indexer.plx --docs=1000 --reps=10
------------------------------------------------------------
1    Secs: 2.46  Docs: 1000
2    Secs: 1.40  Docs: 1000
3    Secs: 1.39  Docs: 1000
4    Secs: 1.41  Docs: 1000
5    Secs: 1.40  Docs: 1000
6    Secs: 1.41  Docs: 1000
7    Secs: 1.41  Docs: 1000
8    Secs: 1.89  Docs: 1000
9    Secs: 1.42  Docs: 1000
10   Secs: 1.39  Docs: 1000
------------------------------------------------------------
KinoSearch 0.20_03
Perl 5.8.6
Thread support: yes
Darwin 8.9.0 Power Macintosh
Mean: 1.46 secs
Truncated mean (6 kept, 4 discarded): 1.41 secs
------------------------------------------------------------
slothbear:~/projects/ks_variants/ks_sortfix/perl marvin$








