[DFW.pm] disk-read buffering?

Tom Metro tmetro+dfw-pm at gmail.com
Mon Dec 30 14:23:58 PST 2013


Joel Berger wrote:
> ...the first time I run my script it takes significantly
> longer than subsequent runs, soon afterwards.

Tommy Butler wrote:
> run this before you run your Perl app:
>
> find /dedup >/dev/null

That will cause the metadata to be cached, but not the file data.
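To warm the file data as well, you have to read the files' contents, not just walk their names. A minimal sketch (assuming /dedup is the contest data directory, as in Tommy's command; warm_cache is just a name I made up here):

```shell
# warm_cache DIR: read every regular file under DIR so its data,
# not just its metadata, lands in the OS page cache.
warm_cache() {
    find "$1" -type f -exec cat {} + > /dev/null
}

# e.g.: warm_cache /dedup
```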


> This is precisely the reason why we are taking the best time out of 2
> runs when each contestant's code is benchmarked.  I will run the above
> command every time before benchmarking any code to assure fairness.

I thought there was going to be a server reboot/reset/rebuild between runs.

The closest thing to real-world is having a fully empty cache, but I 
can't see any way that can be accomplished during development on a 
shared server.
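For reference, on a box where you do have root, Linux does let you empty the page cache explicitly; needing root is exactly why it's off the table on a shared server:

```shell
# Must be run as root -- hence not an option on a shared server.
sync                               # write dirty pages out to disk first
echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries, and inodes
```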

The next best thing for consistent results (so you can do relative 
comparisons) would be seeding the cache, but using other tools will only 
approximate the access pattern of your dedupe code. Probably your best 
bet when testing is to plan on running multiple times, keeping in mind 
that the first run will more closely approximate the competition run in 
terms of actual time, and using subsequent runs for relative comparisons.
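One way to script that "first run cold, later runs warm" pattern is a small timing loop. This is only a sketch: time_runs and dedupe.pl are made-up names, and whole-second resolution is coarse, but it's enough for runs that take many seconds:

```shell
# time_runs N CMD...: run CMD N times and print the elapsed seconds
# for each run. Run 1 is the colder-cache, contest-like number;
# runs 2..N are for relative comparisons against each other.
time_runs() {
    n=$1; shift
    for i in $(seq "$n"); do
        start=$(date +%s)
        "$@" > /dev/null
        end=$(date +%s)
        echo "run $i: $((end - start))s"
    done
}

# e.g.: time_runs 3 perl dedupe.pl
```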

Ideally, while testing you should be benchmarking small portions of your 
code, so the caches will fill on the first run and stand a good chance 
of remaining populated for several subsequent runs, despite other users 
on the system hitting other files.

  -Tom

-- 
Tom Metro
The Perl Shop, Newton, MA, USA
"Predictable On-demand Perl Consulting."
http://www.theperlshop.com/
