[DFW.pm] We've had a slight malfunction

Tommy Butler dfwpm at internetalias.net
Thu Jan 2 11:14:31 PST 2014


It turns out that the "baseline" proof-of-concept code that I pushed to
GitHub yesterday is capable of doing a full run against the /dedup
volume in less than 2 seconds.

That doesn't make for much of a contest...

So I spoke with another judge last night and decided to create a new
volume of data to test against.  The new volume is called /dedup-more
and it is much harder to run against.  Its data comes from
the Linux kernel, Wikipedia, the Perl source code, music files, video
files, maildir files, cache files of all kinds, several git repos, and
several other sources.  Random data was generated throughout the tree as
well, and it has been peppered with hard and soft links.  With
/dedup-more we've got something like 1 million files instead of 24
thousand... and many of the files are small.  This is closer
to a real-world scenario than completely random data.

In the interest of time, the final data volume for the actual contest
does not have this many files.  However, /dedup-more should give you a
better opportunity to accurately benchmark your own code.  As previously
stated, please join the #dfwpm IRC channel on irc.perl.org if you are
going to be running code, so we don't step on each other's toes.

--Tommy Butler

