<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<font face="Helvetica, Arial, sans-serif">It turns out that the
"baseline" proof-of-concept code that I pushed to GitHub yesterday
can do a full run against the /dedup volume in less than 2
seconds.<br>
<br>
That doesn't make for much of a contest...<br>
<br>
So I spoke with another judge last night and decided to create a
new volume of data to test against. The new volume is called
/dedup-more and it is much harder to run against. Its varied
data comes from the Linux kernel, Wikipedia, the Perl source code,
music files, video files, maildir files, cache files of all kinds,
several git repos, and several other sources. Random data was
generated throughout the tree as well, and it has been peppered
with hard and soft links. With /dedup-more we've got something
like 1 million files instead of 24 thousand... and many of the
files are small. This is closer to a real-world scenario than
completely random data.<br>
<br>
</font><font face="Helvetica, Arial, sans-serif">In the interest of
time, the final data volume for the actual contest does not have
this many files. However, /dedup-more should give you a better
opportunity to accurately benchmark your own code. As previously
stated, please join the #dfwpm IRC channel on irc.perl.org if you
are going to be running code, so we don't step on each other's
toes.<br>
<br>
--Tommy Butler<br>
<br>
<br>
</font>
</body>
</html>