<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <font face="Helvetica, Arial, sans-serif">It turns out that the

      "baseline" proof-of-concept code that I pushed to github yesterday

      is capable of doing a full run against the /dedup volume in less

      than 2 seconds.<br>

      <br>

      That doesn't make for much of a contest...<br>

      <br>

      So I spoke with another judge last night and decided to create a

      new volume of data to test against.  The new volume is called

      /dedup-more and it is much harder to run against.  Its data

      comes from the Linux kernel, Wikipedia, the Perl source code,

      music files, video files, maildir files, cache files of all kinds,

      several git repos, and several other sources.  Random data was

      generated throughout the tree as well, and it has been peppered

      with hard and soft links.  With /dedup-more we've got something

      like 1 million files instead of 24 thousand, and many of them

      are small.  This is closer to a real-world scenario than

      completely random data.<br>

      <br>

    </font><font face="Helvetica, Arial, sans-serif"><font

        face="Helvetica, Arial, sans-serif">In the interest of time, the

        final data volume for the actual contest does not have this many

        files.</font>  However, /dedup-more should give you a better

      opportunity to accurately benchmark your own code.  As previously

      stated, please join the #dfwpm IRC channel on irc.perl.org if you

      are going to be running code, so we don't step on each other's

      toes.<br>

      <br>

      --Tommy Butler<br>

      <br>

      <br>

    </font>

  </body>

</html>