<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<font face="Helvetica, Arial, sans-serif">We've had a lot of list
sign-ups lately in connection with the hackathon, so this is just
a reminder of points already discussed at our last meeting, which
you may have missed.<br>
</font>
<ul>
<li><font face="Helvetica, Arial, sans-serif">You are deduping an
ext4 filesystem, and that's all we're saying about it. You
have server access, so if you want to poke at it, you can ;-)</font><br>
</li>
<li><i><b><font face="Helvetica, Arial, sans-serif">The test data
you are working with is peppered randomly with hard and
soft links.</font></b></i></li>
<li><font face="Helvetica, Arial, sans-serif">Hard links to files
are not duplicates. They point to the same underlying storage
and should therefore be treated as files already de-duped.
We are aware that technically hard links <i>are</i> files in and of
themselves, but they are metadata and not storage. You'll
have to decide how to optimize for this; it's a very tricky
tradeoff. Don't base your decision on how frequently links
occur in the test dataset; the final dataset will not be
identical and is not even guaranteed to be similar.<br>
</font></li>
<li><font face="Helvetica, Arial, sans-serif">Symlinks are also
NOT duplicates.</font></li>
<li><font face="Helvetica, Arial, sans-serif">Your code will
encounter <i>directory</i> symlinks as well as file
symlinks, so you'll need to take care not to get stuck in
directory recursion loops.<br>
</font></li>
</ul>
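<font face="Helvetica, Arial, sans-serif">For anyone who wants a
starting point, the rules above can be sketched roughly as follows.
This is only an illustration under our own assumptions, not a
reference solution: the function name <i>find_duplicates</i> is
made up, Python and SHA-256 content hashing are arbitrary choices,
and the sketch ignores performance entirely. It skips every symlink,
never descends into directory symlinks, and counts all hard links
to one inode as a single file.<br>
</font>
<pre>
```python
import hashlib
import os
import stat
from collections import defaultdict


def find_duplicates(root):
    """Return groups of paths with identical content but separate storage."""
    # Keep one representative path per (device, inode) pair. Hard links
    # share both values, so extra links to the same inode are ignored.
    by_inode = {}

    # With followlinks=False, os.walk lists directory symlinks but does
    # not descend into them, which avoids recursion loops.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat does not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, FIFOs, sockets, devices
            by_inode.setdefault((st.st_dev, st.st_ino), path)

    # Group the remaining files by a hash of their contents.
    by_digest = defaultdict(list)
    for path in by_inode.values():
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                digest.update(chunk)
        by_digest[digest.hexdigest()].append(path)

    # Only groups with more than one path are real duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```
</pre>
<font face="Helvetica, Arial, sans-serif">Calling
<i>find_duplicates</i> on a directory returns a list of path
groups; each group holds files that match byte for byte but live on
different inodes. Hashing every file is the simplest thing that
works, so treat it as a baseline rather than a recommendation.<br>
<br>
</font>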
<font face="Helvetica, Arial, sans-serif">-- <br>
Tommy Butler, John Fields<br>
<br>
<br>
</font>
</body>
</html>