[DFW.pm] what is a hard link, and what should my deduper do with them?

Tommy Butler dfwpm at internetalias.net
Sat Dec 28 19:37:09 PST 2013


What is a hard link? --> http://www.linfo.org/hard_link.html

Because of the nature of hard links, there's no way to know which hard
link existed first or which one should be considered the "original" file.
All of the names point to the same underlying inode, which carries only
one set of timestamps (atime, mtime, ctime); the names themselves have no
timestamps of their own.
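If you want to see this for yourself, here's a quick sketch. It assumes a
Unix-like filesystem that supports hard links, and the file names a.txt and
b.txt are made up for the demo. Both names report the same device, inode,
link count and mtime, so nothing in stat() tells you which name came first.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Create a file, then give it a second name (a hard link) in a scratch dir.
my $dir      = tempdir( CLEANUP => 1 );
my $original = File::Spec->catfile( $dir, 'a.txt' );
my $link     = File::Spec->catfile( $dir, 'b.txt' );

open my $fh, '>', $original or die "open $original: $!";
print {$fh} "same bytes\n";
close $fh or die "close $original: $!";

link $original, $link or die "link: $!";

# stat returns (dev, ino, mode, nlink, uid, gid, rdev, size,
#               atime, mtime, ctime, blksize, blocks)
my @st_a = stat $original or die "stat $original: $!";
my @st_b = stat $link     or die "stat $link: $!";

# Same dev, same ino, nlink of 2, and one shared mtime for both names.
printf "%s: dev=%d ino=%d nlink=%d mtime=%d\n", 'a.txt', @st_a[ 0, 1, 3, 9 ];
printf "%s: dev=%d ino=%d nlink=%d mtime=%d\n", 'b.txt', @st_b[ 0, 1, 3, 9 ];
```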

As such, the official rule on the matter is that the
asciibetically-first fully-qualified file name will be considered the
original, while the other hard links should be considered, as already
stated in the rules, "files already deduped".  The reason for this is
strictly output and reporting consistency.  (We need this to maintain a
standard baseline output format, which I'll set forth in another email
coming soon.)
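"Asciibetically-first" is what Perl's plain sort gives you: without
"use locale", sort compares strings with cmp, character code by character
code, so uppercase letters sort before lowercase ones. A small sketch (the
/foo/Bar/zed.txt path is hypothetical, added only to show the case effect):

```perl
use strict;
use warnings;

# Hypothetical names that all point at the same inode.
my @names = (
    '/foo/car/daz.txt',
    '/foo/bar/baz.txt',
    '/foo/Bar/zed.txt',
);

# Default sort = string comparison by character code (no locale, no
# case folding), i.e. "asciibetical" order.
my @sorted   = sort @names;
my $original = $sorted[0];
my @deduped  = @sorted[ 1 .. $#sorted ];

print "original:        $original\n";
print "already deduped: $_\n" for @deduped;
```

Here /foo/Bar/zed.txt wins because "B" (0x42) sorts before "b" (0x62).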


      SCENARIO:

The three files below have identical content:
/foo/bar/baz.txt -> ( inode 12345 )
/foo/car/daz.txt -> ( inode 12345 )
/foo/far/gaz.txt -> ( inode 67890 )


      OUTCOME:

/foo/far/gaz.txt should be reported as a duplicate of /foo/bar/baz.txt.
/foo/bar/baz.txt and /foo/car/daz.txt are hard links to the same inode,
and /foo/bar/baz.txt comes first in a sort, so it is treated as the
original and /foo/car/daz.txt is treated as already deduped.
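To make the scenario concrete, here's a sketch that classifies the three
files without touching the disk. The (dev, ino) pairs are filled in by hand;
device number 1 is an assumption, since the scenario only lists inodes. It
walks the names in forward sort order and keeps the first name seen per
inode, which should give the same result as the reverse-sort trick below.

```perl
use strict;
use warnings;

# The scenario's three same-content files with hand-written (dev, ino).
my %stat_of = (
    '/foo/bar/baz.txt' => [ 1, 12345 ],
    '/foo/car/daz.txt' => [ 1, 12345 ],
    '/foo/far/gaz.txt' => [ 1, 67890 ],
);

my ( %survivor, %already_deduped );

# First name (in sort order) per dev:ino survives; any later name for the
# same inode is just another hard link, i.e. "already deduped".
for my $name ( sort keys %stat_of ) {
    my $key = join ':', @{ $stat_of{$name} };
    if ( exists $survivor{$key} ) {
        $already_deduped{$name} = $survivor{$key};
    }
    else {
        $survivor{$key} = $name;
    }
}

# All three have identical content, so the first survivor is the original
# and every other survivor is a reportable duplicate.
my ( $original, @duplicates ) = sort values %survivor;

print "$_ is a duplicate of $original\n" for @duplicates;
print "$_ is a hard link of $already_deduped{$_} (already deduped)\n"
    for sort keys %already_deduped;
```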


      CODE:

I'm doing it like this.  This is just an unoptimized example.  TIMTOWTDI.

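    # Key each file by device + inode.  Walking the names in reverse sorted
    # order means the asciibetically-first name is assigned last and wins,
    # so all but one hard link per inode is thrown out automatically.
    # (The ':' separator keeps e.g. dev 1/ino 23 distinct from dev 12/ino 3.)
    my %dev_inodes;

    for my $name ( reverse sort @group_of_same_size_files_by_name )
    {
       my ( $dev, $ino ) = stat $name or next;   # skip files we can't stat
       $dev_inodes{ "$dev:$ino" } = $name;
    }

    # don't keep working if there's nothing to compare
    next if keys %dev_inodes < 2;

    for my $file ( values %dev_inodes )
    {
       # do stuff to figure out which of the same-size files are duplicates...
    }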
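If you want to try the hard-link collapsing step on real files, here's a
self-contained sketch that rebuilds the scenario in a temp directory. The
function name unique_by_inode is made up for this example, and it assumes a
filesystem with hard link support. It stops at the point where the "do
stuff" comparison would begin and just prints which names are left.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: one surviving name per dev:ino.  Walking the names in
# reverse sort order means the asciibetically-first name is assigned last.
sub unique_by_inode {
    my @names = @_;
    my %dev_inodes;
    for my $name ( reverse sort @names ) {
        my ( $dev, $ino ) = stat $name or next;    # skip names we can't stat
        $dev_inodes{"$dev:$ino"} = $name;
    }
    return sort values %dev_inodes;
}

# Rebuild the scenario under a scratch directory.
my $root = tempdir( CLEANUP => 1 );
my %path;
for my $rel (qw(bar/baz.txt car/daz.txt far/gaz.txt)) {
    my ( $sub, $file ) = split m{/}, $rel;
    my $subdir = File::Spec->catdir( $root, $sub );
    mkdir $subdir or die "mkdir $subdir: $!";
    $path{$rel} = File::Spec->catfile( $subdir, $file );
}

# baz.txt and gaz.txt get identical content; daz.txt is a hard link to baz.txt.
for my $rel ( 'bar/baz.txt', 'far/gaz.txt' ) {
    open my $fh, '>', $path{$rel} or die "open $path{$rel}: $!";
    print {$fh} "identical content\n";
    close $fh or die "close $path{$rel}: $!";
}
link $path{'bar/baz.txt'}, $path{'car/daz.txt'} or die "link: $!";

my @survivors = unique_by_inode( values %path );
print "left to compare: $_\n" for @survivors;
```

Only baz.txt and gaz.txt should be left; daz.txt drops out as a hard link.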


--Tommy Butler