[DFW.pm] what is a hard link, and what should my deduper do with them?
Tom Metro
tmetro+dfw-pm at gmail.com
Mon Dec 30 14:28:56 PST 2013
Tommy Butler wrote:
> ...other hard links should be considered, as already
> stated in the rules, "files already deduped".
>
> SCENARIO:
>
> The three files below have identical content:
> /foo/bar/baz.txt -> ( inode 12345 )
> /foo/car/daz.txt -> ( inode 12345 )
> /foo/far/gaz.txt -> ( inode 67890 )
>
>
> OUTCOME:
>
> /foo/far/gaz.txt should be reported as a duplicate of /foo/bar/baz.txt
> because /foo/bar/baz.txt comes before /foo/car/daz.txt in a sort and
> because /foo/car/daz.txt is a hard link.
So then the output might look like:
/foo/bar/baz.txt /foo/far/gaz.txt
while /foo/car/daz.txt is simply eliminated from consideration and not
output at all?
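What makes baz.txt and daz.txt hard links in the quoted scenario is that they are two directory entries for the same inode. The data exists only once, and (assuming no other names) the inode's link count, st_nlink, is 2. Below is a minimal Python sketch of how a deduper might find such sets. The function name hardlink_sets is made up for illustration. It keys on device and inode number together because inode numbers are only unique within one filesystem. Perl's stat returns the same two fields as its first two elements.

```python
import os
from collections import defaultdict

def hardlink_sets(paths):
    """Group paths that are hard links to the same file (hypothetical helper).

    Two names are hard links to one file when they share st_dev and st_ino;
    the inode number alone is only unique within a single filesystem.
    """
    by_inode = defaultdict(set)
    for path in paths:
        st = os.stat(path)  # follows symlinks; os.lstat would treat them separately
        by_inode[(st.st_dev, st.st_ino)].add(path)
    # Keep only inodes reached by more than one name, each set sorted asciibetically.
    return sorted(sorted(names) for names in by_inode.values() if len(names) > 1)
```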
The problem with this approach, if you are striving for a useful tool
and not just a programming exercise, is that you don't know which of the
aliases is the name most familiar to the user who will be reviewing the
report.
Another possibility might be to report hardlinks in a way that visually
groups them together: any place a member of a hardlink group would
appear in the output, you replace it with the whole group:
(/foo/bar/baz.txt /foo/car/daz.txt) /foo/far/gaz.txt
(With members of the group being sub-sorted asciibetically, and the
first member of the group being used as the key when sorting the overall
list of duplicates.)
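The grouped layout could be produced along these lines. This is a rough Python sketch: format_duplicate_line is a hypothetical helper, and it assumes the hardlink groups within one duplicate set have already been found, for example by keying on device and inode.

```python
def format_duplicate_line(units):
    """Render one duplicate set (hypothetical helper).

    Each unit is a list of names that share an inode; a plain file is a
    one-element list.
    """
    # Sub-sort the members of each hardlink group asciibetically.
    groups = [sorted(unit) for unit in units]
    # Sort the groups by their first member, which acts as the sort key.
    groups.sort(key=lambda g: g[0])
    parts = []
    for g in groups:
        # Parenthesize multi-name groups; single names appear bare.
        parts.append("(" + " ".join(g) + ")" if len(g) > 1 else g[0])
    return " ".join(parts)
```

For the scenario above, `format_duplicate_line([["/foo/far/gaz.txt"], ["/foo/car/daz.txt", "/foo/bar/baz.txt"]])` returns `"(/foo/bar/baz.txt /foo/car/daz.txt) /foo/far/gaz.txt"`.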
But this is still not quite ideal. It implies that you ignore
collections of hardlinks that don't also have a duplicate file. Chances
are good that if the user is interested in duplicates, they also want
to know what hardlinks (aliases) exist.
Plus, most characters you choose for grouping could potentially be part
of the file name, although the same could be said for the space delimiters.
So instead, you could simply produce a report of hardlinks at the end,
and wherever a file that has multiple aliases appears in the duplicate
report, always show its asciibetically first name:
Duplicates:
/foo/bar/baz.txt /foo/far/gaz.txt
...
Aliases:
/foo/bar/baz.txt /foo/car/daz.txt
...
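A sketch of that two-section report, again in Python. It assumes each file's inode key (e.g. device and inode number) and a content digest have already been computed elsewhere; build_report and its input shape are hypothetical. The Aliases section lists hardlink groups even when they have no duplicate, which covers the case the grouped layout would miss.

```python
from collections import defaultdict

def build_report(files):
    """Build a Duplicates/Aliases report (hypothetical helper).

    files maps path -> (inode_key, content_digest), both assumed to be
    computed by the caller.
    """
    by_inode = defaultdict(list)
    digest_of = {}
    for path, (inode, digest) in files.items():
        by_inode[inode].append(path)
        digest_of[inode] = digest
    # Represent each inode by its asciibetically first name.
    canonical = {inode: min(paths) for inode, paths in by_inode.items()}
    # Compare distinct inodes by content; names of one inode are aliases, not duplicates.
    by_digest = defaultdict(list)
    for inode, digest in digest_of.items():
        by_digest[digest].append(canonical[inode])
    dup_sets = sorted(sorted(n) for n in by_digest.values() if len(n) > 1)
    alias_sets = sorted(sorted(p) for p in by_inode.values() if len(p) > 1)
    lines = ["Duplicates:"] + [" ".join(s) for s in dup_sets]
    lines += ["Aliases:"] + [" ".join(s) for s in alias_sets]
    return "\n".join(lines)
```

With the scenario's three files (and a placeholder digest shared by all of them), this returns the two sections shown above.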
-Tom
--
Tom Metro
The Perl Shop, Newton, MA, USA
"Predictable On-demand Perl Consulting."
http://www.theperlshop.com/