[pm-h] maildir - remove duplicate messages

Russell L. Harris rlharris at oplink.net
Sun Mar 28 13:31:52 PDT 2010


G. Wade Johnson wrote:
> I would probably take a multi-step approach. I would look for a module
> on CPAN that reads the maildir format (for example,
> Email::Folder::Maildir, which I found from search.cpan.org).
> 
> I would use that to match the To and From fields and remove any that I
> didn't want.
> 
> The best way to find duplicates is probably through the use of a
> message digest and a hash. Walk the messages, passing each through
> Digest::SHA1 or Digest::MD5 and use the result as the key to a hash.
> 
> If it already exists in the hash, delete the message. If not, add it to
> the hash.
> 
> Admittedly, that's just an outline of an approach, but it should get
> you started.
> 
> G. Wade
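
The digest-and-hash outline quoted above can be sketched roughly as
follows. This is only an illustration, and it swaps in Python's standard
library (mailbox and hashlib) for the Perl modules named in the quote.
The function name remove_duplicates and the dry_run flag are made up for
the example. It treats two messages as duplicates only when their raw
bytes are identical.

```python
import hashlib
import mailbox


def remove_duplicates(maildir_path, dry_run=True):
    """Find messages whose raw bytes match an earlier message.

    Returns the keys of the duplicates. They are deleted from disk
    only when dry_run is False.
    """
    # create=False: refuse to create a new maildir if the path is wrong.
    box = mailbox.Maildir(maildir_path, create=False)
    seen = {}        # digest -> key of the first message with that digest
    duplicates = []
    # sorted() materializes the key list, so removing while looping is safe.
    for key in sorted(box.keys()):
        digest = hashlib.sha1(box.get_bytes(key)).hexdigest()
        if digest in seen:
            # Already in the hash: this message is a duplicate.
            duplicates.append(key)
            if not dry_run:
                box.remove(key)
        else:
            # First time this digest appears: remember it.
            seen[digest] = key
    box.close()
    return duplicates
```

Running it first with dry_run=True and inspecting the returned keys is
probably wise before deleting anything. Note that copies of one message
delivered by different routes usually differ in their Received headers,
so a whole-message digest will not match them; hashing only the body or
the Message-ID header would be one possible variation.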

Thanks, Wade.  The term "digest" was unfamiliar to me, but I recognize 
the concept from the git documentation, as I am in the process of 
switching backup from svn to git.  And I was unaware of (or had 
forgotten about) search.cpan.org.

RLH

