[pm-h] maildir - remove duplicate messages
Russell L. Harris
rlharris at oplink.net
Sun Mar 28 13:31:52 PDT 2010
G. Wade Johnson wrote:
> I would probably take a multi-step approach. I would look for a module
> on CPAN that reads the maildir format (for example,
> Email::Folder::Maildir, which I found from search.cpan.org).
>
> I would use that to match the To and From fields and remove any that I
> didn't want.
>
> The best way to find duplicates is probably through the use of a
> message digest and a hash. Walk the messages, passing each through
> Digest::SHA1 or Digest::MD5 and use the result as the key to a hash.
>
> If it already exists in the hash, delete the message. If not, add it to
> the hash.
>
> Admittedly, that's just an outline of an approach, but it should get
> you started.
>
> G. Wade
Thanks, Wade. The term "digest" was unfamiliar to me, but I recognize
the concept from the git documentation, as I am in the process of
switching backup from svn to git. And I was unaware of (or had
forgotten about) search.cpan.org.
RLH
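
For readers of the archive, the digest-and-hash outline quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard mailbox and hashlib modules rather than the Perl modules named in the thread, and the function name remove_duplicates and its dry_run parameter are hypothetical. It hashes each message in full, so two copies that differ in any header (for example, delivery headers added along the way) will not be treated as duplicates.

```python
import hashlib
import mailbox


def remove_duplicates(maildir_path, dry_run=True):
    """Return the keys of duplicate messages; delete them unless dry_run is set.

    Hypothetical helper: a message counts as a duplicate only when its
    complete raw content matches that of an earlier message.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    seen = {}        # digest -> key of the first message with that digest
    duplicates = []  # keys of messages whose digest was already seen

    for key in sorted(box.keys()):
        digest = hashlib.sha1(box.get_bytes(key)).hexdigest()
        if digest in seen:
            duplicates.append(key)
        else:
            seen[digest] = key

    if not dry_run:
        for key in duplicates:
            box.remove(key)
        box.flush()

    box.close()
    return duplicates
```

Calling remove_duplicates("/path/to/Maildir") with the default dry_run=True only reports which messages would be removed, which seems a prudent first step before deleting any mail.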