SPUG: filtering
Mike Schuh
schuh at farmdale.com
Mon Aug 1 09:13:01 PDT 2005
Charles,
>I have a bunch of data, harvested from incoming emails, and I'd like to
>periodically clean it up. The format is pretty simple, with three fields
>tab delimited.
>
>user at domain.com <tab> Real Name <tab> Date Inserted
>
>Obviously, I get a bunch of duplicate data. I've been able to come up with some
>Perl code to sort that data by the second field, and even delete duplicate
>data where the date is the same. But I'd like to perform a filter where if
>(email & name) equal any other lines (email and name), then drop all other
>records which match except for the most current (denoted by the date).
Quick and dirty solution:

  read the existing database
  for each record, create a key from (email & name)
  use this key to insert the date into a hash
    (but only if this record is the one you want to keep,
    i.e. its date is newer than any already stored under that key)
  do the same with the new data from your incoming emails
  write the hash out as your new (updated) database
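
The steps above can be sketched in Perl roughly as follows. This is only a minimal sketch under some assumptions that are not in the original question: the helper name newest_records is made up for illustration, and the date field is assumed to be in a format that sorts correctly as a plain string (for example YYYY-MM-DD). If the dates are in some other format, they would need to be converted to something comparable first.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the newest record per (email, name) pair.
# Assumes each line is "email<tab>name<tab>date", and that dates compare
# correctly as strings (e.g. YYYY-MM-DD). Both are assumptions.
sub newest_records {
    my @lines = @_;
    my %latest;    # "email\tname" => most recent date seen so far

    for my $line (@lines) {
        chomp $line;
        my ($email, $name, $date) = split /\t/, $line, 3;
        next unless defined $date;    # skip lines without three fields

        my $key = "$email\t$name";
        # Insert the date only if this record is the one we want to keep.
        $latest{$key} = $date
            if !exists $latest{$key} || $date gt $latest{$key};
    }

    # Rebuild tab-delimited lines, sorted by key for stable output.
    return map { "$_\t$latest{$_}" } sort keys %latest;
}
```

Passing the lines of the existing database followed by the newly harvested lines, as in newest_records(@old_lines, @new_lines), covers both the "read existing database" and "do the same with the new data" steps in one pass; the returned list can then be written out as the updated database.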
--
Mike Schuh -- Seattle, Washington USA
http://www.farmdale.com