SPUG: filtering

Mike Schuh schuh at farmdale.com
Mon Aug 1 09:13:01 PDT 2005


>I have a bunch of data, harvested from incoming emails, and I'd like to
>periodically clean it up.  The format is pretty simple, with three fields
>tab delimited.
>user at domain.com <tab> Real Name <tab> Date Inserted
>Obviously, I get a bunch of duplicate data.  I've been able to come up with some
>perl code to sort that data by the second field, and even delete duplicate
>data where the date is the same.  But I'd like to perform a filter where if
>(email & name) equal any other lines (email and name), then drop all other
>records which match except for the most current (denoted by the date).

Quick/dirty solution:

read existing database
 for each record, create a key from (email & name)
 use this key to insert the date into a hash
  (but only if the key is new or this date is more recent than the
  one already stored, i.e. this is the record you want to keep)

do the same with the records from your incoming data

write the hash out as your new (updated) database
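
The steps above could be sketched in Perl roughly as follows. This is a minimal sketch, not tested against your data. It assumes the date field is stored in a form that sorts correctly as a string (for example YYYY-MM-DD); if your dates look different, convert them to something comparable before the "gt" test. The sub name newest_records and the file names in the usage line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only the newest record for each (email, name) pair.
# ASSUMPTION: dates compare correctly as strings (e.g. YYYY-MM-DD).
sub newest_records {
    my @lines = @_;
    my %latest;    # "email<tab>name" => most recent date seen so far

    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines

        my ($email, $name, $date) = split /\t/, $line, 3;
        next unless defined $date;    # skip malformed records

        my $key = join "\t", $email, $name;

        # Store the date only if the key is new or this date is later.
        $latest{$key} = $date
            if !exists $latest{$key} || $date gt $latest{$key};
    }

    # Rebuild the tab-delimited records, sorted by key.
    return map { "$_\t$latest{$_}" } sort keys %latest;
}

# Runs only when file names are given on the command line.
if (@ARGV) {
    print "$_\n" for newest_records(<>);
}
```

Run it with the old database first and the new data second, e.g. "perl dedupe.pl old.txt new.txt > merged.txt" (hypothetical file names). Since both files go through the same hash, the order does not actually matter: the latest date for each key wins either way.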

Mike Schuh -- Seattle, Washington USA
