[ABE.pm] Suggestions on manipulating CSV files?

Tue Dec 26 18:26:32 PST 2006

* "Faber J. Fedor" <faber at linuxnj.com> [2006-12-26T20:52:45]
> I've got three CSV files, two columns each.  The first column in all
> three files is the CUSIP (read: unique ID), and the CUSIPs are not in
> order.  Some files may not contain all of the CUSIPs.
> 
> I need to do something like this:
> 
>    foreach (CUSIP in primaryfile) { 
>        getvalue(CUSIP, primaryfile);
>        getvalue(CUSIP, secondaryfile);
>        getvalue(CUSIP, tertiaryfile);
>        crunch_numbers( $from, $all, $three_files);
>        write_to_secondary_file() if($high_tide); 
>        write_to_primary_file();
>    }

The way that I'd solve this would depend on the nature of the crunching
algorithm.  I might want to have those three datasets distinct (as three
hashes, maybe), or merged into one hash with arrayrefs as values.

Pick whichever one is easier for you to maintain and more efficient.
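A minimal sketch of the merged-hash option, assuming the CPAN module Text::CSV is installed and that the files have no header row.  The sub names `read_cusip_file` and `merge_by_cusip` are hypothetical, not part of any library.  The primary file drives the key set, and a CUSIP missing from another file gets `undef` in that slot:

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module (assumed installed), not core Perl

# Read a two-column CSV file (CUSIP, value) into a hashref keyed by CUSIP.
sub read_cusip_file {
    my ($path) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    open my $fh, '<', $path or die "Can't open $path: $!";
    my %value_for;
    while (my $row = $csv->getline($fh)) {
        my ($cusip, $value) = @$row;
        $value_for{$cusip} = $value;
    }
    close $fh;
    return \%value_for;
}

# Merge the files into one hash: CUSIP => [primary, secondary, tertiary].
# Keys come from the first (primary) file; missing values are undef.
sub merge_by_cusip {
    my @files = map { read_cusip_file($_) } @_;
    my %merged;
    for my $cusip (keys %{ $files[0] }) {
        $merged{$cusip} = [ map { $_->{$cusip} } @files ];
    }
    return \%merged;
}
```

Calling `merge_by_cusip($primary, $secondary, $tertiary)` gives one hash to loop over, so each CUSIP's three values arrive together for the crunching step.  Keeping three separate hashes is just `read_cusip_file` called three times.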

> I'm thinking of reading the files line by line into hashes, doing the
> manipulations, deleting the underlying files, and recreating them.

...and of course keep in mind that, just to be paranoid, you may want to
do something like "write out to a new file, then unlink the old one, then
link the original name to the new file, then unlink the new file's
temporary name."
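A minimal sketch of that sequence using Perl's built-in `link` and `unlink`, assuming Text::CSV is installed and the filesystem supports hard links.  The sub name `replace_csv_file` and the `.new` suffix for the temporary file are hypothetical choices.  As written, it dies if `$path` does not exist yet:

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module (assumed installed)

# Rewrite $path with the rows in @$rows:
# write a new file, unlink the old one, link the original name to the
# new file, then unlink the temporary name.
sub replace_csv_file {
    my ($path, $rows) = @_;
    my $tmp = "$path.new";   # hypothetical temp-file naming scheme
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => "\n" });

    open my $out, '>', $tmp or die "Can't write $tmp: $!";
    $csv->print($out, $_) for @$rows;
    close $out or die "Can't close $tmp: $!";

    unlink $path       or die "Can't unlink $path: $!";
    link $tmp, $path   or die "Can't link $tmp to $path: $!";
    unlink $tmp        or die "Can't unlink $tmp: $!";
    return;
}
```

Perl's built-in `rename($tmp, $path)` collapses the last three steps into one call; on a POSIX system it replaces the target atomically when both names are on the same filesystem, so there is no moment where `$path` is missing.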

It sounds simple enough that it's not crazy to just write it yourself.  A
module may exist to do it, but finding it might take longer than
implementing it.

Of course, DO use a CSV processing module, not split/join. ;)
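A short illustration of why split/join falls short, assuming Text::CSV from CPAN.  The CUSIP and value below are made-up sample data: a value field that contains a quoted comma is torn apart by `split`, while the CSV parser keeps it whole and strips the quotes:

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module (assumed installed)

# Made-up row whose value field contains a quoted comma.
my $line = qq{123456789,"1,234.50"};

# Naive split: 3 fields, with the quote characters left in.
my @naive = split /,/, $line;

# Text::CSV honours the quoting: 2 fields, quotes removed.
my $csv = Text::CSV->new({ binary => 1 });
$csv->parse($line) or die "Parse failed: " . $csv->error_input;
my @fields = $csv->fields;

print scalar(@naive), " fields from split\n";
print scalar(@fields), " fields from Text::CSV: @fields\n";
```

The same applies on output: `join ',', @fields` would write `1,234.50` unquoted, while the module's `print` method quotes it.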

-- 
rjbs