[ABE.pm] Suggestions on manipulating CSV files?

Faber J. Fedor faber at linuxnj.com
Tue Dec 26 17:52:45 PST 2006


This is sort of a continuation of "Tie::Hashing to files" from back in Oct
(http://mail.pm.org/pipermail/abe-pm/2006-October/000602.html).

Not only did I find another bug in FlatFile.pm (but that's for another
post), I don't think it's suitable for what I need.

I've got three CSV files, two columns each.  The first column in all
three files is the CUSIP (read: unique ID), and the rows are not in
order.  Some files may not contain all of the CUSIPs.

I need to do something like this:

   foreach (CUSIP in primaryfile) {
       getvalue(CUSIP, primaryfile);
       getvalue(CUSIP, secondaryfile);
       getvalue(CUSIP, tertiaryfile);
       crunch_numbers($from, $all, $three_files);
       write_to_secondary_file() if ($high_tide);
       write_to_primary_file();
   }

FlatFile can do it, but not very quickly; looking up the CUSIP in each
file takes four seconds per lookup = 12 seconds per loop.  Since I have
3000 CUSIPs to process, well, you do the math (3000 x 12 seconds is
about ten hours).

I'm thinking of reading the files line by line into hashes, doing the
manipulations, deleting the underlying files and recreating them.
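
A minimal sketch of that hash-based approach, under several assumptions
that are not in the description above: each line is a plain
"CUSIP,value" pair with no quoted fields or embedded commas, a CUSIP
missing from a file counts as 0, and crunch_numbers() and the
$threshold test are hypothetical stand-ins for the real calculation and
the $high_tide condition.  The names load_csv, save_csv and
process_files are illustrative as well.

```perl
use strict;
use warnings;

# Read a two-column CSV into a hash keyed by CUSIP.
# Assumption: no quoted fields or embedded commas.
sub load_csv {
    my ($path) = @_;
    my %data;
    open my $in, '<', $path or die "Cannot read $path: $!";
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;                 # skip blank lines
        my ($cusip, $value) = split /,/, $line, 2;
        $data{$cusip} = $value;
    }
    close $in;
    return \%data;
}

# Write the hash to a temporary file, then rename it over the original,
# so the original is replaced only after the new file is complete.
sub save_csv {
    my ($path, $data) = @_;
    my $tmp = "$path.tmp";
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} "$_,$data->{$_}\n" for sort keys %$data;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
}

# Hypothetical stand-in for the real number crunching.
sub crunch_numbers {
    my ($p, $s, $t) = @_;
    return $p + $s + $t;
}

sub process_files {
    my ($primary_path, $secondary_path, $tertiary_path, $threshold) = @_;
    my $primary   = load_csv($primary_path);
    my $secondary = load_csv($secondary_path);
    my $tertiary  = load_csv($tertiary_path);

    for my $cusip (keys %$primary) {
        # Assumption: a CUSIP missing from a file is treated as 0.
        my $s = exists $secondary->{$cusip} ? $secondary->{$cusip} : 0;
        my $t = exists $tertiary->{$cusip}  ? $tertiary->{$cusip}  : 0;
        my $result = crunch_numbers($primary->{$cusip}, $s, $t);

        # Stand-in for the $high_tide condition.
        $secondary->{$cusip} = $result if $result > $threshold;
        $primary->{$cusip}   = $result;
    }

    save_csv($primary_path,   $primary);
    save_csv($secondary_path, $secondary);
}
```

Each file is read once and written at most once, so the per-CUSIP work
is just in-memory hash lookups.  For CSV data with quoted fields, a
parser such as Text::CSV from CPAN would be a safer choice than split.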

Seems like there should be a better way.  Any ideas?


-- 
 
Regards,
 
Faber Fedor
President
Linux New Jersey, Inc.
908-320-0357
800-706-0701

http://www.linuxnj.com




