LPM: Sorting Big Files

Rich Bowen rbowen at rcbowen.com
Wed Dec 15 09:30:01 CST 1999


Janine Ladick wrote:
> 
> Hello, List!
> 
> Does anyone have a smooth way of sorting large text files?  (By
> "large" I mean "in excess of 1GB" and by "text files" I mean "flat
> ASCII database files.")

Step 1: Get 2GB of RAM
Step 2: Read the file into an array and call sort() on it.
Step 3: Go read a good book

OK, all kidding aside ...

Uri Guttman and Larry Rosler presented a paper on sorting at the Perl
Conference. There's nothing in it about huge text files, but they might
have some insight into this.

I know that there are sorting algorithms that don't require you to have
the whole data set in memory at once. I suspect that they are slower
than (insert favorite slow thing), but sorting a gig of data is going to
be slow no matter what.
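The classic out-of-core approach is an external merge sort: sort
chunks that do fit in memory, spill each sorted chunk to a temp file,
then merge the chunks line by line. Here's a rough sketch in Perl --
the chunk size and temp file naming are made up, and you'd want to
tune the chunk size to the machine's actual RAM:

```perl
#!/usr/bin/perl -w
# External merge sort sketch: sort in-memory chunks, spill each to a
# temp file, then do a k-way merge of the sorted chunk files.
use strict;

sub external_sort {
    my ($in, $out, $chunk_lines) = @_;
    $chunk_lines ||= 100_000;    # lines per in-memory chunk (tunable)

    # Pass 1: read a chunk, sort it in memory, write it to a temp file.
    open my $fh, '<', $in or die "open $in: $!";
    my @tmp;
    while (1) {
        my @lines;
        while (@lines < $chunk_lines and defined(my $line = <$fh>)) {
            push @lines, $line;
        }
        last unless @lines;
        my $tmp = "$out.chunk." . scalar(@tmp);
        open my $t, '>', $tmp or die "open $tmp: $!";
        print $t sort @lines;    # default string sort, same as below
        close $t;
        push @tmp, $tmp;
    }
    close $fh;

    # Pass 2: k-way merge -- repeatedly emit the smallest head line
    # among the open chunk files until every chunk is exhausted.
    my (@fhs, @heads);
    for my $tmp (@tmp) {
        open my $t, '<', $tmp or die "open $tmp: $!";
        push @fhs, $t;
        push @heads, scalar <$t>;
    }
    open my $o, '>', $out or die "open $out: $!";
    while (1) {
        my $min;
        for my $i (0 .. $#heads) {
            next unless defined $heads[$i];
            $min = $i if !defined $min or $heads[$i] lt $heads[$min];
        }
        last unless defined $min;
        print $o $heads[$min];
        $heads[$min] = scalar readline($fhs[$min]);
    }
    close $o;
    close $_ for @fhs;
    unlink @tmp;
}
```

Each line of the input gets read twice and written twice, so the whole
thing is two sequential passes over the disk -- slow, but it never
holds more than one chunk in memory.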

I think that my first step would be to run through the data file,
insert each entry into a real database, and then select it back out of
the database in sorted order. My suspicion is that that would be the
most efficient approach, even if you don't keep the database around for
storing the data afterward. A million records is not very many to a
real database, but it's a boatload when you're just trying to deal with
text files.
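All that said, if the records really are plain lines of ASCII, the
stock Unix sort(1) already does an external merge sort for you,
spilling runs to temp files when the data won't fit in memory. The file
names below are made up; -T just needs to point at a filesystem with
room for roughly a second copy of the data:

```shell
# sort(1) handles files bigger than RAM by itself; -T picks where its
# temp files go, -o writes the result (it's safe to reuse the input
# name with -o, unlike shell redirection).
sort -T /var/tmp -o bigfile.sorted bigfile.txt

# Sorting a colon-delimited file numerically on its second field
# (illustrative -- adjust -t and -k to the actual record layout):
sort -t: -k2,2n -T /var/tmp -o bigfile.sorted bigfile.txt
```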

Rich
-- 
http://www.ApacheUnleashed.com/
Lexington Perl Mongers - http://lexington.pm.org/
PGP Key - http://www.rcbowen.com/pgp.txt