LPM: RE: Sorting Big Files

Rietz, Ken ken.rietz at asbury.edu
Wed Dec 15 11:55:15 CST 1999


> Hello, List!
> 
> Does anyone have a smooth way of sorting large text files?  (By 
> "large" I mean "in excess of 1GB" and by "text files" I mean "flat 
> ASCII database files.")
> 
> Here's the task:  I have a text database of a million or so records.  
> Each record is about 1300 bytes long.  I need to change record 
> format from fixed-field to delimited, sort the records in country
> name order, then output in 10,000 record pieces to sequentially numbered 
> text files.  I already have routines to change format and output the 
> file in pieces; it's the sorting part that has me wondering.
> 
> Any suggestions are much appreciated.

Here's my suggestion. It's similar to many others, but it runs in time
linear in the number of records, since only the country names (never
the records themselves) get sorted:

1. Read each record, and append it (or some key to the record) to a
file whose name is the name of the country. (Create the file if it
doesn't exist already and that sort of jazz.)
2. Sort the filenames.
3. Print out your records from the filename-ordered files.
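The three steps above can be sketched in Python roughly like this. It is a minimal sketch under stated assumptions, not the poster's code: it assumes one record per line, and the key function country_of (with its tab-delimited layout) is a hypothetical stand-in for whatever pulls the country out of a real record. It also sorts the country names directly and numbers the bucket files, rather than naming files after countries and sorting the filenames, so a country name containing '/' can't cause trouble.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator


def bucket_sort_by_key(
    records: Iterable[str],
    key: Callable[[str], str],
    workdir: str,
) -> Iterator[str]:
    """Yield records grouped by key, with the groups in sorted key order.

    Pass 1 appends each record to one bucket file per distinct key.
    Pass 2 reads the buckets back in sorted key order. Within a bucket,
    records keep their original input order.
    """
    paths = {}    # key -> path of that key's bucket file
    handles = {}  # key -> file handle, opened the first time the key appears
    try:
        for rec in records:
            k = key(rec)
            if k not in handles:
                # Files are numbered rather than named after the key, so keys
                # containing '/' or other awkward characters are harmless.
                paths[k] = os.path.join(workdir, f"bucket{len(paths)}.txt")
                handles[k] = open(paths[k], "w", encoding="utf-8")
            # Ensure each record ends in a newline so it reads back as one line.
            handles[k].write(rec if rec.endswith("\n") else rec + "\n")
    finally:
        for fh in handles.values():
            fh.close()

    # Only the distinct keys (e.g. country names) are sorted, never the records.
    for k in sorted(paths):
        with open(paths[k], encoding="utf-8") as fh:
            yield from fh


def country_of(rec: str) -> str:
    # Hypothetical layout: tab-delimited, country in the second field.
    return rec.split("\t")[1].rstrip("\n")


if __name__ == "__main__":
    sample = ["1\tPeru\n", "2\tChad\n", "3\tPeru\n", "4\tAngola\n"]
    with tempfile.TemporaryDirectory() as d:
        for line in bucket_sort_by_key(sample, country_of, d):
            print(line, end="")
```

Each record is written once and read back once, and only the distinct keys go through sorted(), which is where the linear-time behavior comes from. With a couple hundred countries, one open handle per country should stay under typical per-process open-file limits; a key with thousands of distinct values might not.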

As long as you don't require any order within countries, this is about
as easy as anything I could come up with. Even if you do, you can order
the records within a country by applying the same operation to each
country's file, bucketing on the next key.
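Applying the same operation inside each country's file amounts to bucketing recursively on a list of keys. Here is a hedged Python sketch of that reading; the field helper and the three-field tab-delimited layout are hypothetical. It suits a secondary key with few distinct values, such as a state or city field. For a key with nearly one value per record, such as a customer name, it would create one file per record, and sorting each country's file in memory (each is only a fraction of the original) would be the more sensible substitute.

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator, Sequence


def bucket_by_keys(
    records: Iterable[str],
    keys: Sequence[Callable[[str], str]],
    workdir: str,
) -> Iterator[str]:
    """Bucket on keys[0]; inside each bucket, bucket again on keys[1]; etc.

    keys must be non-empty. Records that tie on every key keep input order.
    """
    key, rest = keys[0], keys[1:]
    paths = {}    # key value -> bucket file path
    handles = {}  # key value -> open handle for that bucket
    try:
        for rec in records:
            k = key(rec)
            if k not in handles:
                paths[k] = os.path.join(workdir, f"bucket{len(paths)}.txt")
                handles[k] = open(paths[k], "w", encoding="utf-8")
            handles[k].write(rec if rec.endswith("\n") else rec + "\n")
    finally:
        for fh in handles.values():
            fh.close()

    for k in sorted(paths):
        with open(paths[k], encoding="utf-8") as fh:
            if rest:
                # The same operation again, on this bucket only, in its own
                # subdirectory so its bucket files can't clash with ours.
                yield from bucket_by_keys(fh, rest, tempfile.mkdtemp(dir=workdir))
            else:
                yield from fh


def field(n: int) -> Callable[[str], str]:
    # Hypothetical helper: key on the n-th tab-delimited field (0-based).
    return lambda rec: rec.rstrip("\n").split("\t")[n]


if __name__ == "__main__":
    sample = ["a\tPeru\tLima\n", "b\tChad\tAbeche\n", "c\tPeru\tCusco\n"]
    with tempfile.TemporaryDirectory() as d:
        for line in bucket_by_keys(sample, [field(1), field(2)], d):
            print(line, end="")
```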



More information about the Lexington-pm mailing list