LPM: Sorting Big Files

John Soward soward at uky.edu
Wed Dec 15 10:14:18 CST 1999


Janine Ladick wrote:
> 
> Hello, List!
> 
> Does anyone have a smooth way of sorting large text files?  (By
> "large" I mean "in excess of 1GB" and by "text files" I mean "flat
> ASCII database files.")
> 
> Here's the task:  I have a text database of a million or so records.
> Each record is about 1300 bytes long.  I need to change record
> format from fixed-field to delimited, sort the records in country name
> order, then output in 10,000 record pieces to sequentially numbered
> text files.  I already have routines to change format and output the
> file in pieces; it's the sorting part that has me wondering.
> 
	There are lots of ways, several of which have already been addressed.
If this is a UNIX or other system where you have access to GNU sort from
the command line, then once you've made the jump from fixed-field to
delimited records you can use GNU sort. It _should_ handle a file this
size by falling back to a file-based (external) sort using temporary files.
Alternatively, you could try one of the 'bucket'-type approaches mentioned,
breaking the task down by the 'country' field. You could build an array
of the line numbers matching each country and sort from there, etc...
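	The bucket approach might look roughly like the Python sketch below. It is a minimal illustration, not code from the thread. It assumes the records have already been converted to '|'-delimited lines with the country name in a known field. It also assumes there are few enough distinct countries (a couple of hundred) to keep one bucket file open per country. The function name bucket_sort_by_country and its parameters are hypothetical. The first pass appends each line to its country's bucket file. The second pass copies the buckets out in sorted country order, so only the list of countries is ever held in memory.

```python
import os
import shutil
import tempfile


def bucket_sort_by_country(in_path, out_path, country_field, sep=b"|"):
    """Sort a delimited file on one field without loading it into memory.

    Hypothetical helper. Assumes every record is one line, fields are
    separated by `sep`, and `country_field` (0-based) exists on every line.
    Pass 1 appends each record to a bucket file for its country; pass 2
    copies the buckets to the output in sorted country order. Records within
    one country keep their original relative order.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        buckets = {}  # country (bytes) -> (bucket file path, open handle)
        try:
            with open(in_path, "rb") as src:
                for line in src:
                    if not line.endswith(b"\n"):
                        line += b"\n"  # keep every bucket line-terminated
                    country = line.rstrip(b"\r\n").split(sep)[country_field]
                    if country not in buckets:
                        path = os.path.join(tmpdir, "bucket%d" % len(buckets))
                        buckets[country] = (path, open(path, "wb"))
                    buckets[country][1].write(line)
        finally:
            for _, fh in buckets.values():
                fh.close()

        # Pass 2: concatenate the buckets in byte-wise country order.
        with open(out_path, "wb") as dst:
            for country in sorted(buckets):
                with open(buckets[country][0], "rb") as bucket:
                    shutil.copyfileobj(bucket, dst)
```

	Records inside one country stay in input order. If they also need sorting, each bucket should be far smaller than the whole file and can then be sorted on its own. With GNU sort, the rough equivalent would be `LC_ALL=C sort -s -t '|' -k 2,2 in.txt > out.txt`. Here -k 2,2 limits the key to the second field, and -s keeps records with equal keys in input order.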
	Rich's suggestion of inserting the records into a database might be the
most efficient overall, though. A misstep in coding one of the other
approaches could force you to make several run-throughs and check the
results for accuracy by hand. MySQL should be able to handle a single table
of this size and return the results ordered by country easily.
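	For comparison, the database route might look like the sketch below. Python's built-in sqlite3 module stands in for MySQL here, a swap made only so the example runs without a database server. The function name, the table layout and the '|' delimiter are assumptions for illustration. Each line is inserted along with its country field, and the ordered SELECT does the sorting. For a 1GB+ file, db_path should point at a file on disk rather than the default in-memory database.

```python
import sqlite3


def sort_via_sqlite(in_path, out_path, country_field, sep="|", db_path=":memory:"):
    """Sort a delimited file by one field by loading it into SQLite.

    Hypothetical helper: SQLite is a stand-in for the MySQL approach.
    Assumes one record per line and that `country_field` (0-based) exists.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DROP TABLE IF EXISTS records")
        conn.execute("CREATE TABLE records (country TEXT, line TEXT)")

        def rows(src):
            # Yield (country, whole line) pairs without reading the file at once.
            for line in src:
                line = line.rstrip("\n")
                yield line.split(sep)[country_field], line

        # latin-1 maps every byte to a character, so no input byte is rejected.
        with open(in_path, encoding="latin-1") as src:
            conn.executemany("INSERT INTO records VALUES (?, ?)", rows(src))
        conn.commit()

        # An index on country can let SQLite return rows in country order
        # without a separate sort step at query time.
        conn.execute("CREATE INDEX idx_country ON records (country)")

        with open(out_path, "w", encoding="latin-1") as dst:
            # rowid breaks ties, so records within a country keep input order.
            query = "SELECT line FROM records ORDER BY country, rowid"
            for (line,) in conn.execute(query):
                dst.write(line + "\n")
    finally:
        conn.close()
```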
	Failing all that, I do have some machines here with several gigabytes
of RAM...

-- 
      John Soward        University of Kentucky Technical Services
e:soward at uky.edu p:(606)257-2900 f:(606)323-1978
w:http://neworder.cc.uky.edu/
