SPUG: Sorting a big file

Dan Ebert mathin at mathin.com
Fri Aug 27 16:00:18 CDT 2004


I had thought of splitting the file, but that alone wouldn't work: a later
section can contain lines that really belong at the top of the sorted output.

SPLIT DATA:
12
14
16
13

34
54
21
10

sorting each section and concatenating the results would create a file:

12
13
14
16
10
21
34
54

which really isn't sorted.
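The missing piece is a merge step: once each section is sorted, sort -m
can interleave them into a single sorted stream, which is what Brian's
script below does.  A quick sketch with the two sections above (the
part1/part2 file names are just made up for the demo):

  printf '12\n14\n16\n13\n' > part1
  printf '34\n54\n21\n10\n' > part2
  sort part1 > part1.sorted             # 12 13 14 16
  sort part2 > part2.sorted             # 10 21 34 54
  cat part1.sorted part2.sorted         # concatenated: not sorted
  sort -m part1.sorted part2.sorted     # merged: 10 12 13 14 16 21 34 54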

It looks like the UNIX sort command works on the whole file, though.
I didn't know about that command.  Thanks to everyone who pointed it out to me.
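For what it's worth, GNU sort does an external merge sort on its own: it
sorts manageable chunks in memory, spills them to temporary files, and
merges those, so it can handle files much bigger than RAM.  Assuming GNU
sort, something like this should work (-T picks a temp directory with room
for the spill files, -S caps the in-memory buffer):

  sort -T /var/tmp -S 100M -o sorted-version whatever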

Dan.
----------------------------------------------------------
Immigration is the sincerest form of flattery.
	- Unknown
----------------------------------------------------------


On Fri, 27 Aug 2004, Brian Hatch wrote:

>
>
> >
> > I have a large file (~1 million lines, ~142MB) which I need to sort (any
> > order is fine, just so the lines are in a repeatable order).
> >
> > Just using perl's 'sort' on the file read into an array eats up all the
> > RAM and swap on my box and crashes.  I also tried tying the file to an
> > array, but it looks like that will use up all the memory too.  Does
> > anyone know some other methods I could try?
>
> #!/bin/sh
> # Split the input into chunks, sort each chunk on its own, then
> # merge the already-sorted chunks with sort -m.
> FILE=whatever
>
> # -l 100000 keeps the chunk count low enough that split's default
> # two-letter suffixes (676 names) don't run out on ~1 million lines
> split -l 100000 "$FILE" section.
> for file in section.*
> do
> 	sort "$file" > "$file.sorted"
> done
> # -m merges inputs that are already sorted instead of re-sorting
> sort -m section.*.sorted > sorted-version
>
> rm section.*
>
>
> --
> Brian Hatch                  "Londo, do you know where you are?"
>    Systems and               "Either in Medlab, or in Hell.
>    Security Engineer          Either way, the decor needs work."
> http://www.ifokr.org/bri/
>
> Every message PGP signed
>


