SPUG: Sorting a big file

Jay Scherrer jay at scherrer.com
Fri Aug 27 23:37:33 CDT 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

What would that look like if you threaded three files?

jay
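
For what it's worth, the three-file case described in the quoted message
below might look something like this. It is only a sketch: the file names
a, b, c and their contents are made up, and it leans on `sort -m` to do
the "compare the next line of each file" step rather than spelling that
loop out by hand. It also assumes `mktemp -d` is available, which it is on
most Linux and BSD systems.

```shell
#!/bin/sh
# Sketch only: a, b, c and their contents are hypothetical examples.
export LC_ALL=C              # byte-wise ordering, so the result is repeatable
workdir=$(mktemp -d)         # scratch directory (assumes mktemp -d exists)
cd "$workdir" || exit 1

# Three pieces, each already sorted on its own.
printf '%s\n' apple kiwi pear   > a
printf '%s\n' banana lemon plum > b
printf '%s\n' cherry fig mango  > c

# -m merges without re-sorting: sort looks at the current line of each
# input, writes whichever sorts first, and advances in that file only.
sort -m a b c > merged

cat merged
```

Because each input is already in order, the merge only ever has to look at
one line per file at a time, so memory use stays small no matter how big
the pieces are.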
 
On Friday 27 August 2004 02:33 pm, Kurt Buff wrote:
> When you do the merge, you compare across the various files.
>
> If, for instance, you break it into 3 files a, b, c, you'll sort each file,
> then take the next unread line from each of a, b, and c, compare them, and
> write the one that sorts first to the result file.
>
> lather, rinse, repeat.
>
> -----Original Message-----
> From: spug-list-bounces at mail.pm.org
> [mailto:spug-list-bounces at mail.pm.org]On Behalf Of Dan Ebert
> Sent: Friday, August 27, 2004 14:00
> Cc: spug-list at mail.pm.org
> Subject: Re: SPUG: Sorting a big file
>
>
>
> I had thought of splitting the file, but I don't think this would work
> if a later section had lines that really should be at the top of the sort.
>
> SPLIT DATA:
> 12
> 14
> 16
> 13
>
> 34
> 54
> 21
> 10
>
> would create a file:
>
> 12
> 13
> 14
> 16
> 10
> 21
> 34
> 54
>
> which really isn't sorted.
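
The problem shown above comes from concatenating the sorted pieces; merging
them instead avoids it. A small sketch with the same eight numbers, assuming
`sort -n` for numeric order and `mktemp -d` for a scratch directory (the
names part1 and part2 are made up for the example):

```shell
#!/bin/sh
# Sketch only: part1/part2 are hypothetical names for the two split pieces.
export LC_ALL=C
workdir=$(mktemp -d)                 # assumes mktemp -d is available
cd "$workdir" || exit 1

printf '%s\n' 12 14 16 13 > part1
printf '%s\n' 34 54 21 10 > part2

sort -n part1 > part1.sorted         # 12 13 14 16
sort -n part2 > part2.sorted         # 10 21 34 54

# Concatenating the sorted pieces reproduces the problem: 10 lands fifth.
cat part1.sorted part2.sorted > concatenated

# Merging interleaves the pieces, so 10 moves to the top.
sort -n -m part1.sorted part2.sorted > merged

cat merged
```

Running `sort -n -c` on each result is a quick way to check: it exits
non-zero for `concatenated` and zero for `merged`.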
>
> It looks like the UNIX sort command is working on the whole file though.
> I didn't know that command.  Thanks to everyone who pointed it out to me.
>
> Dan.
> ----------------------------------------------------------
> Immigration is the sincerest form of flattery.
> 	- Unknown
> ----------------------------------------------------------
>
> On Fri, 27 Aug 2004, Brian Hatch wrote:
> > > I have a large file (~1 million lines, ~142MB) which I need to sort
> > > (any order is fine, just so the lines are in a repeatable order).
> > >
> > > Just using perl's 'sort' on the file read into an array eats up all the
> > > RAM and swap on my box and crashes.  I'm also trying to tie the file to
> > > an array, but it looks like that is also going to use up all the memory.
> > > Does anyone know some other methods I could try?
> >
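> > #!/bin/sh
> > FILE=whatever
> >
> > # Split into 100000-line pieces; the default of 1000 lines would need
> > # about 1000 pieces here, more than a two-letter suffix allows.
> > split -l 100000 "$FILE" section.
> > for file in section.*
> > do
> > 	sort "$file" > "$file.sorted"
> > done
> > # Merge the already-sorted pieces into a single sorted file.
> > sort -m section.*.sorted > sorted-version
> >
> > rm section.*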
> >
> >
> > --
> > Brian Hatch                  "Londo, do you know where you are?"
> >    Systems and               "Either in Medlab, or in Hell.
> >    Security Engineer          Either way, the decor needs work."
> > http://www.ifokr.org/bri/
> >
> > Every message PGP signed
>
> _____________________________________________________________
> Seattle Perl Users Group Mailing List
> POST TO: spug-list at mail.pm.org  http://spugwiki.perlocity.org
> ACCOUNT CONFIG: http://mail.pm.org/mailman/listinfo/spug-list
> MEETINGS: 3rd Tuesdays, Location Unknown
> WEB PAGE: http://www.seattleperl.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)

iD8DBQFBMAwN7+UFWg+1k3YRAiJvAJ9k14AHrSATl2cTKsMpnqMe653lKgCfbjUD
QCWoWUYO+aNLcPDbdnZzHSg=
=T7mN
-----END PGP SIGNATURE-----



