SPUG: Fw: greetings

Tim Maher/CONSULTIX tim at consultix-inc.com
Wed Oct 17 21:07:30 CDT 2001


On Wed, Oct 17, 2001 at 08:14:40PM -0500, Russell Miller wrote:
> First thing and I screw up.  The message is below.  Thanks for your help.

Unless you're going to sort the lines of the file, there's probably no
good reason to load the whole thing into memory!

See further comments below ...

> From: "Russell Miller" <duskglow2000 at yahoo.com>
> To: <spug-list at perl.org>
> Sent: Wednesday, October 17, 2001 8:13 PM
> Subject: greetings
> 
> 
> > Greetings all...  just joined the list, because I'm a professional perl
> > programmer, and I have a problem I just can't seem to find the answer to,
> > and I hope that you can help me out.  I'll try to help you guys (gals?)
> out
> > where I can, too.
> >
> > ok, we've got a six million line file to read.  I had written a program
> that
> > scaled just fine for smaller files, but it choked on this file, and took
> an
> > extreme amount of time.
> >
> > the original code goes like this:
> >
> > open (FILE, "<$INPUT");
> > @array = <FILE>;

That's an EXPENSIVE operation for a big file; Perl's got to
allocate a scalar variable for each line and stuff it into memory.
Not to mention that Perl has to keep reallocating more memory
for the array as the lines keep coming in.
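For comparison, here's a minimal line-at-a-time sketch (not your original program -- the filename and the 100-line stand-in input are placeholders for your six-million-line file):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in input file (placeholder for the real six-million-line file).
my $input = '/tmp/demo_input.txt';
open(my $out, '>', $input) or die "Can't write $input: $!";
print $out "record $_\n" for 1 .. 100;
close $out;

# Line-at-a-time read: while() gives <$fh> a scalar context,
# so only ONE line is allocated per iteration -- no giant array.
my $lines = 0;
open(my $fh, '<', $input) or die "Can't open $input: $!";
while (my $line = <$fh>) {
    chomp $line;
    $lines++;    # ... real per-line processing would go here ...
}
close $fh;
print "$lines lines processed\n";
unlink $input;
```

(The three-argument open with a lexical filehandle is just the more modern idiom; the two-argument form you used works the same way here.)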

> > close FILE;
> >
> > foreach $k (@array) {
> >     ...
> > }
> >
> > it worked just fine, but very slow and memory intensive for a six million
> > line file.  So, we changed it to read sequentially:
> >
> > open (FILE, "<$input");
> > foreach $k (<FILE>) {

That still causes Perl to read the entire file into memory --
the <FILE> in list context builds a list of all the lines before
the loop starts -- but at least you're not storing the whole file
in a named array before looking at each line individually.

> >     ...
> > }
> >
> > which got rid of the memory problem but the speed issue was still there.
> So
> > I changed the foreach line to:
> >
> > while ($k = <FILE>) {
> >
> > and the increase of speed had to be a thousand fold.
> >
> > What causes that speed impact?
> > Thanks.
> >
> > --Russell

while() provides a SCALAR context to the input operator ( <> ),
so only one line is read and stored in memory at a time.
foreach() provides a LIST context, so the entire file is first
read into memory, and then doled out to you one line at a time
through the loop variable (doh!)
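You can see both contexts side by side in a small runnable sketch (the /tmp demo file is hypothetical, just a stand-in for the real input):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a small demo file (stand-in for the real input).
my $demo = '/tmp/context_demo.txt';
open(my $out, '>', $demo) or die "Can't write $demo: $!";
print $out "line $_\n" for 1 .. 5;
close $out;

# LIST context: <$fh> returns EVERY remaining line at once,
# so all 5 lines are allocated before anything else happens.
open(my $fh, '<', $demo) or die "Can't open $demo: $!";
my @all = <$fh>;
close $fh;
print scalar(@all), " lines slurped in list context\n";

# SCALAR context: each <$fh> returns exactly one line, so
# while() holds only one line in memory per iteration.
open($fh, '<', $demo) or die "Can't open $demo: $!";
my $count = 0;
while (my $line = <$fh>) {
    $count++;
}
close $fh;
print "$count lines read one at a time in scalar context\n";
unlink $demo;
```

Same five lines either way; the difference is that the list-context version has all of them in memory at once, which is exactly what hurt on the six-million-line file.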

Incidentally, I'm teaching a basic Perl programming class the
week of 12/3 in Kirkland, if you want to learn more! 8-}

 
*=========================================================================*
| Dr. Tim Maher, CEO, Consultix        (206) 781-UNIX/8649;  ask for FAX# |
| EMAIL: tim at consultix-inc.com         WEB: http://www.consultix-inc.com  |
| TIM MAHER: UNIX/Perl  DAMIAN CONWAY: OO Perl  COLIN MEYER: Perl CGI/DBI |
|CLASSES:Int Perl 10/22; UNIX 11/26; Minimal Perl 11/30; Perl+Modules 12/3|
| /etc/cotd:  find /earth -follow -name bin-laden -print | xargs rm -rf   |
*=========================================================================*

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
     Seattle Perl Users Group (SPUG) Home Page: http://zipcon.net/spug/
