[From nobody Mon Aug  2 21:31:10 2004
Date: Thu, 23 Oct 2003 14:55:59 +0200
From: Abigail <abigail@abigail.nl>
To: Uri Guttman <uri@stemsystems.com>
Cc: perl5-porters@perl.org
Subject: Re: new slurp module

On Thu, Oct 23, 2003 at 05:45:05AM -0400, Uri Guttman wrote:
> >>>>> "A" == Abigail  <abigail@abigail.nl> writes:
>
>   A> What's ugly or not is very subjective. The speed argument doesn't
>   A> quite convince me. How often do programs slurp in lots of files?
>
> template systems, config files, language source, etc. many types of
> files are slurped and sometimes lots of them. as i write in the article,
> slurping and then munging/parsing/whatever on the whole file can be much
> faster than classic line by line. so the speed does matter and why not
> make it as fast as possible since it is a module that could be called
> often.

I assume you were comparing File::Slurp's methods with other methods
that slurp in the whole file. As the following benchmark suggests, the
other idioms for slurping in an entire file are faster (with the
exception of using `cat`):

Running test with 1 bytes
           Rate     cat   slurp      do    open sysread
cat       745/s      --    -97%    -98%    -98%    -99%
slurp   24139/s   3138%      --    -32%    -32%    -62%
do      35275/s   4632%     46%      --     -1%    -45%
open    35759/s   4697%     48%      1%      --    -44%
sysread 64056/s   8493%    165%     82%     79%      --

Running test with 10 bytes
           Rate     cat   slurp      do    open sysread
cat       623/s      --    -97%    -98%    -98%    -99%
slurp   23420/s   3657%      --    -33%    -35%    -62%
do      34825/s   5486%     49%      --     -3%    -44%
open    35896/s   5658%     53%      3%      --    -42%
sysread 62029/s   9850%    165%     78%     73%      --

Running test with 100 bytes
           Rate     cat   slurp      do    open sysread
cat      1115/s      --    -95%    -97%    -97%    -98%
slurp   23332/s   1992%      --    -33%    -34%    -62%
do      34966/s   3035%     50%      --     -1%    -42%
open    35360/s   3070%     52%      1%      --    -42%
sysread 60755/s   5347%    160%     74%     72%      --

Running test with 1000 bytes
           Rate     cat   slurp      do    open sysread
cat      1627/s      --    -93%    -95%    -95%    -97%
slurp   23354/s   1336%      --    -30%    -32%    -61%
do      33334/s   1949%     43%      --     -3%    -45%
open    34494/s   2021%     48%      3%      --    -43%
sysread 60625/s   3627%    160%     82%     76%      --

Running test with 10000 bytes
           Rate     cat   slurp      do    open sysread
cat      1013/s      --    -95%    -96%    -96%    -98%
slurp   20773/s   1950%      --    -22%    -28%    -62%
do      26700/s   2535%     29%      --     -7%    -51%
open    28789/s   2741%     39%      8%      --    -48%
sysread 54929/s   5320%    164%    106%     91%      --

Running test with 1000000 bytes
         Rate     cat   slurp      do    open sysread
cat     175/s      --    -10%    -19%    -59%    -59%
slurp   195/s     12%      --    -10%    -55%    -55%
do      217/s     24%     11%      --    -49%    -50%
open    430/s    146%    120%     98%      --     -0%
sysread 430/s    146%    121%     98%      0%      --

Running test with 10000000 bytes
          Rate     cat   slurp      do sysread    open
cat     20.2/s      --     -3%    -11%    -55%    -56%
slurp   20.7/s      3%      --     -9%    -54%    -54%
do      22.8/s     13%     10%      --    -50%    -50%
sysread 45.2/s    124%    118%     99%      --     -0%
open    45.5/s    125%    119%    100%      0%      --
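
To read the tables: each cell says how much faster (positive) or slower
(negative) the row's method is than the column's. A minimal sketch of that
arithmetic follows, using two rates copied from the 1-byte run. The formula
is an assumption about how cmpthese derives its percentages, and the helper
name relative_speed is hypothetical.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the 1-byte run above.
my %rate = (slurp => 24139, sysread => 64056);

# Hypothetical helper: percentage by which $row is faster (positive)
# or slower (negative) than $col. Assumed formula, not taken from
# the Benchmark source.
sub relative_speed {
    my ($row, $col) = @_;
    return sprintf "%.0f%%", 100 * $rate{$row} / $rate{$col} - 100;
}

print "sysread vs slurp: ", relative_speed("sysread", "slurp"), "\n";
print "slurp vs sysread: ", relative_speed("slurp", "sysread"), "\n";
```

With these inputs the two results agree with the 165% and -62% cells of
the 1-byte table.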


This is the program that I used to create the figures above:

#!/usr/bin/perl

use strict;
use warnings;
use Benchmark qw /cmpthese/;
use File::Slurp;

# Prepare some files.
my  @sizes = (1, 10, 100, 1_000, 10_000, 1_000_000, 10_000_000);
my  $base  = "/tmp/data";
foreach my $size (@sizes) {
    my $file = "$base.$size";
    open my $fh => "> $file" or die;
    print $fh " " x $size;
    close $fh or die;
}

foreach my $size (@sizes) {
    # Package globals: the string snippets below are compiled by Benchmark,
    # outside this lexical scope, so they refer to these as $::name.
    our ($r1, $r2, $r3, $r4, $r5);
    our $file = "$base.$size";
    print "Running test with $size bytes\n";

    # A negative count runs each snippet for at least 10 CPU seconds.
    cmpthese -10 => {
        slurp   =>  '$::r1 = read_file $::file;',
        do      =>  '$::r2 = do {local (@ARGV, $/) = $::file; <>};',
        open    =>  'open my $fh => $::file or die; undef $/; $::r3 = <$fh>;',
        sysread =>  'open my $fh => $::file or die;
                     sysread $fh => $::r4, -s $::file;',
        cat     =>  '$::r5 = `cat $::file`;',
    };

    # Sanity check: every method must have returned the same data.
    die '$r1 ne $r2' if $r1 ne $r2;
    die '$r2 ne $r3' if $r2 ne $r3;
    die '$r3 ne $r4' if $r3 ne $r4;
    die '$r4 ne $r5' if $r4 ne $r5;
    die 'Wrong size' if length ($r1) != $size;

    print "\n";
}

END {unlink map {"$base.$_"} @sizes}

__END__


Frankly, I don't see much reason for a File::Slurp addition to the
core. The current idioms to slurp in a whole file are short (in
characters) and faster than File::Slurp. Granted, File::Slurp has a
write_file function, but I think reading entire files at once is much
more common than writing them.
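
For reference, the three hand-written idioms from the benchmark can be
wrapped as small subroutines. This is only a sketch, not part of the
benchmark program: the names slurp_do, slurp_open and slurp_sysread are
made up for illustration, and the three-argument open and `local $/` are
substitutions for the benchmark's two-argument open and global `undef $/`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Idiom 1: localize @ARGV and $/ so that <> reads the whole file at once.
sub slurp_do {
    my ($file) = @_;
    my $text = do { local (@ARGV, $/) = $file; <> };
    return $text;
}

# Idiom 2: open the file and read it with $/ undefined (slurp mode).
sub slurp_open {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;
    my $text = <$fh>;
    return $text;
}

# Idiom 3: one sysread of as many bytes as the file holds.
# Assumes a regular file for which a single sysread returns everything.
sub slurp_sysread {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = '';
    defined sysread($fh, $text, -s $fh) or die "Cannot read $file: $!";
    return $text;
}

# Small demonstration on a temporary file.
my ($out, $name) = tempfile(UNLINK => 1);
print {$out} "line one\nline two\n";
close $out or die "Cannot close $name: $!";

my %got = (
    do      => slurp_do($name),
    open    => slurp_open($name),
    sysread => slurp_sysread($name),
);
print "$_: ", length($got{$_}), " bytes\n" for sort keys %got;
```

Each body is a line or two of real work, which is the point being made
about the size of the existing idioms.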

> the reason to put it in core is to have a standard (not just cpan)
> module to do slurping. it simplifies the operation, makes it more
> maintainable (no multiple idioms to remember) and is faster. many other
> modules are in core with less than that.

You don't have to remember multiple idioms; one is enough. As you can
see, the current idioms are short enough not to be significantly harder
to remember than 'use File::Slurp; $text = read_file $file'. Unless you
mean that you have to remember multiple idioms because you are a
maintainer who has to maintain code written by different people using
different idioms. In that case, the last thing you want is to have to
remember yet another idiom (the current idioms don't go away).


Abigail

]