[Date: Thu, 23 Oct 2003 14:55:59 +0200
From: Abigail <abigail@abigail.nl>
To: Uri Guttman <uri@stemsystems.com>
Cc: perl5-porters@perl.org
Subject: Re: new slurp module
X-List-Archive: <http://nntp.perl.org/group/perl.perl5.porters/84002>

On Thu, Oct 23, 2003 at 05:45:05AM -0400, Uri Guttman wrote:
> >>>>> "A" == Abigail <abigail@abigail.nl> writes:
>
>   A> What's ugly or not is very subjective. The speed argument doesn't
>   A> quite convince me. How often do programs slurp in lots of files?
>
> template systems, config files, language source, etc. many types of
> files are slurped and sometimes lots of them. as i write in the article,
> slurping and then munging/parsing/whatever on the whole file can be much
> faster than classic line by line. so the speed does matter and why not
> make it as fast as possible since it is a module that could be called
> often.

I assume you were comparing File::Slurp's methods with other methods
that slurp in the whole file. As the following benchmark suggests, the
other idioms for slurping in an entire file are faster (with the
exception of using `cat`):

Running test with 1 bytes
           Rate     cat   slurp      do    open sysread
cat       745/s      --    -97%    -98%    -98%    -99%
slurp   24139/s   3138%      --    -32%    -32%    -62%
do      35275/s   4632%     46%      --     -1%    -45%
open    35759/s   4697%     48%      1%      --    -44%
sysread 64056/s   8493%    165%     82%     79%      --

Running test with 10 bytes
           Rate     cat   slurp      do    open sysread
cat       623/s      --    -97%    -98%    -98%    -99%
slurp   23420/s   3657%      --    -33%    -35%    -62%
do      34825/s   5486%     49%      --     -3%    -44%
open    35896/s   5658%     53%      3%      --    -42%
sysread 62029/s   9850%    165%     78%     73%      --

Running test with 100 bytes
           Rate     cat   slurp      do    open sysread
cat      1115/s      --    -95%    -97%    -97%    -98%
slurp   23332/s   1992%      --    -33%    -34%    -62%
do      34966/s   3035%     50%      --     -1%    -42%
open    35360/s   3070%     52%      1%      --    -42%
sysread 60755/s   5347%    160%     74%     72%      --

Running test with 1000 bytes
           Rate     cat   slurp      do    open sysread
cat      1627/s      --    -93%    -95%    -95%    -97%
slurp   23354/s   1336%      --    -30%    -32%    -61%
do      33334/s   1949%     43%      --     -3%    -45%
open    34494/s   2021%     48%      3%      --    -43%
sysread 60625/s   3627%    160%     82%     76%      --

Running test with 10000 bytes
           Rate     cat   slurp      do    open sysread
cat      1013/s      --    -95%    -96%    -96%    -98%
slurp   20773/s   1950%      --    -22%    -28%    -62%
do      26700/s   2535%     29%      --     -7%    -51%
open    28789/s   2741%     39%      8%      --    -48%
sysread 54929/s   5320%    164%    106%     91%      --

Running test with 1000000 bytes
           Rate     cat   slurp      do    open sysread
cat       175/s      --    -10%    -19%    -59%    -59%
slurp     195/s     12%      --    -10%    -55%    -55%
do        217/s     24%     11%      --    -49%    -50%
open      430/s    146%    120%     98%      --     -0%
sysread   430/s    146%    121%     98%      0%      --

Running test with 10000000 bytes
           Rate     cat   slurp      do sysread    open
cat      20.2/s      --     -3%    -11%    -55%    -56%
slurp    20.7/s      3%      --     -9%    -54%    -54%
do       22.8/s     13%     10%      --    -50%    -50%
sysread  45.2/s    124%    118%     99%      --     -0%
open     45.5/s    125%    119%    100%      0%      --

This is the program that I used to create the figures above:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw /cmpthese/;
    use File::Slurp;

    # Prepare some files.
    my @sizes = (1, 10, 100, 1_000, 10_000, 1_000_000, 10_000_000);
    my $base  = "/tmp/data";

    foreach my $size (@sizes) {
        my $file = "$base.$size";
        open my $fh => "> $file" or die;
        print $fh " " x $size;
        close $fh or die;
    }

    # Time each idiom on every file size.
    foreach my $size (@sizes) {
        our ($r1, $r2, $r3, $r4, $r5);
        our $file = "$base.$size";
        print "Running test with $size bytes\n";
        cmpthese -10 => {
            slurp   => '$::r1 = read_file $::file;',
            do      => '$::r2 = do {local (@ARGV, $/) = $::file; <>};',
            open    => 'open my $fh => $::file or die; undef $/; $::r3 = <$fh>;',
            sysread => 'open my $fh => $::file or die; sysread $fh => $::r4, -s $::file;',
            cat     => '$::r5 = `cat $::file`;',
        };
        # Check that every idiom read the same data.
        die '$r1 ne $r2' if $r1 ne $r2;
        die '$r2 ne $r3' if $r2 ne $r3;
        die '$r3 ne $r4' if $r3 ne $r4;
        die '$r4 ne $r5' if $r4 ne $r5;
        die 'Wrong size' if length ($r1) != $size;
        print "\n";
    }

    END {unlink map {"$base.$_"} @sizes}

    __END__

Frankly, I don't see much reason for a File::Slurp addition to the
core. The current idioms to slurp in a whole file are small (in chars),
and faster than File::Slurp. Granted, File::Slurp has a write_file
method, but I think reading entire files at once is much more common
than writing them.

> the reason to put it in core is to have a standard (not just cpan)
> module to do slurping. it simplifies the operation, makes it more
> maintainable (no multiple idioms to remember) and is faster. many other
> modules are in core with less than that.

You don't have to remember multiple idioms, one is enough.
As you can see, the current idioms are short enough that they are not
significantly harder to remember than
'use File::Slurp; $text = read_file $file'. Unless you mean you have to
remember multiple idioms because you are a maintainer and you have to
maintain code written by different people, using different idioms. In
that case, the last thing you want is to have to remember yet another
idiom (the current idioms don't go away).

Abigail
]
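
The three pure-Perl idioms timed in the benchmark can also be wrapped in
small subs. What follows is only a minimal sketch under stated
assumptions: the sub names slurp_do, slurp_open and slurp_sysread are
hypothetical and do not appear in the message, and the sketch swaps in
three-argument open and `local $/` where the benchmark uses two-argument
open and a global `undef $/`. It is therefore not the code that produced
the figures above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# NOTE: the sub names below are hypothetical, chosen for illustration.

# "do" idiom: localize @ARGV and $/ so that <> reads the whole file.
sub slurp_do {
    my ($file) = @_;
    my $text = do { local (@ARGV, $/) = $file; <> };
    return $text;
}

# "open" idiom: open the file, then read with $/ undefined.
# Uses local $/ (scoped) instead of the benchmark's global undef $/.
sub slurp_open {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/;
    my $text = <$fh>;
    return $text;
}

# "sysread" idiom: one unbuffered read sized by -s.
# Assumes a regular file; dies if the read is short or fails.
sub slurp_sysread {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $size = -s $fh;
    my $buf  = '';
    my $read = sysread $fh, $buf, $size;
    die "Cannot read $file: $!" unless defined $read;
    die "Short read on $file"   unless $read == $size;
    return $buf;
}

# Write a small temporary file and check that the idioms agree.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

my @results = map { scalar $_->($tmp_name) }
    \&slurp_do, \&slurp_open, \&slurp_sysread;

print "all three idioms agree\n"
    if $results[0] eq $results[1] && $results[1] eq $results[2];
```

Each sub assigns the read to a scalar before returning, so the result
does not depend on whether the caller uses list or scalar context.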