My code currently calls the UNIX "du" command to get the size of a directory structure:<div><div> $size = `/usr/bin/du -sk $DATA_DIR | cut -f1`;</div></div><div><br></div><div>Knowing that shelling out is expensive in CPU time and generally not portable across platforms, I am looking into replacing it with a pure Perl implementation:</div>
<div><div> find( sub { -f and ( $size += -s _ ) }, $DATA_DIR );</div><div><br></div><div>Wanting to be able to brag about the speed increase, I timed them with the Benchmark routines, and got a shock when I tested against my /tmp directory:</div>
<div><div><font class="Apple-style-span" face="'courier new', monospace"> Rate Internal Shell_du</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Internal 11.6/s -- -99%</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Shell_du 1538/s 13123% --</font></div></div><div><br></div><div>WOW! Shelling out to du was roughly 130 TIMES faster than the internal find code (1538/s versus 11.6/s). (FYI, the /tmp directory holds 349MB across 6400 files.)</div>
<div><br></div><div>As a test, I created a very small directory structure (12 files, 2 sub-directories, 120KB), and over 10,000 iterations the results are reversed:</div>
<div><div><font class="Apple-style-span" face="'courier new', monospace"> Rate Shell_du Internal</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Shell_du 1664/s -- -68%</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Internal 5208/s 213% --</font></div></div><div><br></div><div>This time the internal code was faster...</div><div><br></div><div><div>
My test system runs 64-bit CentOS 5.5 (2GB RAM, with most of the free RAM used for caching) and Perl 5.8.8, and the /tmp filesystem is ext3.</div><div><br></div><div>This bit of code isn't time critical, and the actual data that will be processed is closer to the 120KB test case, so I may go ahead and remove the shell/du line anyway, but I'd like to know why the internal version is so slow on the larger directory!</div>
</div><div><br></div><div>Dan</div><div><br></div><div>Just in case I made a blunder, here's the test code:</div><div><div><font class="Apple-style-span" face="'courier new', monospace">#!/usr/bin/perl -w</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">use strict;</font></div><div><font class="Apple-style-span" face="'courier new', monospace">use Benchmark qw(:all);</font></div><div><font class="Apple-style-span" face="'courier new', monospace">use File::Find;</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">my $foo = 0;</font></div><div><font class="Apple-style-span" face="'courier new', monospace">my $count = shift || 2000;</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">my $DATA_DIR = shift || "/tmp";</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">sub shell_du {</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> my $size = 0;</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> $size = `/usr/bin/du -sk $DATA_DIR | cut -f1`;</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> chomp $size;</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> return $size;</font></div><div>
<font class="Apple-style-span" face="'courier new', monospace">}</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">sub internal_du {</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> my $size = 0;</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> find( sub { -f and ( $size += -s _ ) }, $DATA_DIR );</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> return $size;</font></div><div><font class="Apple-style-span" face="'courier new', monospace">}</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace">cmpthese ($count, {</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 'Shell_du' => sub { $foo = shell_du(); },</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> 'Internal' => sub { $foo = internal_du(); },</font></div><div><font class="Apple-style-span" face="'courier new', monospace">});</font></div>
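<div><br></div><div>For reference, the one-liner inside internal_du can be spelled out as below. This is only a minimal sketch of the same technique: the name dir_size_bytes is a hypothetical helper that does not appear in the script above, and the default directory of "." is an assumption made so the example runs on its own.</div>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not part of the original script): sums the apparent
# size, in bytes, of every regular file found under $dir.
sub dir_size_bytes {
    my ($dir) = @_;
    my $total = 0;
    find(
        sub {
            # File::Find sets $_ to the current entry name, so a bare -f
            # tests that entry. -f uses stat(), so it follows symlinks.
            return unless -f;
            # "_" reuses the stat buffer filled by the -f test above,
            # so -s does not trigger a second stat() call.
            $total += (-s _) || 0;
        },
        $dir
    );
    return $total;
}

# Assumed default of "." so the sketch can be run without arguments.
my $dir   = shift || '.';
my $bytes = dir_size_bytes($dir);
printf "%d bytes (about %.0f KB) under %s\n", $bytes, $bytes / 1024, $dir;
```

<div>Note that this totals apparent file sizes in bytes, whereas du -sk reports allocated disk space in kilobytes, so the two figures should not be expected to match exactly.</div>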
</div><div><br></div>
</div>