SPUG: Directory Fun
John W. Krahn
krahnj at telus.net
Thu Mar 17 23:02:17 PST 2005
Perl wrote:
> I'm working backroom stuff so I don't often have to worry that much
> about performance. We use a lot of network-mounted directories,
> however, fairly large ones at that, so I decided to write a script to
> compare some common (?) ways to go through a directory looking for
> filenames that matched a specific pattern.
>
> I was kind of shocked at what I found (see end of script below). Now
> keep in mind that I'm running on Windows 2000 and all of the directories
> are network mounts (as assigned drive letters). Still, I would have
> thought that File::Find would have worked a touch better than it did.
> And the way the file glob works is odd, to say the least.
File::Find is designed to traverse a whole directory tree, so the overhead
it adds when reading a single directory may explain why it is slower. You may
also want to try the 'bydepth' or 'no_chdir' options, or set the variable
$File::Find::prune to stop it descending into subdirectories.
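Here is a minimal sketch of the no_chdir and $File::Find::prune suggestions. It assumes you only want files directly inside each directory. The helper name count_with_find and the sample files are made up for illustration. The directory test relies on File::Find reporting the starting directory with $File::Find::name equal to $File::Find::dir, which is how it behaves when 'bydepth' is not set.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: count plain files directly inside $dir whose
# names match $ptn, without descending into subdirectories.
sub count_with_find {
    my ($dir, $ptn) = @_;
    my $cnt = 0;
    find({
        no_chdir => 1,    # $_ holds the full path; no chdir() per directory
        wanted   => sub {
            if (-d $_) {
                # The starting directory is reported with name eq dir;
                # any other directory is a subdirectory, so prune it.
                $File::Find::prune = 1
                    if $File::Find::name ne $File::Find::dir;
                return;
            }
            $cnt++ if basename($_) =~ $ptn;
        },
    }, $dir);
    return $cnt;
}

# Sample tree: two matching files at the top, one more inside 02sub/.
my $dir = tempdir(CLEANUP => 1);
make_path("$dir/02sub");
for my $f ('02a.txt', '02b.doc', 'other.txt', '02sub/02c.txt') {
    open my $fh, '>', "$dir/$f" or die "$dir/$f: $!";
    close $fh;
}

my $found = count_with_find($dir, qr(^02));
print "found $found\n";    # the file inside 02sub/ is not counted
```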
> So I enter this into the group mind to see what response I get. I don't
> mind finding out that some of my tests are badly written. Learning is
> good. I'm curious if other people find similar results or different.
>
> I'm on my home Linux box right now, trying to duplicate the test. But
> it's all local directories and they aren't of the same size so it's
> difficult to see any results at all. Plus the glob isn't matching
> anything. But I'm really more interested in Windows since that's where
> I'm earning my money right now.
>
> ###################################################################
> ###################################################################
>
> use strict;
> use warnings;
>
> use File::Find;
> use IO::Dir;
>
> # You'll need to make up your own list of directories here.
> # Mine contain a total of 20,000+ files and are all network mounts.
> # You can avoid File::Spec by entering them all by hand.
>
> use File::Spec;
>
> my @dirs = map {
> File::Spec->catfile('S:\DA\Feds', $_, 'done')
> } qw(Fed1 Fed2 Fed3 Fed4 Fed5 Fed6 Fed7 Fed8 Fed9 Fed10 Fed11);
While opendir() and IO::Dir->new() accept only a single directory name,
File::Find::find() accepts a list of directories and glob() accepts a
whitespace-separated list of patterns, for example:
<S:/DA/Feds/Fed1/done/$glb S:/DA/Feds/Fed2/done/$glb S:/DA/Feds/Fed3/done/$glb
etc.>
Or even use globs on directory names:
<S:/DA/Feds/Fed[1-9]/done/$glb S:/DA/Feds/Fed1[01]/done/$glb>
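Here is a small self-contained sketch of the single-glob approach. It builds a hypothetical local stand-in for the S:/DA/Feds tree in a temporary directory. Because glob() splits its argument on whitespace, it assumes the paths themselves contain no spaces.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical local stand-in for S:/DA/Feds: Fed1 .. Fed11, each with a
# done/ subdirectory holding one file that matches and one that doesn't.
my $root = tempdir(CLEANUP => 1);
for my $n (1 .. 11) {
    my $done = "$root/Fed$n/done";
    make_path($done);
    for my $f ('02_report.txt', 'notes.txt') {
        open my $fh, '>', "$done/$f" or die "$done/$f: $!";
        close $fh;
    }
}

my $glb = '02*';

# A single glob() call: whitespace separates the two patterns, and the
# classes [1-9] and 1[01] select Fed1..Fed9 and Fed10..Fed11.
# (Assumes $root contains no whitespace, which would split the pattern.)
my @files = glob("$root/Fed[1-9]/done/$glb $root/Fed1[01]/done/$glb");

printf "found %d file%s\n", scalar @files, @files == 1 ? '' : 's';
```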
> # These are the patterns used to sieve through the files.
> # Your patterns will be dependent on your source directories.
> # The first pattern is a regular expression to be applied
> # to each filename. The second is for the 'glob' code.
> # It's worth playing with these to see if different patterns
> # make a difference in the times for the various directories.
>
> my $ptn = qr(^02)i;
What are the lower case and upper case versions of '0' and '2'?
> my $glb = '02*';
>
> #y $ptn = qr(\.doc$)i;
> #y $glb = '*.doc';
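Those two aren't exactly equivalent: $ also matches just before a trailing
newline, and the glob is case-sensitive on most Unix systems while the regex
is not. Closer equivalents: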
my $ptn = qr(\.doc\z)i;
my $glb = '*.[Dd][Oo][Cc]';
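A quick check of the $ versus \z difference between qr(\.doc$)i and qr(\.doc\z)i. The filename with a trailing newline is contrived, but it is a legal name on Unix filesystems.

```perl
use strict;
use warnings;

my $dollar = qr(\.doc$)i;
my $z      = qr(\.doc\z)i;

# '$' also matches just before a trailing newline; '\z' only at the
# very end of the string.
my $name = "report.doc\n";

my $dollar_hit = $name =~ $dollar ? 1 : 0;
my $z_hit      = $name =~ $z      ? 1 : 0;
print "\$: $dollar_hit  \\z: $z_hit\n";
```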
> #y $ptn = qr(\.pdf$)i;
> #y $glb = '*.pdf';
See above.
> # This set of patterns counts all files:
> #y $ptn = qr(.)i;
> #y $glb = '*';
Again, not exactly equivalent: without /s the . won't match a newline, and
the /i option is superfluous because . has no case.
my $ptn = qr(.)s;
my $glb = '*';
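This shows what /s changes. The one-character filename "\n" is an extreme, hypothetical case, but it is the only kind of name qr(.) would miss.

```perl
use strict;
use warnings;

# A (legal, if perverse) Unix filename consisting of a single newline.
my $name = "\n";

my $plain  = $name =~ qr(.)  ? 1 : 0;   # '.' does not match "\n" without /s
my $with_s = $name =~ qr(.)s ? 1 : 0;   # /s lets '.' match any character
print "qr(.): $plain  qr(.)s: $with_s\n";
```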
> ###################################################################
Why not just use the Benchmark module?
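Here is a rough sketch of what that could look like with the core Benchmark module's timethese() and cmpthese(). It uses a hypothetical local temporary directory in place of the network mounts and compares only the opendir and glob approaches. timethese() with a negative count runs each sub for at least that many CPU seconds.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use File::Temp qw(tempdir);

# Hypothetical test directory: 300 files, every third one starting with '02'.
my $dir = tempdir(CLEANUP => 1);
for my $i (1 .. 300) {
    my $name = sprintf '%s%03d.txt', ($i % 3 ? 'xx' : '02'), $i;
    open my $fh, '>', "$dir/$name" or die "$name: $!";
    close $fh;
}

my $ptn = qr(^02);
my $glb = '02*';

my %subs = (
    opendir => sub {
        opendir my $dh, $dir or die "$dir: $!";
        my $cnt = grep { $_ =~ $ptn } readdir $dh;   # count of matches
        closedir $dh;
        return $cnt;
    },
    glob => sub {
        my @files = glob "$dir/$glb";
        return scalar @files;
    },
);

# Run each sub for at least 1 CPU second, then print a comparison table.
my $results = timethese(-1, \%subs);
cmpthese($results);
```

Benchmark reports CPU time as well as wall-clock time. With network mounts, the wall-clock figure is probably the one to watch.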
> sub duration
> {
> my $durn = shift;
>
> return '{unknown}'
> unless defined $durn;
>
> my $secs = $durn % 60; $durn = int($durn / 60);
> my $mins = $durn % 60; $durn = int($durn / 60);
> my $hour = $durn % 24; $durn = int($durn / 24);
>
> $durn ? sprintf('%d %02d:%02d:%02d', $durn, $hour, $mins, $secs) :
> $hour ? sprintf( '%d:%02d:%02d', $hour, $mins, $secs)
> : sprintf( '%d:%02d', $mins, $secs)
> }
>
> ###################################################################
>
> sub test ($&@)
Ick, a prototype! Prototypes aren't very useful:
http://library.n0i.net/programming/perl/articles/fm_prototypes/
And why put @ in there if you are only expecting two arguments?
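In ($&@) the & is not in first position, so callers have to write "sub { ... }" anyway. The prototype buys nothing. Here is a minimal prototype-free sketch. The name run_test is hypothetical, and the directories are passed in instead of being read from the global @dirs.

```perl
use strict;
use warnings;

# Hypothetical prototype-free version of test(): the name, the code
# reference and the directories are all ordinary arguments.
sub run_test {
    my ($name, $func, @dirs) = @_;

    my $cnt = 0;
    $cnt += $func->($_) for @dirs;

    printf "%s found %7d file%s\n", $name, $cnt, $cnt == 1 ? '' : 's';
    return $cnt;
}

# Stub counter that pretends every directory holds three matching files.
my $total = run_test('stub', sub { 3 }, qw(dirA dirB));
```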
> {
> my $name = shift;
>
> print $name;
>
> my $func = shift;
> my $cnt = 0;
> my $secs = undef;
>
> eval {
> my $strt = time;
>
> $cnt += &$func($_)
> for @dirs;
>
> $secs = time - $strt;
> };
Why eval() that code?
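Without the eval(), anything that dies inside the counting code is reported. With the eval(), the error is silently swallowed and the duration is left undefined. Here is a sketch of the timing part without it. Using Time::HiRes for sub-second resolution is my own suggestion, not something in the original script, and timed_count is a hypothetical name.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # time() now returns fractional seconds

# Hypothetical replacement for the timing part of test(): no eval(),
# so if the counting code dies, the error reaches you.
sub timed_count {
    my ($func, @dirs) = @_;
    my $start = time;
    my $cnt   = 0;
    $cnt += $func->($_) for @dirs;
    return ($cnt, time - $start);
}

# Stub counter that pretends each directory holds two matching files.
my ($cnt, $secs) = timed_count(sub { 2 }, qw(dirA dirB dirC));
printf "found %d files in %.3f seconds\n", $cnt, $secs;
```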
> printf " found %7d file%s in %s\n",
> $cnt, ($cnt == 1 ? '' : 's'), duration($secs);
> }
[snip code]
John
--
use Perl;
program
fulfillment