SPUG: Directory Fun

Thu Mar 17 23:02:17 PST 2005

Perl wrote:
> I'm working backroom stuff so I don't often have to worry that much 
> about performance.  We use a lot of network-mounted directories, 
> however, fairly large ones at that, so I decided to write a script to 
> compare some common (?) ways to go through a directory looking for 
> filenames that matched a specific pattern.
> 
> I was kind of shocked at what I found (see end of script below).  Now 
> keep in mind that I'm running on Windows 2000 and all of the directories 
> are network mounts (as assigned drive letters).  Still, I would have 
> thought that File::Find would have worked a touch better than it did. 
> And the way the file glob works is odd, to say the least.

File::Find is designed to traverse a whole directory tree, so the overhead
of that machinery when reading a single directory may explain why it is
slower.  You may also want to try it with the 'bydepth' or 'no_chdir'
options, or set $File::Find::prune to stop it descending into
subdirectories.
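
As a rough sketch of what I mean (the sub name count_with_find is made up
for this example, and I have not timed it on a network mount), this uses
'no_chdir' and $File::Find::prune so that find() only looks at the entries
directly inside one directory:

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(basename);

# Count the entries directly inside $dir whose names match $ptn,
# without descending into any subdirectories.
sub count_with_find {
    my ($dir, $ptn) = @_;
    my $cnt = 0;

    find({
        no_chdir => 1,    # do not chdir() into each directory visited
        wanted   => sub {
            # For the starting directory, name and dir are the same
            # string; skip it so that it is neither counted nor pruned.
            return if $File::Find::name eq $File::Find::dir;

            # Do not descend into subdirectories.
            $File::Find::prune = 1 if -d $File::Find::name;

            # With no_chdir, $_ holds the full path, so match on the
            # last path component only.
            $cnt++ if basename($File::Find::name) =~ $ptn;
        },
    }, $dir);

    return $cnt;
}
```

It would be called as count_with_find($dir, $ptn) for each entry in @dirs.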

> So I enter this into the group mind to see what response I get.  I don't 
> mind finding out that some of my tests are badly written.  Learning is 
> good.  I'm curious if other people find similar results or different.
> 
> I'm on my home Linux box right now, trying to duplicate the test.  But 
> it's all local directories and they aren't of the same size so it's 
> difficult to see any results at all.  Plus the glob isn't matching 
> anything.  But I'm really more interested in Windows since that's where 
> I'm earning my money right now.
> 
> ###################################################################
> ###################################################################
> 
> use     strict;
> use     warnings;
> 
> use     File::Find;
> use     IO::Dir;
> 
> # You'll need to make up your own list of directories here.
> #   Mine contain a total of 20,000+ files and are all network mounts.
> #   You can avoid File::Spec by entering them all by hand.
> 
> use     File::Spec;
> 
> my  @dirs = map {
>     File::Spec->catfile('S:\DA\Feds', $_, 'done')
> } qw(Fed1 Fed2 Fed3 Fed4 Fed5 Fed6 Fed7 Fed8 Fed9 Fed10 Fed11);

While opendir() and IO::Dir->new() require single directory names, 
File::Find::find() and glob() can operate on lists of directories, for example:

<S:/DA/Feds/Fed1/done/$glb S:/DA/Feds/Fed2/done/$glb S:/DA/Feds/Fed3/done/$glb 
etc.>

Or even use globs on directory names:

<S:/DA/Feds/Fed[1-9]/done/$glb S:/DA/Feds/Fed1[01]/done/$glb>
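
A minimal sketch of the first form, building the pattern list from @dirs
rather than typing it out (count_with_glob is a name I made up).  It
assumes none of the directory names contain whitespace, because glob()
splits its argument on whitespace:

```perl
use strict;
use warnings;

# Count the files matching $glb in all of @dirs with one glob() call.
sub count_with_glob {
    my ($glb, @dirs) = @_;

    # Build one space-separated list: "dir1/02* dir2/02* ..."
    my $patterns = join ' ', map { "$_/$glb" } @dirs;

    my @found = glob($patterns);
    return scalar @found;
}
```

With the variables from the script that would be count_with_glob($glb, @dirs),
with forward slashes in the directory names as in the examples above.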

> # These are the patterns used to sieve through the files.
> #   Your patterns will be dependent on your source directories.
> #   The first pattern is a regular expression to be applied
> #   to each filename.  The second is for the 'glob' code.
> #   It's worth playing with these to see if different patterns
> #   make a difference in the times for the various directories.
> 
> my  $ptn = qr(^02)i;

What are the lower case and upper case versions of '0' and '2'?  The /i
option does nothing for this pattern.

> my  $glb = '02*';
> 
> #y  $ptn = qr(\.doc$)i;
> #y  $glb = '*.doc';

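Those two aren't exactly equivalent: the regex is case-insensitive and its
'$' also matches before a trailing newline, while the glob only matches a
lower-case '.doc' wherever glob() is case-sensitive.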

my  $ptn = qr(\.doc\z)i;
my  $glb = '*.[Dd][Oo][Cc]';
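
To show the '$' versus '\z' difference in isolation, here is a small
sketch (the two sub names are made up).  It only matters for a name that
ends in a newline, which is unusual but legal on Unix filesystems:

```perl
use strict;
use warnings;

# '$' matches at the end of the string OR just before a final newline.
sub is_doc_loose  { my ($name) = @_; return $name =~ /\.doc$/i  ? 1 : 0 }

# '\z' matches only at the very end of the string.
sub is_doc_strict { my ($name) = @_; return $name =~ /\.doc\z/i ? 1 : 0 }

my $odd = "report.doc\n";    # name with a trailing newline

printf "loose:  %d\n", is_doc_loose($odd);     # prints "loose:  1"
printf "strict: %d\n", is_doc_strict($odd);    # prints "strict: 0"
```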

> #y  $ptn = qr(\.pdf$)i;
> #y  $glb = '*.pdf';

See above.

> # This set of patterns counts all files:
> #y  $ptn = qr(.)i;
> #y  $glb = '*';

Again, not exactly equivalent (without /s, '.' does not match a newline)
and the /i option is superfluous.

my  $ptn = qr(.)s;
my  $glb = '*';

> ###################################################################

Why not just use the Benchmark module?
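
Something along these lines, as a sketch only: the three sub names are
mine, and just two of the methods are shown.  timethese() runs each code
ref the given number of times and cmpthese() prints a comparison table
from its results:

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Count names in $dir matching $ptn using opendir/readdir.
sub count_readdir {
    my ($dir, $ptn) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my $cnt = grep { $_ =~ $ptn } readdir $dh;
    closedir $dh;
    return $cnt;
}

# Count names in $dir matching the glob pattern $glb.
# Assumes $dir contains no whitespace.
sub count_glob {
    my ($dir, $glb) = @_;
    my @found = glob("$dir/$glb");
    return scalar @found;
}

# Run each method $count times against $dir and print a comparison.
sub compare_methods {
    my ($dir, $count) = @_;
    my $results = timethese($count, {
        readdir => sub { count_readdir($dir, qr(^02)) },
        glob    => sub { count_glob($dir, '02*') },
    });
    cmpthese($results);
    return $results;
}
```

One caveat: Benchmark's comparison is based on CPU time, so for network
mounts the wallclock figure that timethese() prints for each method may be
the more telling number.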

> sub duration
> {
>     my  $durn = shift;
> 
>     return '{unknown}'
>         unless defined $durn;
> 
>     my  $secs = $durn % 60;  $durn = int($durn / 60);
>     my  $mins = $durn % 60;  $durn = int($durn / 60);
>     my  $hour = $durn % 24;  $durn = int($durn / 24);
> 
>     $durn ? sprintf('%d %02d:%02d:%02d', $durn, $hour, $mins, $secs) :
>     $hour ? sprintf(     '%d:%02d:%02d',        $hour, $mins, $secs)
>           : sprintf(          '%d:%02d',               $mins, $secs)
> }
> 
> ###################################################################
> 
> sub test ($&@)

Ick, a prototype!  Prototypes aren't very useful:

http://library.n0i.net/programming/perl/articles/fm_prototypes/

And why put @ in there if you are only expecting two arguments?

> {
>     my  $name = shift;
> 
>     print $name;
> 
>     my  $func = shift;
>     my  $cnt  = 0;
>     my  $secs = undef;
> 
>     eval {
>         my  $strt = time;
> 
>         $cnt += &$func($_)
>             for @dirs;
> 
>         $secs = time - $strt;
>     };

Why eval() that code?

>     printf " found %7d file%s in %s\n",
>            $cnt, ($cnt == 1 ? '' : 's'), duration($secs);
> }
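
For comparison, a version without the prototype or the eval could look
something like this.  It is only a sketch: run_test is a made-up name, and
passing the directories in and returning the numbers instead of printing
them are my own changes, not anything the original requires:

```perl
use strict;
use warnings;

# Call $func once per directory and add up the counts it returns.
# Returns the total count and the elapsed wall-clock seconds.
sub run_test {
    my ($func, @dirs) = @_;

    my $cnt  = 0;
    my $strt = time;

    $cnt += $func->($_) for @dirs;

    return ($cnt, time - $strt);
}
```

It would be called as my ($cnt, $secs) = run_test(sub { ... }, @dirs);
and if one of the methods die()s, the error is reported rather than
silently swallowed.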

[snip code]

John
-- 
use Perl;
program
fulfillment