[Chicago-talk] array to hash and counting files
Steven Lembark
lembark at wrkhors.com
Tue Dec 30 17:16:41 CST 2003
-- Jeremy Hubble <jhubble at core.com>
> I have a series of directories that each have a number of
> subdirectories with the same directory structure. I need to count the
> number of files and directories in each subdirectory, and get a list of
> all unique files.
>
> Here is the code fragment I have (not tested yet):
>
> Is there a more efficient way to:
> 1) Extract the unique list of files?
> 2) Use perl to replace the find command?
Any time you feel the word "unique" percolating through your
brain, think "hash". If the hash key is the first-level subdir,
then ++$hash{$subdir} will count the items by subdir; to
break the count out by file type, use ++$hash->{$subdir}{$type}.
use File::Find qw( &finddepth );
use File::Basename qw( &basename );
# name is immaterial, could also be an anonymous
# sub defined on the finddepth call line.
#
# File::Find is kind enough to define $File::Find::dir
# as the directory and chdir to it. the current file's
# basename is in $_ (i.e. -e, -d, etc, work as expected).
#
# the handler could also use stat to get the file type
# and store that but outputting the results in human-
# usable form gets a bit hairy. for now assume that
# anything is either a file or directory.
#
# making $countz a reference lets us write $countz->{$subdir}{$type}
# rather than $countz{$subdir}->{$type} (I find it easier to read
# the -> toward the front).
my $subdir = '';
my %unique = ();
my $countz = {};
sub handler
{
    if( -d )
    {
        ++$countz->{$subdir}{dirz};
    }
    else
    {
        ++$countz->{$subdir}{filz};
    }

    ++$unique{$_};
}
# iterate over all the directory items in the base directory
# processing each item through the handler.
for( grep { -d } glob "$basedir/*" )
{
    $subdir = basename $_;

    finddepth \&handler, $_;
}
# at this point keys %$countz are the subdirz of $basedir,
# and the values are counts of files and directories by
# subdir. %unique is keyed by basename of whatever was found
# in the subdirs.
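Dumping the results afterward is just a walk over the hash keys. A
sketch, with hypothetical counts shaped the way the handler above
builds them:

```perl
use strict;
use warnings;

# made-up subdirs and counts, standing in for a real run
my $countz =
{
    src => { dirz => 2, filz => 5 },
    lib => { dirz => 1, filz => 3 },
};

for my $subdir ( sort keys %$countz )
{
    printf "%-8s %3d dirs %3d files\n",
        $subdir,
        $countz->{$subdir}{dirz} || 0,
        $countz->{$subdir}{filz} || 0;
}
```
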
The one variation you might want is to find the relative paths
within the subdirectories (i.e., keys of %unique are full paths
relative to $basedir/$subdir). In this case use the fact that
$File::Find::dir is a relative path when the input directory is
a relative path itself:
sub handler
{
    if( -d )
    {
        ++$countz->{$subdir}{dirz};
    }
    else
    {
        ++$countz->{$subdir}{filz};
    }

    ++$unique{$File::Find::name};
}
use Cwd qw( getcwd );
my $startdir = getcwd;
for( grep { -d } glob "$basedir/*" )
{
    chdir $_ or die "chdir $_: $!";
    $subdir = basename $_;
    finddepth \&handler, '.';
    # go back so the next relative glob entry still resolves
    chdir $startdir or die "chdir $startdir: $!";
}
The combination of chdir and finddepth w/ '.' will leave all
the $File::Find::name entries as relative paths:
DB<1> finddepth sub { print $File::Find::name, "\n" }, '.'
./output
./CVS/Root
./CVS/Repository
./CVS/Entries
./CVS/Entries.Log
./CVS
./CVSROOT/checkoutlist
./CVSROOT/commitinfo
./CVSROOT/config
./CVSROOT/cvswrappers
./CVSROOT/editinfo
<snip>
At this point the keys of %unique will be paths relative
to the various subdirs, which gives you a unique list
of the files within the general file tree.
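Printing that list is then one line over the hash keys. A sketch, with
hypothetical entries as the second handler would record them (keys are
relative paths, values count how many subdirs contained each path):

```perl
use strict;
use warnings;

my %unique =
(
    './output'      => 2,
    './CVS/Root'    => 1,
    './CVS/Entries' => 1,
);

print "$_\n" for sort keys %unique;
```
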
--
Steven Lembark 2930 W. Palmer
Workhorse Computing Chicago, IL 60647
+1 888 359 3508