APM: processing lots of files?

Sam Foster austin.pm at sam-i-am.com
Wed Apr 28 09:21:12 CDT 2004


Wayne Walker wrote:
> First, if you have the local disk space, then you should mirror the
> data, then parse it. Walking directories on a net file system is slow.

I have the disk space, but not the time to mirror it. Though the rsync 
tip is a good one and would mitigate this.
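
For the archives, the mirror step itself could be a one-liner along 
these lines (the source share and local scratch directory here are just 
placeholders for the real paths):

    rsync -a --delete user@fileserver:/export/data/ /scratch/data-mirror/

Pointing the scripts at the local copy and re-running rsync before each 
pass should only transfer the files that changed since the last run.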

So far I've used ActiveState's PerlApp to make an executable of each 
script that I can drop on the server and run locally. That's really 
helped performance enormously. I think I'll be stumping up the $100 for 
their PDK.

I also looked into Parallel::ForkManager and got some test scripts 
running, but I'll need to spend more time with it to wrap my existing 
scripts, or to adapt them to use it.
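
For anyone curious, here's a minimal sketch of the kind of wrapping I 
have in mind, with process_dir() standing in for whatever each existing 
script does per directory and the glob path as a placeholder:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parallel::ForkManager;

    # placeholder for however the real scripts build their directory list
    my @dirs = glob('/scratch/data-mirror/*');

    my $pm = Parallel::ForkManager->new(5);   # at most 5 children at once

    for my $dir (@dirs) {
        # start() returns the child's PID in the parent and 0 in the child
        $pm->start and next;    # parent: move on to the next directory
        process_dir($dir);      # child: do the real per-directory work
        $pm->finish;            # child exits, freeing up a slot
    }
    $pm->wait_all_children;

    sub process_dir {
        my ($dir) = @_;
        print "processing $dir\n";   # stand-in for the real parse/report code
    }

The nice part is the fork bookkeeping stays in one place, so the 
existing per-directory code shouldn't need much change beyond being 
callable as a sub.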


> What is the maximum # of files/directories in any one directory?  This
> has a large impact on performance, especially on networked disks.
> 
> What is the size of the whole directory tree (in MBytes).

There are no more than 10-20 files per directory. The whole thing is 
about 3.5 GB across 16,000 individual directories (I've been cleaning; 
it used to be 29,000).

The XML validation (against a schema) I handed off to a colleague, who 
whipped up a .NET console app that is speedy and adequate for the task.
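
If anyone would rather keep that step in Perl, XML::LibXML can validate 
against a W3C schema as well. A rough sketch, with the .xsd path and the 
file glob as placeholders (no promises on speed versus the .NET app):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::LibXML;

    my $schema = XML::LibXML::Schema->new(location => 'content.xsd');
    my $parser = XML::LibXML->new;

    for my $file (glob('/scratch/data-mirror/*/*.xml')) {
        my $doc = eval { $parser->parse_file($file) };
        if (!$doc) { warn "$file: not well-formed: $@"; next; }
        eval { $schema->validate($doc) };   # dies if the document is invalid
        warn "$file: invalid: $@" if $@;
    }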

thanks for all your help
Sam


