APM: processing lots of files?
Sam Foster
austin.pm at sam-i-am.com
Wed Apr 28 09:21:12 CDT 2004
Wayne Walker wrote:
> First, if you have the local disk space, then you should mirror the
> data, then parse it. Walking directories on a net file system is slow.
I have the disk space, but not the time to mirror it. Though the rsync
tip is a good one and would mitigate this.
So far I've used ActiveState's PerlApp to make an executable of each
script that I can drop on the server and run locally. That's really
helped performance enormously. I'll be stumping up the $100 for their
PDK, I think.
I also looked into Parallel::ForkManager and got some test scripts
running, but I'll need to spend more time with it before I can wrap my
existing scripts, or adapt them to use it.
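For anyone following along, the basic shape of a Parallel::ForkManager wrapper is fairly small. This is a sketch only: `process_file()` is a hypothetical stand-in for whatever each existing script does to a single file, and the worker count of 4 is an arbitrary assumption to tune per machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Parallel::ForkManager;    # CPAN module, not core

# Hypothetical per-file worker; replace with the real parsing logic.
sub process_file {
    my ($path) = @_;
    # ... parse/convert $path here ...
}

my $root = shift @ARGV or die "usage: $0 <dir>\n";

# Collect the file list first, then fan the work out over N children.
my @files;
find( sub { push @files, $File::Find::name if -f }, $root );

my $pm = Parallel::ForkManager->new(4);    # up to 4 concurrent children
for my $file (@files) {
    $pm->start and next;    # parent: fork a child, move to next file
    process_file($file);    # child: do the work
    $pm->finish;            # child: exit
}
$pm->wait_all_children;     # parent: reap everything before exiting
```

Forking per file is cheap enough when each file takes real work; for thousands of tiny files it may be worth batching several files per child instead.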
> What is the maximum # of files/directories in any one directory? This
> has a large impact on performance, especially on networked disks.
>
> What is the size of the whole directory tree (in MBytes).
There are no more than 10-20 files per directory. The whole thing is
about 3.5 GB across 16,000 individual directories (I've been cleaning;
it used to be 29,000).
The XML validation (against a schema) I handed off to a colleague who
whipped up a .NET console app that is speedy and adequate for the task.
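The .NET app itself isn't shown here, but for comparison, schema validation can also stay in Perl via XML::LibXML, which wraps the fast libxml2 C library. A minimal sketch, assuming the schema is W3C XML Schema and the file paths come in on the command line:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module wrapping libxml2

my ( $schema_file, $xml_file ) = @ARGV;
die "usage: $0 <schema.xsd> <file.xml>\n" unless $xml_file;

my $schema = XML::LibXML::Schema->new( location => $schema_file );
my $doc    = XML::LibXML->load_xml( location => $xml_file );

# validate() dies with a diagnostic on failure, so trap it with eval.
eval { $schema->validate($doc) };
print $@ ? "INVALID: $@" : "valid\n";
```

Since libxml2 does the heavy lifting in C, this tends to be in the same speed class as a compiled console app.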
Thanks for all your help.
Sam