[LA.pm] perl CGI querying of directory filenames most efficient method?

Thu Sep 22 11:21:05 PDT 2005

I hope I've written an email that can be understood.
My advice is to read it all the way through before
replying, as it is a complex "overall" efficiency
question, involving not just the Perl CGI code,
but also the web server needing the same directory.

--

What would be the most CPU/IO efficient method to
test whether the following filenames exist to
get images to display on a web page, where any
matching filenames should be displayed?

The client would upload only one file from the
following possibilities:

skunumber.jpg
skunumber.gif
skunumber.medium.jpg
skunumber.medium.gif
skunumber.large.jpg
skunumber.large.gif

Currently, it uses if-elsif-elsif-elsif... using the -e test
against the full pathname.  

Plus any matching names from this list:

skunumber.A.jpg   (second image to display with the one above)
skunumber.B.jpg   (third, etc...)
skunumber.C.jpg
skunumber.D.jpg

This uses 4 if statements using the -e test against the full pathname.  

So, either 0, 1, or 2 to 5 images might be displayed.
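
For reference, the current logic can be sketched roughly as below.
This is only a minimal sketch, not the production code: the sub name
images_by_stat and its $dir and $sku arguments are made up for
illustration. The first loop stops at the first hit, like the
if-elsif chain; the second checks all four extra names.

```perl
use strict;
use warnings;

# Hypothetical helper: return the image filenames to display for one SKU.
# $dir is the images directory and $sku the SKU number (both assumed inputs).
sub images_by_stat {
    my ($dir, $sku) = @_;
    my @found;

    # Primary image: only one of these is expected, so stop at the first hit.
    for my $name ("$sku.jpg", "$sku.gif",
                  "$sku.medium.jpg", "$sku.medium.gif",
                  "$sku.large.jpg", "$sku.large.gif") {
        if (-e "$dir/$name") {
            push @found, $name;
            last;
        }
    }

    # Additional images: every one that exists is displayed.
    for my $suffix (qw(A B C D)) {
        my $name = "$sku.$suffix.jpg";
        push @found, $name if -e "$dir/$name";
    }

    return @found;
}
```

That is at most 10 -e tests per SKU, regardless of how many files
are in the directory.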

Would it be faster to get the entire directory listing of
12,000 images (and growing) with this type of statement:

chdir $pathname or die "chdir $pathname: $!";
foreach my $filename ( <skunumber*.*> ) {
  # if-elsif-else on $filename
}

Maybe a readdir would be even faster?  Remember that
File Caching may make these questions immaterial, as
the web server needs to access the same images/ folder
as well, and that needs to be taken into consideration
for the overall efficiency, which I did not address
in the questions below.
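
A readdir version might look something like this. Again a sketch
under assumptions: images_by_readdir, $dir and $sku are illustrative
names, and the regex simply encodes the ten filename patterns listed
above. Note that readdir walks every entry in the directory, so the
work grows with the number of files, whichever way the caching
question turns out.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the directory once and keep the names that
# match the accepted patterns for one SKU.
sub images_by_readdir {
    my ($dir, $sku) = @_;

    opendir(my $dh, $dir) or die "Cannot open $dir: $!";

    # \Q...\E quotes any regex metacharacters in the SKU.
    # First alternative: sku.jpg, sku.gif, sku.medium.*, sku.large.*
    # Second alternative: sku.A.jpg through sku.D.jpg
    my @matches = sort grep {
        /^\Q$sku\E\.(?:(?:(?:medium|large)\.)?(?:jpg|gif)|[A-D]\.jpg)$/
    } readdir($dh);

    closedir($dh);
    return @matches;
}
```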

Questions: 

Does the Unix File Cache also cache directory
information?  I imagine it does, so this CGI script would
not have to access the hard drive to get the list of files,
but just get it from RAM, for each and every -e test,
or for the foreach, or readdir methods. 

Would the -e test have to go to the hard drive each time?
Or would the entire directory 'file' be in the File Cache
as well in that case?  Which would mean both methods are
efficient, and either would do fine.

I might do comparison testing in a loop, but that
would not simulate a CGI script unless the script was
invoked by an outer loop coded in a shell script, to
create a new PID each time (and so avoid using directory
contents that would be "buffered" in the file I/O cache,
or even in the Perl buffers).

What comparison testing method would you use?
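
One possible starting point is the core Benchmark module, sketched
below. This is an assumption-laden sketch, not a recommendation:
compare_methods and its arguments are made-up names, and it runs both
lookups repeatedly inside one process, so it says nothing about
per-request process startup and will mostly be hitting a warm cache.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical harness: time both lookups for one SKU in a single process.
# Returns the hash of Benchmark objects produced by timethese.
sub compare_methods {
    my ($dir, $sku, $count) = @_;

    # The ten candidate filenames for this SKU.
    my @names = map { "$sku.$_" }
        qw(jpg gif medium.jpg medium.gif large.jpg large.gif
           A.jpg B.jpg C.jpg D.jpg);

    return timethese($count, {
        # One -e test per candidate name.
        stat_tests => sub {
            my @hit = grep { -e "$dir/$_" } @names;
        },
        # One pass over the whole directory.
        readdir_scan => sub {
            opendir(my $dh, $dir) or die "Cannot open $dir: $!";
            my @hit = grep { index($_, "$sku.") == 0 } readdir($dh);
            closedir($dh);
        },
    }, 'none');    # 'none' suppresses the per-test timing lines
}

# Example use (path and SKU are placeholders):
# cmpthese(compare_methods('/path/to/images', '12345', 10_000));
```

To get closer to real CGI behaviour, the same two subs could be put
in separate scripts and run from an outer shell loop as described
above.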

Right now it is "cheaper" in terms of programmer time to do
-e testing than to add a database field to store the values,
which one might imagine would be faster, but even then
the directory contents need to be accessed in order
to web serve the image, and having the directory
contents in File Cache due to the web server using
it means the database field might not be any faster
than either the if-elsif-else or foreach methods.

Pete