Bath.pm April 2 Meeting Report

Newton, Philip Philip.Newton at datenrevision.de
Fri Apr 5 01:19:53 CST 2002


Dave Hodgkinson wrote:
> "Magnus Huckvale" <magnus at huckvale.net> writes:
> 
> >   - something to do with locate databases that took more 
> >     than 24 hours to build
> 
> I found out why. A directory with a million files in it takes a LONG
> time to traverse.

Especially on file systems that store directory entries as a linear
(linked-list) structure, such as ext[23].

The German computer magazine c't recently ran an article about journalling
file systems. According to the article, tree structures such as B+ or B*
trees (as used, for example, in ReiserFS, XFS, and IBM's JFS, but not in
ext3) scale better as the number of files per directory grows, but the
difference only becomes noticeable once you have thousands of files in one
directory.

I suppose "a million files" would fall into that range ;) Would it make
sense to put that directory on a ReiserFS partition? (Though I suppose that
if you want to go through all files anyway, it shouldn't matter that much
what data structure is used; that only comes into play when you want to find
a *particular* file, or when you want to delete a file[1].)
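Just to illustrate (untested, and the directory and file names are made
up), something along these lines would show the difference between walking
the whole directory and looking up one particular file:

#!/usr/bin/perl
# Rough sketch: compare a full readdir() traversal of a huge directory
# with a stat() of one particular file. Paths are invented for the example.
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $dir  = '/data/huge';         # hypothetical directory with ~1e6 files
my $file = "$dir/some-file-42";  # hypothetical file to look up

# Full traversal: the cost grows with the number of entries no matter
# how the file system stores the directory internally.
my $t0 = [gettimeofday];
opendir my $dh, $dir or die "opendir $dir: $!";
my $count = () = readdir $dh;
closedir $dh;
printf "traversed %d entries in %.3fs\n", $count, tv_interval($t0);

# Single lookup: this is where a tree-structured directory (ReiserFS,
# XFS, JFS) should beat a linear one (ext2/ext3).
$t0 = [gettimeofday];
my @st = stat $file;
printf "stat of one file took %.6fs (%s)\n",
    tv_interval($t0), @st ? 'found' : "not found: $!";

I'd expect the traversal to take roughly as long either way; only the
single stat() should show the tree structure winning.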

Cheers,
Philip

[1] Apparently the order in which you delete files from a really huge
directory can matter quite a bit -- there can be worst-case and best-case
orders depending on the file system used.
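
For what it's worth, the trick I've seen suggested is to sort by inode
number before unlinking -- but which order is actually best (or worst)
depends on the file system, so treat this as a sketch, not gospel:

#!/usr/bin/perl
# Sketch: delete the contents of a huge directory in inode order.
# The directory name is invented; whether this order helps at all
# depends on the file system in question.
use strict;
use warnings;

my $dir = '/data/huge';    # hypothetical directory

opendir my $dh, $dir or die "opendir $dir: $!";
my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# Pair each name with its inode number, then sort by inode so the
# unlinks walk the inode table in order instead of jumping around.
my @by_inode = sort { $a->[1] <=> $b->[1] }
               map  { [ $_, (lstat "$dir/$_")[1] ] } @names;

for my $entry (@by_inode) {
    unlink "$dir/$entry->[0]" or warn "unlink $entry->[0]: $!";
}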
-- 
Philip Newton <Philip.Newton at datenrevision.de>
All opinions are my own, not my employer's.
If you're not part of the solution, you're part of the precipitate.


