SPUG: Re: search algorithm

Richard Anderson richard at richard-anderson.org
Mon Dec 17 17:48:24 CST 2001

A more detailed statement of the problem might be of help here.  Do you have
to find every occurence of a string or just a single occurence?  What is the
purpose of this application? Etc.

My initial inclination is that a restructuring of the data would be the best
way to speed things up.  Could the 7000 searches be run each time the files
are updated and the results stored?  How about using a B-tree search?  How
about looking through Knuth, The Art of Computer Programming, vol. 3 or
Sedgewick, Algorithms in C++: Fundamentals, Data Structure, Sorting,

Hope this helps.

Richard Anderson
Software Professional
richard at richard-anderson.org
----- Original Message -----
From: "Martin Korb" <mkorb at versuslaw.com>
To: <spug-list at pm.org>
Sent: Monday, December 17, 2001 7:57 AM
Subject: SPUG: search algorithm

Here is the problem:

I have 7000 database entries which may or may be found in the content of
over 50000 files.
At the moment I am using a linear search,
the search patterns are the keys of a hash, find(\&wanted, $startdir) is
called on each key of the hash, if the key is found, stop recursing the dir,
delete this key and go the next one, start all over again. This could
potentially be over 3 x 10 exp 7 searches.
Obviously, this will take much to long with a linear search algorithm.

Which is the best algorithm to use for such a search and where can I find
out more about it? Any help is much appreciated.

Thanks  Martin

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
     Seattle Perl Users Group (SPUG) Home Page: http://zipcon.net/spug/

More information about the spug-list mailing list