SPUG: Searching for multiple strings

greg heil gheil at uswest.net
Thu Jul 6 13:42:43 CDT 2000


Andy

The problem you outlined is known as the "exact
set matching" problem. Actually it is slightly simpler
- as you do not need to know the location of all
matches, just the existence. However, the ESM is well
enough solved, Eg p52 of:
"Algorithms on Strings, Trees, and Sequences:
 Computer Science and Computational Biology" '97 by
 Dan Gusfield, http://wwwcsif.cs.ucdavis.edu/~gusfield/

It is solved there to O(n+m+k) with n the size of the
text, m the sum of the sizes of the patterns and k the
number of hits of P in T.

It sounds like your Perl solutions are having a hard
time achieving a reduction from O(nm+cm) to O(nm) - ie
effectively factoring out the compilation time for the
search automata.

If your problem is important enough you might want
to implement a better algorithm directly. It is not
clear to me how the designers of Perl could be given
syntactic clues signaling you want to solve an ESM;-)
Thus you may need to resort to a language that exposes
more of the hardwares capabilities.

-- 
greg heil
mailto:gheil at acm.org
http://www.scn.org/tl/anvil

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
 Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
     Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For full traffic, use spug-list for LIST ; otherwise use spug-list-digest





More information about the spug-list mailing list