[Mpls-pm] Interesting RegEx Problem

Ian Malpass ian at indecorous.com
Mon Sep 19 11:11:24 PDT 2005


On Mon, 19 Sep 2005, Robert Fischer wrote:

> I need to rephrase the problem, though, because I realize I left a major
> aspect out of the phrasing:  Given a *collection of* strings and a
> collection of regular expression strings, where it is known that each
> string is matched by precisely one regular expression, how do you most
> efficiently develop the mapping?
>
> Given that arrangement, I'm currently looking at lexically sorting the
> strings and getting an MRU list for the pattern matches.  That
> implementation assumes that strings which are lexically close are liable
> to match the same (or similar) regular expressions.

Are they, though? "Alphabet" and "ZZ 9 Plural Z Alpha" both match /Alpha/. 
And /^.*Alpha.*$/ for that matter, if you need to match the entire string. 
A bit of a pathalogical case, I know, but it illustrates the point.

I'm not saying your approach won't bring some improvement, but the extra 
list maintenance and sorting will add overhead too. I'd profile it against 
a more straightforward approach (e.g. nested loops) and see what benefits 
it brings. Probably depends on the set sizes. Certainly depends on the set 
contents.

More complexity brings more opportunity for things to go wrong, and more 
difficulty maintaining things later :(

Ian

-
---------------------------------------------------------------------------

The soul would have no rainbows if the eyes held no tears.

Ian Malpass
<ian at indecorous.com>


More information about the Mpls-pm mailing list