[Omaha.pm] A regex "best fit" finder?

Christopher Cashell topher-pm at zyp.org
Thu Sep 29 14:15:05 PDT 2011


2011/9/29 Dan Linder <dan at linder.org>:
> Example:
> OMAWWW001
> OMAWWW002
> OMADNS001
> ORDWWW001
> ORDWWW002
> ORDWWW003
> ORDDNS001
> ORDDNS002
> Any thoughts?

I've dealt with a similar thing at work.  It can be incredibly tricky,
depending on the names in question, how variable they are, and whether
you just want to match them roughly, or if you want to match them to
validate them.

For example, the names listed all appear to have the same form: a
3-letter site/city code, followed by a 3-letter function/machine code,
followed by a 3-digit number.  If you just wanted to catch anything
that matches that format, you could do something like:

/[A-Z]{3}[A-Z]{3}\d{3}/
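
As a rough sketch of how that loose check might be applied to a list of
names, the snippet below uses [A-Z] for the letter positions (\w would
also accept digits and underscores).  The host list is the sample data
from the original question plus one made-up non-matching entry; the
variable names are only for illustration.

```perl
use strict;
use warnings;

# Loose shape check: 3 letters, 3 letters, 3 digits, anywhere in the string.
my $shape = qr/[A-Z]{3}[A-Z]{3}\d{3}/;

# Sample names from the question, plus one hypothetical non-match.
my @hosts = qw(OMAWWW001 OMADNS001 ORDWWW003 ORDDNS002 printer-7);

# Keep only the names that fit the general format.
my @matched = grep { $_ =~ $shape } @hosts;
print "$_\n" for @matched;
```

Because the pattern is unanchored, it will also match inside longer
strings, which is fine for rough filtering but not for validation.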

If the sets of site/city codes and function/machine codes are small
enough to list, you could spell them out explicitly (note: the start of
line/field anchor is there to improve performance with alternations;
depending on how much data you're processing, it may not matter or be
applicable):

/^(OMA|ORD)(WWW|DNS)\d{3}/

This validates not only that the name matches the 3-letter, 3-letter,
3-digit form, but also that it uses the expected site and function
codes.  It also has the advantage that it works with codes that aren't
exactly 3 letters (e.g. if you want to use SMTP for a mail server).
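
A minimal sketch of that validation approach, assuming the codes live
in plain Perl arrays.  The lists, the parse_name helper, and the
trailing \z anchor are my own additions for illustration; the end
anchor is there so that names with extra trailing characters are
rejected.

```perl
use strict;
use warnings;

# Hypothetical lists of known codes; extend as sites or functions are added.
my @sites     = qw(OMA ORD);
my @functions = qw(WWW DNS SMTP);

# Build the alternations from the lists; quotemeta guards against any
# regex metacharacters that might appear in a code.
my $site_re = join '|', map { quotemeta($_) } @sites;
my $func_re = join '|', map { quotemeta($_) } @functions;
my $name_re = qr/^($site_re)($func_re)(\d{3})\z/;

# Illustrative helper: returns (site, function, number) for a valid
# name, or an empty list otherwise.
sub parse_name {
    my ($name) = @_;
    return $name =~ $name_re ? ($1, $2, $3) : ();
}

for my $host (qw(OMAWWW001 ORDSMTP002 DENWWW001 OMAWWW01)) {
    my ($site, $func, $num) = parse_name($host);
    print defined $site
        ? "$host: site=$site func=$func num=$num\n"
        : "$host: not a recognized name\n";
}
```

Keeping the codes in arrays means a new site or function is a one-word
change rather than an edit to the regex itself.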

If you've got a decent number of entries, you might want to reformat
it with /x for increased readability:

/^ (OMA|ORD|DEN|SEA|LAX)
   (WWW|DNS|SMTP|IRC|DB)
   \d{3} /x
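
With /x you can also put a comment on each piece.  The sketch below
additionally uses named captures (available in Perl 5.10 and later) to
count hosts per site; the capture names and the sample host list are
assumptions for illustration only.

```perl
use strict;
use warnings;

my $name_re = qr/
    ^
    (?<site> OMA | ORD | DEN | SEA | LAX )    # site or city code
    (?<func> WWW | DNS | SMTP | IRC | DB )    # function or machine code
    (?<num>  \d{3} )                          # three-digit sequence number
/x;

# Tally how many hosts each site has, skipping names that do not match.
my %count;
for my $host (qw(OMAWWW001 OMAWWW002 ORDDNS001 SEADB001 LAXSMTP010 bogus)) {
    next unless $host =~ $name_re;
    $count{ $+{site} }++;
}
print "$_: $count{$_}\n" for sort keys %count;
```

One thing to watch for under /x: a comment must not contain the regex
delimiter, so avoid a literal "/" in comments when using qr/.../x.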

Without knowing more about the current names, as well as potential
future names, that's probably the best I can think of.

> Thanks,
> DanL

-- 
Christopher

