[Omaha.pm] A regex "best fit" finder?

Dan Linder dan at linder.org
Thu Sep 29 14:23:01 PDT 2011


Yeah, my sample list was very simple compared to the actual list of files.
I envision the final solution being a number of match strings strung
together with "|"...
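
A minimal sketch of that "strung together with |" idea, assuming the names are held in a plain Perl array. The @names list below just reuses a few names from the example and is not the real file list:

```perl
use strict;
use warnings;

# Hypothetical sample names, reused from the example list in this thread.
my @names = qw(OMAWWW001 OMAWWW002 OMADNS001 ORDWWW001);

# Escape each literal name, then join the pieces with "|".
my $alternation = join '|', map { quotemeta($_) } @names;

# Anchor both ends so only whole names match.
my $re = qr/^(?:$alternation)$/;

for my $candidate (qw(OMAWWW001 ORDDNS009)) {
    printf "%s: %s\n", $candidate, ($candidate =~ $re ? 'match' : 'no match');
}
```

This only matches names already in the list, so it is closer to a lookup than a "best fit"; collapsing the alternatives into shorter patterns would be a separate step.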

Thanks, I'll keep looking.

Dan

On Thu, Sep 29, 2011 at 16:15, Christopher Cashell <topher-pm at zyp.org> wrote:

> 2011/9/29 Dan Linder <dan at linder.org>:
> > Example:
> > OMAWWW001
> > OMAWWW002
> > OMADNS001
> > ORDWWW001
> > ORDWWW002
> > ORDWWW003
> > ORDDNS001
> > ORDDNS002
> > Any thoughts?
>
> I've dealt with a similar thing at work.  It can be incredibly tricky,
> depending on the names in question, how variable they are, and whether
> you just want to match them roughly, or if you want to match them to
> validate them.
>
> For example, from the data listed, they appear to be all of the form:
> 3 letter site/city code, followed by 3 letter function/machine code,
> followed by a 3 digit number.  If you just wanted to catch anything
> that matches that format, you could possibly do something like:
>
> /[A-Za-z]{3}[A-Za-z]{3}\d{3}/
>
> Depending on the number of site/city codes and the number of
> function/machine codes, you could do something like (Note: start of
> line/field anchor added to improve performance with alternations;
> depending on how much data you're processing, it may not matter or be
> applicable):
>
> /^(OMA|ORD)(WWW|DNS)\d{3}/
>
> This would allow you to validate not only that the name matches the
> 3 letter, 3 letter, 3 digit form, but also that it uses expected site
> and function codes.  This also has the advantage that it works with
> codes that aren't exactly 3 letters (e.g. if you want to use SMTP for
> a mail server).
>
> If you've got a decent number of entries, you might want to reformat
> it with /x for increased readability:
>
> /^ (OMA|ORD|DEN|SEA|LAX)
>   (WWW|DNS|SMTP|IRC|DB)
>   \d{3} /x
>
> Without knowing more about the current names, as well as potential
> future names, that's probably the best I can think of.
>
> > Thanks,
> > DanL
>
> --
> Christopher
> _______________________________________________
> Omaha-pm mailing list
> Omaha-pm at pm.org
> http://mail.pm.org/mailman/listinfo/omaha-pm
>
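
For validation, the quoted /x pattern could be built from code lists and anchored at both ends, so that a name with trailing characters (such as a fourth digit) is rejected. This is only a sketch: the code lists are the ones from the quoted example, and parse_name is a hypothetical helper, not something from the original message.

```perl
use strict;
use warnings;

# Site and function codes from the quoted example; illustrative only.
my @sites     = qw(OMA ORD DEN SEA LAX);
my @functions = qw(WWW DNS SMTP IRC DB);

my $site_alt = join '|', map { quotemeta($_) } @sites;
my $func_alt = join '|', map { quotemeta($_) } @functions;

my $name_re = qr/
    ^ ($site_alt)     # site or city code
      ($func_alt)     # function or machine code
      (\d{3})         # three digit sequence number
    \z                # nothing may follow the number
/x;

# Hypothetical helper: returns (site, function, number) in list context,
# or an empty list when the name is not valid.
sub parse_name {
    my ($name) = @_;
    return $name =~ $name_re;
}

my @parts = parse_name('SEASMTP042');
print "@parts\n" if @parts;
```

Capturing the three parts makes it easy to report which piece of a name is unexpected, at the cost of maintaining the two code lists by hand.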



-- 
***************** ************* *********** ******* ***** *** **
"Quis custodiet ipsos custodes?"
    (Who can watch the watchmen?)
    -- from the Satires of Juvenal
"I do not fear computers, I fear the lack of them."
    -- Isaac Asimov (Author)
** *** ***** ******* *********** ************* *****************