[sf-perl] help with matching?

Fred Moyer fred at redhotpenguin.com
Mon Jul 12 11:55:38 PDT 2010


On Mon, Jul 12, 2010 at 11:30 AM, David Alban <extasia at extasia.org> wrote:
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 6 => 'srwd15abx001.srwd15.com'
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 7 => 'srwd15abx001'
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 8 => '.srwd15.com'
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 9 => '.srwd15.com'
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 10 => 'srwd15hst001'
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 11 => '.srwd15.com'
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 12 => ''
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 13 => ''
> 2010-07-12 17:46:21 +0000 srwd00reg001 junk.perl[356] 14 => ''
> $VAR1 = qr/(?msx-i: \A \s* (?x-ism: ( ( \d{1,3} ) [.] ( \d{1,3} ) [.]
> ( \d{1,3} ) [.] ( \d{1,3} ) ) ) \s+ (?x-ism: ( ( \w+ ) ( [.] \w+ [.]
> \w+ )? ) ) \s+ (?x-ism: ( ( \w+ ) ( [.] \w+ [.] \w+ )? ) ) \s* \z )/;
>
> here is the line to parse (excluding the single quotes):
>
> '10.80.15.14     srwd15abx001.srwd15.com   srwd15hst001.srwd15.com  '
>
> here is my attempt to make the regex more human readable [edited above
> line by hand--best viewed with non-proportional font:]
>
> qr/
>  (?msx-i:
>    \A
>      \s*
>        (?x-ism: (
>                       ( \d{1,3} )
>                   [.] ( \d{1,3} )
>                   [.] ( \d{1,3} )
>                   [.] ( \d{1,3} )
>                 )
>        )
>        \s+
>        (?x-ism: (
>                   ( \w+ )
>                   (
>                     [.] \w+ [.] \w+
>                   )?
>                 )
>        )
>        \s+
>        (?x-ism: (
>                   ( \w+ )
>                   (
>                     [.] \w+ [.] \w+
>                   )?
>                 )
>        )
>      \s*
>    \z
>  )
> /;
>
> what am i missing?

IMHO this regex is too complicated.  Why capture each octet of the
dotted quad individually instead of grabbing the whole IP address with
one match?  Same for the host names.  It is better (IMHO) to write a
more readable piece of code, even if it ends up being more
computationally expensive.
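
A rough sketch of the "one match per field" idea, using the sample line
from your mail.  The variable names are mine, and \S+ is an assumption:
it is looser than your \w+ pieces and accepts any run of non-whitespace
as a name, so tighten it if you need to.

```perl
use strict;
use warnings;

# Sample line from the original post.
my $line = '10.80.15.14     srwd15abx001.srwd15.com   srwd15hst001.srwd15.com  ';

# One capture per field: the whole IP address, then each name as a unit.
# \S+ is an assumption; it is looser than the \w+ pieces in the original.
my ($ip, $host, $alias) = $line =~ m{
    \A \s*
    ( \d{1,3} (?: [.] \d{1,3} ){3} )   # capture 1: whole dotted-quad address
    \s+ ( \S+ )                        # capture 2: first name
    \s+ ( \S+ )                        # capture 3: second name
    \s* \z
}x;

die "line did not match\n" unless defined $ip;
print "$ip $host $alias\n";
```

Three captures instead of fourteen, so there is much less numbering to
keep track of when something does not match.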

You might be able to simplify this using some other tools such as
split().  Ten lines of readable, commented code beat a single regex
that takes effort to understand.
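
Something along these lines is what I have in mind with split().  Treat
it as a sketch, not a drop-in: parse_host_line() and the hash keys are
names I made up, I am assuming exactly two names per line as in your
regex, and the domain comes back without the leading dot.

```perl
use strict;
use warnings;

# parse_host_line() is a hypothetical helper name, not from the original post.
# Returns a hashref on success, or undef if the line does not look right.
sub parse_host_line {
    my ($line) = @_;

    # split ' ' (a string, not a regex) skips leading whitespace and
    # splits on runs of whitespace.
    my ($ip, @names) = split ' ', $line;
    return unless defined $ip && @names == 2;

    # Check the address on its own: four numeric octets, each 0-255.
    # The -1 limit keeps trailing empty fields so "1.2.3.4." is rejected.
    my @octets = split /[.]/, $ip, -1;
    return unless @octets == 4;
    for my $octet (@octets) {
        return unless $octet =~ /\A\d{1,3}\z/ && $octet <= 255;
    }

    # Split each name into the short host and the rest (the domain without
    # its leading dot, or undef for an unqualified name).
    my @hosts;
    for my $name (@names) {
        my ($short, $domain) = split /[.]/, $name, 2;
        push @hosts, { fqdn => $name, short => $short, domain => $domain };
    }

    return { ip => $ip, octets => \@octets, hosts => \@hosts };
}

my $rec = parse_host_line(
    '10.80.15.14     srwd15abx001.srwd15.com   srwd15hst001.srwd15.com  ');
die "could not parse line\n" unless $rec;
printf "%s -> %s (%s)\n", $rec->{ip}, $_->{short}, $_->{domain} // '-'
    for @{ $rec->{hosts} };
```

It is longer than the regex, but each step can be checked and debugged
on its own, and the octet range check is something the regex was not
doing at all.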

