[Melbourne-pm] basic regex - char array vs "pipe brackets"

Toby Corkindale toby.corkindale at strategicdata.com.au
Thu Mar 5 15:30:33 PST 2009


Myf White wrote:
> On Thu, Mar 5, 2009 at 4:25 PM, Toby Corkindale
> <toby.corkindale at strategicdata.com.au> wrote:
> 
>     wigs at stirfried.org wrote:
> 
>         On Thu, Mar 05, 2009 at 03:10:47PM +1100, Christopher Short wrote:
> 
>             I saw this regex in someone else's code (to check validity
>             of an id field)
>             /^(\w|\@|\.|\-)+$/
> 
>             and decided that it wouldn't work properly, that what they'd
>             meant was
>             the character array
>             /^[\w\@\.\-]+$/
> 
> 
>             Luckily I chucked it into Regex Coach and found both of them
>             worked
>             just as well.
>             Thing is, when I look at
>             /^(\w|\@|\.|\-)+$/
> 
> 
>         The '+' here means one or more repetitions of the previous regex,
>         not of whatever it happens to match.
> 
>             Turns out the long-term perl developer who wrote that code
>             always uses
>             "pipe brackets" instead of character arrays.
>             Do they really function identically?
> 
> 
>         The biggest difference between character arrays and the parens
>         grouping
>         is that the character array allows for a compact specification
>         of ranges,
>         eg:
> 
>          [a-z] is the same as
>         (?:a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
> 
>         As such, the hyphen has a special meaning within the character set;
>         this is often a source of surprise to Perl developers when the
>         regex doesn't do what they expect.
> 
> 
>     Although it's useful to remember that if the first character in a
>     character array is a hyphen, then it is matched literally, ie:
> 
>     'foo-bar' =~ /[-a-z]/ ; # is true
> 
> 
>     tjc.
> 
> 
> Actually, that regex is only matching the first character, ie the 'f',
> so would be true without the initial hyphen.

Argh, yes! So much for my quick example.
I think I meant to write /^[-a-z]+$/ but your example with {7} is clearer.

Thanks for spotting it.
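
For the archives, here is a minimal sketch of the difference between the
unanchored and the anchored form. The test strings are made up purely for
illustration; they are not from the code being discussed.

```perl
use strict;
use warnings;

# Unanchored: true as soon as any single character is in the class.
my $unanchored = qr/[-a-z]/;
# Anchored with + : true only if every character is in the class.
my $anchored   = qr/^[-a-z]+$/;

# Hypothetical sample ids, chosen only to show the two behaviours.
for my $id ('foo-bar', 'foo_bar', 'FOO-bar') {
    printf "%-8s unanchored=%s anchored=%s\n",
        $id,
        ($id =~ $unanchored ? 'yes' : 'no'),
        ($id =~ $anchored   ? 'yes' : 'no');
}
```

The same caveat applies to the {7} form: without ^ and $ it only asks for
seven consecutive class characters somewhere in the string. Also note that
$ will match just before a trailing newline, so \z is the stricter choice
if that matters for an id field.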

> I think you mean that every 
> character in the string is in the character class, so
> 'foo-bar' =~ /[-a-z]{7}/ ; # is true
> 
> You can also put the hyphen last in the character class, ie
> 'foo-bar' =~ /[a-z-]{7}/ ; # is also true
> 
> However, I think it's better to always put it first, because otherwise
> it's very easy to miss.
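
A rough sketch of why the position matters, using the characters from the
original id regex. The third pattern is a hypothetical reordering, not
anything from the code under discussion: with the hyphen between '.' and
'@' it becomes a range covering every character from '.' to '@' in ASCII
order, which no longer includes '-' itself.

```perl
use strict;
use warnings;

my $first  = qr/^[-\w.@]+$/;  # hyphen first: literal hyphen
my $last   = qr/^[\w.@-]+$/;  # hyphen last: literal hyphen
my $middle = qr/^[\w.-@]+$/;  # hyphen between '.' and '@': a range

# Hypothetical sample ids, chosen only to show the difference.
for my $id ('foo-bar', 'foo/bar') {
    printf "%-8s first=%d last=%d middle=%d\n",
        $id,
        ($id =~ $first  ? 1 : 0),
        ($id =~ $last   ? 1 : 0),
        ($id =~ $middle ? 1 : 0);
}
```

The first two classes accept 'foo-bar' and reject 'foo/bar'; the reordered
one does the opposite, since '/' falls inside the '.' to '@' range and '-'
does not.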

