[Melbourne-pm] basic regex - char array vs "pipe brackets"

Wed Mar 4 21:02:34 PST 2009

On Thu, Mar 05, 2009 at 03:10:47PM +1100, Christopher Short wrote:
> I saw this regex in someone else's code (to check validity of an id field)
> /^(\w|\@|\.|\-)+$/
> 
> and decided that it wouldn't work properly, that what they'd meant was
> the character array
> /^[\w\@\.\-]+$/

> 
> Luckily I chucked it into Regex Coach and found both of them worked
> just as well.
> Thing is, when I look at
> /^(\w|\@|\.|\-)+$/

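The '+' here means one or more repetitions of the preceding group, so
each repetition is free to pick a different alternative. It does not
mean "repeat whatever the group happened to match the first time".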
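
A minimal sketch to see this for yourself. The sample ids are made up
for illustration; the regex is the one quoted above.

```perl
use strict;
use warnings;

# The "pipe brackets" version from the original code.
my $re = qr/^(\w|\@|\.|\-)+$/;

# The group is re-tried for every character, so input that mixes
# word characters, '@', '.' and '-' matches as a whole.
for my $id ('a.b-c@d', 'foo', 'a b', '') {
    printf "%-10s %s\n", "'$id'", ($id =~ $re ? 'match' : 'no match');
}

# After a successful match, $1 holds only what the final repetition
# of the group matched, not the whole string.
if ('a.b-c@d' =~ $re) {
    print "last capture: $1\n";
}
```

The capture is a side effect of using plain parens; (?: ... ) groups
without capturing.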

> Turns out the long-term perl developer who wrote that code always uses
> "pipe brackets" instead of character arrays.
> Do they really function identically?

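For single-character alternatives like these, both forms match the same
strings, though the plain parens also capture. The biggest difference
between character classes (what you're calling character arrays) and
alternation inside parens is that a character class allows for a compact
specification of ranges, eg: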

  [a-z] is the same as (?:a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
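
If you want to check that equivalence rather than take it on trust,
something like the following should do. It is only a sketch: it builds
the alternation from the list 'a' .. 'z' and compares the two patterns
over single ASCII characters only (the variable names are mine).

```perl
use strict;
use warnings;

# Character class with a range.
my $class = qr/^[a-z]$/;

# The same set spelled out as a non-capturing alternation.
my $alt_body = join '|', 'a' .. 'z';
my $alt      = qr/^(?:$alt_body)$/;

# Collect every single ASCII character on which the two disagree.
my @disagree = grep { ($_ =~ $class) xor ($_ =~ $alt) }
               map  { chr($_) } 0 .. 127;

print @disagree ? "patterns differ\n" : "identical on ASCII\n";
```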

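As such, the hyphen has special meaning within a character class: placed
between two characters it denotes a range. This is often a source of
surprise to Perl developers when the regex doesn't do what they expect.
In the class you quoted the hyphen is escaped, so it is a literal '-'.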
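
Here is a sketch of the kind of surprise I mean. The second pattern is
a hypothetical variation on your id check, with the hyphen left
unescaped between '.' and '@'; the sample ids are made up.

```perl
use strict;
use warnings;

# Escaped hyphen: a literal '-', as in the class quoted above.
my $literal = qr/^[\w\@.\-]+$/;

# Unescaped hyphen between '.' and '@': the range 0x2E to 0x40,
# which includes '/' and ':' but not '-' (0x2D).
my $range = qr/^[\w.-\@]+$/;

for my $id ('a-b', 'a/b', 'a:b') {
    printf "%-6s literal: %-3s range: %s\n",
        $id,
        ($id =~ $literal ? 'yes' : 'no'),
        ($id =~ $range   ? 'yes' : 'no');
}
```

Putting the hyphen first or last in the class, or escaping it, avoids
the range interpretation.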

Regards,

-- 
Aaron