[Melbourne-pm] basic regex - char array vs "pipe brackets"

Wed Mar 4 21:25:40 PST 2009

wigs at stirfried.org wrote:
> On Thu, Mar 05, 2009 at 03:10:47PM +1100, Christopher Short wrote:
>> I saw this regex in someone else's code (to check validity of an id field)
>> /^(\w|\@|\.|\-)+$/
>>
>> and decided that it wouldn't work properly, that what they'd meant was
>> the character array
>> /^[\w\@\.\-]+$/
> 
>> Luckily I chucked it into Regex Coach and found both of them worked
>> just as well.
>> Thing is, when I look at
>> /^(\w|\@|\.|\-)+$/
> 
> The '+' here means one or more repetitions of the previous regex, not
> of whatever it happened to match.
> 
>> Turns out the long-term perl developer who wrote that code always uses
>> "pipe brackets" instead of character arrays.
>> Do they really function identically?
> 
> The biggest difference between character arrays and the parens grouping
> is that the character array allows for a compact specification of ranges,
> eg:
> 
>   [a-z] is the same as (?:a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
> 
> As such, the hyphen has special meaning within the character set.
> This is often a source of surprise to Perl developers when the
> regex doesn't do what they expect.

Although it's useful to remember that if the first (or last) character
in a character array is a hyphen, then it is matched literally, i.e.:

'foo-bar' =~ /^[-a-z]+$/ ; # is true: every character, including the hyphen, is in the set
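
For anyone who wants to check this themselves, here is a rough sketch that runs the two patterns from the original question side by side and then tries a hyphen in different positions inside a character array. The sample ids are made up for illustration and are not from the code being discussed.

```perl
use strict;
use warnings;

# The two patterns from the original question.
my $alternation = qr/^(\w|\@|\.|\-)+$/;
my $char_class  = qr/^[\w\@\.\-]+$/;

# Hypothetical sample ids, invented for this sketch.
my @ids = ('user@example.org', 'foo-bar', 'a.b_c', 'bad id', 'semi;colon', '');

# Report whether each pattern accepts each id.
for my $id (@ids) {
    my $alt = $id =~ $alternation ? 1 : 0;
    my $cls = $id =~ $char_class  ? 1 : 0;
    printf "%-20s alternation=%d class=%d\n", "'$id'", $alt, $cls;
}

# Hyphen placement: literal when first, last, or escaped;
# between two characters it forms a range instead.
print "first:   ", ('-' =~ /^[-a-z]$/ ? "match" : "no match"), "\n";
print "last:    ", ('-' =~ /^[a-z-]$/ ? "match" : "no match"), "\n";
print "escaped: ", ('-' =~ /^[a\-z]$/ ? "match" : "no match"), "\n";
print "range:   ", ('-' =~ /^[a-z]$/  ? "match" : "no match"), "\n";
```

The two patterns should accept and reject the same strings. One difference worth knowing about is that the parenthesised form also captures, so after a successful match $1 holds the last character matched, whereas the character array form captures nothing.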

tjc.