[Melbourne-pm] basic regex - char array vs "pipe brackets"
Toby Corkindale
toby.corkindale at strategicdata.com.au
Thu Mar 5 15:30:33 PST 2009
Myf White wrote:
> On Thu, Mar 5, 2009 at 4:25 PM, Toby Corkindale
> <toby.corkindale at strategicdata.com.au> wrote:
>
> wigs at stirfried.org wrote:
>
> On Thu, Mar 05, 2009 at 03:10:47PM +1100, Christopher Short wrote:
>
> I saw this regex in someone else's code (to check validity
> of an id field)
> /^(\w|\@|\.|\-)+$/
>
> and decided that it wouldn't work properly, that what they'd
> meant was the character array
> /^[\w\@\.\-]+$/
>
>
> Luckily I chucked it into Regex Coach and found both of them
> worked just as well.
> Thing is, when I look at
> /^(\w|\@|\.|\-)+$/
>
>
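> The '+' here means one or more repetitions of the preceding group, each
> of which can match any of the alternatives; it does not mean repeating
> whatever the group happened to match the first time.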
>
> Turns out the long-term perl developer who wrote that code
> always uses "pipe brackets" instead of character arrays.
> Do they really function identically?
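One way to answer that question is to run both patterns over the same inputs. The Perl sketch below does that; the sample IDs are made up for illustration. It also shows the one visible difference between the two: the parentheses capture, so after a successful match $1 holds only the last character the repeated group matched. Writing the group as (?:\w|\@|\.|\-) would avoid the capture.

```perl
use strict;
use warnings;

# The two patterns from the thread.
my $alternation = qr/^(\w|\@|\.|\-)+$/;    # "pipe brackets"
my $char_class  = qr/^[\w\@\.\-]+$/;       # character class

# Hypothetical sample IDs, chosen only to exercise both patterns.
my @ids = ('user.name@example', 'abc-123', 'has space', 'semi;colon', '');

for my $id (@ids) {
    my $alt = $id =~ $alternation ? 1 : 0;
    my $cls = $id =~ $char_class  ? 1 : 0;
    printf "%-22s alternation=%d class=%d\n", "[$id]", $alt, $cls;
}

# The visible difference: the parentheses capture. After a successful
# match, $1 holds only the LAST character the repeated group matched.
if ('abc-123' =~ $alternation) {
    print "\$1 after matching 'abc-123': $1\n";    # prints 3
}
```

On these inputs the two patterns accept and reject exactly the same strings.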
>
>
> The biggest difference between character arrays and parenthesised
> grouping is that the character array allows a compact specification
> of ranges, e.g.:
>
> [a-z] is the same as
> (?:a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
>
> As such, the hyphen has a special meaning within the character set,
> and this is often a source of surprise to some perl developers when
> the regex doesn't do what they expect.
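Both claims in the paragraphs above can be checked directly. This sketch compares [a-z] with the spelled-out alternation on every 7-bit ASCII character, then shows the kind of surprise an unescaped hyphen can cause. The class [\w.-\@] is a hypothetical variant of the ID check from the start of the thread: the '.-\@' in the middle is read as a range from '.' to '@', not as three literal characters.

```perl
use strict;
use warnings;

# Part 1: [a-z] versus the spelled-out (?:a|b|c|...|z) from the thread.
my $alt_src = join '|', 'a' .. 'z';     # "a|b|c|...|z"
my $alt     = qr/^(?:$alt_src)$/;
my $range   = qr/^[a-z]$/;

my $disagreements = 0;
for my $code (0 .. 127) {                # every 7-bit ASCII character
    my $ch       = chr $code;
    my $in_alt   = $ch =~ $alt   ? 1 : 0;
    my $in_range = $ch =~ $range ? 1 : 0;
    $disagreements++ if $in_alt != $in_range;
}
print "ASCII characters where they disagree: $disagreements\n";    # prints 0

# Part 2: an unescaped hyphen between two characters makes a range.
# [\w.-\@] is NOT "word characters, '.', '-' or '@'": '.-\@' is the range
# from '.' (0x2E) to '@' (0x40), i.e. . / 0-9 : ; < = > ? @ (but not '-').
my $accidental = qr/^[\w.-\@]+$/;
my $intended   = qr/^[\w.\-\@]+$/;

for my $id ('abc-123', 'semi;colon') {
    printf "%-12s accidental=%d intended=%d\n", $id,
        ($id =~ $accidental ? 1 : 0),
        ($id =~ $intended   ? 1 : 0);
}
```

With the accidental range, 'abc-123' is rejected (the hyphen itself is outside the range) while 'semi;colon' is accepted; the escaped version does the opposite.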
>
>
> It's useful to remember, though, that if the first character in a
> character array is a hyphen, it is matched literally, e.g.:
>
> 'foo-bar' =~ /[-a-z]/ ; # is true
>
>
> tjc.
>
>
> Actually, that regex is only matching the first character, i.e. the 'f',
> so it would be true even without the initial hyphen.
Argh, yes! So much for my quick example.
I think I meant to write /^[-a-z]+$/, but your example with {7} is clearer.
Thanks for spotting it.
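Myf's point about the unanchored pattern is easy to confirm in Perl, along with the difference that the ^ and $ anchors make. The sketch uses the thread's 'foo-bar' plus a made-up 'foo-bar!' to show that {7} without anchors only needs some run of seven matching characters somewhere in the string.

```perl
use strict;
use warnings;

my $str = 'foo-bar';

# Unanchored: each pattern needs only ONE matching character somewhere,
# so both succeed on the leading 'f', with or without '-' in the class.
my $loose_with_hyphen    = $str =~ /[-a-z]/ ? 1 : 0;
my $loose_without_hyphen = $str =~ /[a-z]/  ? 1 : 0;

# Anchored with ^ and $: every character must be in the class.
my $strict_with_hyphen    = $str =~ /^[-a-z]+$/ ? 1 : 0;
my $strict_without_hyphen = $str =~ /^[a-z]+$/  ? 1 : 0;

print "unanchored: with hyphen=$loose_with_hyphen, without=$loose_without_hyphen\n";
print "anchored:   with hyphen=$strict_with_hyphen, without=$strict_without_hyphen\n";

# {7} without anchors only asks for SOME run of seven class characters,
# so anything around that run is ignored. 'foo-bar!' is a made-up example.
my $noisy        = 'foo-bar!';
my $run_of_seven = $noisy =~ /[-a-z]{7}/ ? 1 : 0;    # finds 'foo-bar' inside
my $whole_string = $noisy =~ /^[-a-z]+$/ ? 1 : 0;    # the '!' makes this fail
print "[$noisy] run of seven=$run_of_seven, whole string=$whole_string\n";
```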
> I think you mean that every
> character in the string is in the character class, so
> 'foo-bar' =~ /[-a-z]{7}/ ; # is true
>
> You can also put the hyphen last in the character class, e.g.:
> 'foo-bar' =~ /[a-z-]{7}/ ; # is also true
>
> However, I think it's better to always put it first, because otherwise
> it's very easy to miss.
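A quick check that the placements discussed give the same result: hyphen first, hyphen last, or escaped with a backslash (as the original /^[\w\@\.\-]+$/ does). The sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Three ways to get a literal hyphen into a character class.
my %patterns = (
    leading  => qr/^[-a-z]+$/,
    trailing => qr/^[a-z-]+$/,
    escaped  => qr/^[a-z\-]+$/,
);
my @names = qw(leading trailing escaped);

# Hypothetical sample strings.
my @samples = ('foo-bar', 'foobar', '-', 'foo_bar', 'Foo-bar');

for my $s (@samples) {
    my @results;
    for my $name (@names) {
        push @results, "$name=" . ($s =~ $patterns{$name} ? 1 : 0);
    }
    printf "%-8s %s\n", $s, join(' ', @results);
}
```

All three agree on every sample; the choice between them is purely about readability.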