On Thu, Mar 5, 2009 at 4:25 PM, Toby Corkindale <span dir="ltr">&lt;<a href="mailto:toby.corkindale@strategicdata.com.au">toby.corkindale@strategicdata.com.au</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im"><a href="mailto:wigs@stirfried.org" target="_blank">wigs@stirfried.org</a> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Thu, Mar 05, 2009 at 03:10:47PM +1100, Christopher Short wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I saw this regex in someone else's code (to check validity of an id field)<br>
/^(\w|\@|\.|\-)+$/<br>
<br>
and decided that it wouldn't work properly, that what they'd meant was<br>
the character array<br>
/^[\w\@\.\-]+$/<br>
</blockquote>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Luckily I chucked it into Regex Coach and found both of them worked<br>
just as well.<br>
Thing is, when I look at<br>
/^(\w|\@|\.|\-)+$/<br>
</blockquote>
<br>
The '+' here means one or more repetitions of the preceding group, not<br>
of whatever text it happened to match, so each repetition may pick a different alternative.<br>
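<br>
A minimal sketch of that point, running both patterns from the thread over a few made-up ids (the sample strings and the sub names group_ok and class_ok are only illustrative, not from the original code):<br>
<br>
```perl
use strict;
use warnings;

# Alternation inside a capturing group, repeated by '+'.
sub group_ok { return $_[0] =~ /^(\w|\@|\.|\-)+$/ ? 1 : 0 }

# The same set of characters written as a character class.
sub class_ok { return $_[0] =~ /^[\w\@\.\-]+$/ ? 1 : 0 }

# Hypothetical sample ids, chosen only for illustration.
my @ids = ('wigs@stirfried.org', 'foo-bar', 'a.b', 'has space', 'semi;colon', '');

for my $id (@ids) {
    printf "%-22s group=%d class=%d\n", "'$id'", group_ok($id), class_ok($id);
}
```
<br>
On these inputs the two forms should agree on every line. One difference that does exist is that the parenthesised form captures, so after a successful match $1 holds the last character matched; the character class captures nothing.<br>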
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Turns out the long-term perl developer who wrote that code always uses<br>
"pipe brackets" instead of character arrays.<br>
Do they really function identically?<br>
</blockquote>
<br>
The biggest difference between character arrays and the parens grouping<br>
is that the character array allows for a compact specification of ranges,<br>
eg:<br>
<br>
[a-z] is the same as (?:a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)<br>
<br>
As such, the hyphen has special meaning within the character set.<br>
This is often a source of surprise to Perl developers when the<br>
regex doesn't do what they expect.<br>
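<br>
As a rough illustration of that surprise, here is a sketch where a hyphen placed between '.' and '@' is read as a range rather than a literal (the patterns, sample strings and sub names are assumptions made up for this example):<br>
<br>
```perl
use strict;
use warnings;

# Intended: lowercase letters, dot, hyphen, at-sign.
# Actual: '.-\@' is a range from '.' (0x2E) to '@' (0x40), which covers
# digits and ';' but not '-' (0x2D) itself.
sub range_ok { return $_[0] =~ /^[a-z.-\@]+$/ ? 1 : 0 }

# Hyphen moved to the end of the class, where it is taken literally.
sub literal_ok { return $_[0] =~ /^[a-z.\@-]+$/ ? 1 : 0 }

for my $s ('foo-bar', 'foo;bar', 'foo.bar') {
    printf "%-10s range=%d literal=%d\n", "'$s'", range_ok($s), literal_ok($s);
}
# 'foo-bar'  range=0 literal=1
# 'foo;bar'  range=1 literal=0
# 'foo.bar'  range=1 literal=1
```
<br>
The accidental range both rejects the hyphen that was wanted and accepts a semicolon that was not.<br>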
</blockquote>
<br></div>
Although it's useful to remember that if the first character in a character array is a hyphen, then it is matched literally, ie:<br>
<br>
'foo-bar' =~ /[-a-z]/ ; # is true<br>
<br>
<br>
tjc.</blockquote><div><br>Actually, that regex only needs to match one character, i.e. the 'f', so it would be true even without the initial hyphen. I think you mean that every character in the string is in the character class, so<br>
'foo-bar' =~ /[-a-z]{7}/ ; # is true<br><br>You can also put the hyphen last in the character class, i.e.<br>'foo-bar' =~ /[a-z-]{7}/ ; # is also true<br><br>However, I think it's better to always put it first, because otherwise it's very easy to miss.<br>
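<br>
A small sketch comparing the hyphen placements, assuming the goal is to check the whole string; the ^ and $ anchors, the sample strings and the sub names are additions for illustration, not something from the messages above:<br>
<br>
```perl
use strict;
use warnings;

# Three ways to get a literal hyphen into the class, anchored at both
# ends so that every character of the string has to be in the class.
sub first_ok   { return $_[0] =~ /^[-a-z]+$/  ? 1 : 0 }   # hyphen first
sub last_ok    { return $_[0] =~ /^[a-z-]+$/  ? 1 : 0 }   # hyphen last
sub escaped_ok { return $_[0] =~ /^[a-z\-]+$/ ? 1 : 0 }   # hyphen escaped

# The unanchored {7} form only needs seven class characters in a row
# somewhere in the string.
sub unanchored7 { return $_[0] =~ /[-a-z]{7}/ ? 1 : 0 }

for my $s ('foo-bar', 'foo_bar', 'xx foo-bar') {
    printf "%-13s first=%d last=%d escaped=%d unanchored7=%d\n",
        "'$s'", first_ok($s), last_ok($s), escaped_ok($s), unanchored7($s);
}
```
<br>
The three anchored forms should behave the same. Note that 'xx foo-bar' passes the unanchored {7} test even though it contains a space, so the anchors matter if the intent is to validate the entire string.<br>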
<br><br></div></div>