On Thu, Mar 5, 2009 at 4:25 PM, Toby Corkindale <span dir="ltr">&lt;<a href="mailto:toby.corkindale@strategicdata.com.au">toby.corkindale@strategicdata.com.au</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="im"><a href="mailto:wigs@stirfried.org" target="_blank">wigs@stirfried.org</a> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

On Thu, Mar 05, 2009 at 03:10:47PM +1100, Christopher Short wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I saw this regex in someone else&#39;s code (to check validity of an id field)<br>

/^(\w|\@|\.|\-)+$/<br>

<br>

and decided that it wouldn&#39;t work properly, that what they&#39;d meant was<br>

the character array<br>

/^[\w\@\.\-]+$/<br>

</blockquote>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Luckily I chucked it into Regex Coach and found both of them worked<br>

just as well.<br>

Thing is, when I look at<br>

/^(\w|\@|\.|\-)+$/<br>

</blockquote>

<br>

The &#39;+&#39; here means one or more repetitions of the previous regex, not<br>

whatever it happens to match.<br>

<br>
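A minimal sketch of the point above, using the two patterns from the thread. The sample ids are hypothetical and chosen only for illustration. It prints whether each pattern accepts each id, then shows that the capture group holds only the final repetition, since the &#39;+&#39; repeats the whole group:<br>

```perl
use strict;
use warnings;

# The two patterns quoted in the thread.
my $alternation = qr/^(\w|\@|\.|\-)+$/;
my $char_class  = qr/^[\w\@\.\-]+$/;

# Hypothetical sample ids, for illustration only.
my @ids = ('toby.c@example.com', 'foo-bar', 'has space', '', 'a!b');

for my $id (@ids) {
    my $alt = $id =~ $alternation ? 1 : 0;
    my $cls = $id =~ $char_class  ? 1 : 0;
    printf "%-22s alternation=%d class=%d\n", "'$id'", $alt, $cls;
}

# The '+' repeats the group itself, so after a successful match
# $1 holds only what the last repetition matched.
if ('foo-bar' =~ $alternation) {
    print "last captured character: $1\n";
}
```

The practical difference is that the parenthesised form captures (and has to track) each repetition, which the character class does not.<br>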

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Turns out the long-term perl developer who wrote that code always uses<br>

&quot;pipe brackets&quot; instead of character arrays.<br>

Do they really function identically?<br>

</blockquote>

<br>

The biggest difference between character arrays and the parens grouping<br>

is that the character array allows for a compact specification of ranges,<br>

eg:<br>

<br>

  [a-z] is the same as (?:a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)<br>

<br>

As such, the hyphen has special meaning within the character set;<br>

this is often a source of surprise to some perl developers when the<br>

regex doesn&#39;t do what they expect.<br>
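A small sketch of the kind of surprise meant here, assuming plain ASCII input. The variable names are hypothetical. Between two characters the hyphen builds a range, so a class written to match a few punctuation marks can quietly match others that fall between them:<br>

```perl
use strict;
use warnings;

# Between two characters, '-' builds a range: a, b, or c.
my $range   = qr/^[a-c]+$/;
# Escaped, the hyphen is a literal: a, '-', or c.
my $literal = qr/^[a\-c]+$/;

for my $s ('abc', 'a-c') {
    printf "%-5s range=%d literal=%d\n",
        $s,
        ($s =~ $range   ? 1 : 0),
        ($s =~ $literal ? 1 : 0);
}

# Probably intended as "plus, minus, or slash", but '+-/' is a range
# covering every ASCII character from '+' to '/', including ',' and '.'.
my $surprise = qr/^[+-\/]$/;
print "comma matches [+-/]: ", (',' =~ $surprise ? "yes" : "no"), "\n";
```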

</blockquote>

<br></div>

Although it&#39;s useful to remember that if the first character in a character array is a hyphen, then it is matched literally, ie:<br>

<br>

&#39;foo-bar&#39; =~ /[-a-z]/ ; # is true<br>

<br>

<br>

tjc.</blockquote><div><br>Actually, that regex only needs to match a single character, ie the &#39;f&#39;, so it would be true even without the initial hyphen. I think you mean that every character in the string is in the character class, so<br>

&#39;foo-bar&#39; =~ /[-a-z]{7}/ ; # is true<br><br>You can also put the hyphen last in the character class, ie<br>&#39;foo-bar&#39; =~ /[a-z-]{7}/ ; # is also true<br><br>However, I think it&#39;s better to always put it first, because otherwise it&#39;s very easy to miss.<br>
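<br>For strings of unknown length, a sketch that anchors the pattern instead of counting characters may be more practical. This is an assumed variation on the examples above, not something from the earlier posts:<br>

```perl
use strict;
use warnings;

my $str = 'foo-bar';

# Unanchored, one character: succeeds as soon as any single character matches.
print "one char, no hyphen in class:    ", ($str =~ /[a-z]/     ? "true" : "false"), "\n";

# Anchored with ^ and $: every character must belong to the class.
print "whole string, hyphen first:      ", ($str =~ /^[-a-z]+$/ ? "true" : "false"), "\n";
print "whole string, hyphen last:       ", ($str =~ /^[a-z-]+$/ ? "true" : "false"), "\n";
print "whole string, no hyphen in class: ", ($str =~ /^[a-z]+$/  ? "true" : "false"), "\n";
```

Only the last test fails, because the &#39;-&#39; in the string is not in the class.<br>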

<br><br></div></div>