[tpm] Dumb regex question

Liam R E Quin liam at holoweb.net
Tue Aug 21 09:45:02 PDT 2007


On Tue, 2007-08-21 at 12:15 -0400, Madison Kelly wrote:
> For the life of me, I can't seem to get a simple regex working...
> 
> All I want is to be able to match a word-character string that may have 
> a hyphen in it.
> 
> So:
> 
> mizu-bu		# should match
> alteeve		# should match
> m!zu-bu		# should not match
> a|teeve		# should not match

Two things to note here:
(1) a hyphen is special in a character class, e.g. [a-z]
(2) you need to anchor the match, since a!b could be two matching
    words separated by a "!"
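Point (2) is easy to see in a quick sketch. It is written in Python rather than Perl, but Python's re module treats \w, alternation and + the same way for these patterns. The variable names are only for illustration, and Python spells the absolute end-of-string anchor \Z (Perl's \z).

```python
import re

text = "m!zu-bu"

# Unanchored: re.search succeeds as soon as *some* substring matches,
# so the lone "m" before the "!" counts as a match.
unanchored = re.search(r"(\w|-)+", text)
print(unanchored.group(0))  # m

# Anchored at both ends (\A = start, \Z = absolute end in Python), so the
# whole string must consist of word characters and hyphens.
anchored = re.search(r"\A(\w|-)+\Z", text)
print(anchored)  # None
```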

Perl defines \w for a word character, so we can match that or
a hyphen with (\w|-)
and then, anchored at both ends,
    ^(\w|-)+$
will do what you want I think.
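Here is the anchored pattern checked against the four strings from the original question, again as a Python sketch (the name WORD_OR_HYPHEN is hypothetical). re.fullmatch supplies both anchors:

```python
import re

# Perl's ^(\w|-)+$ translated to Python; fullmatch anchors at both ends.
WORD_OR_HYPHEN = re.compile(r"(\w|-)+")

samples = {
    "mizu-bu": True,
    "alteeve": True,
    "m!zu-bu": False,
    "a|teeve": False,
}

for s, expected in samples.items():
    ok = WORD_OR_HYPHEN.fullmatch(s) is not None
    print(f"{s:10} {'match' if ok else 'no match'}")
    assert ok == expected

# A bare $ (in Perl or Python) also matches just before a final "\n":
print(bool(re.search(r"^(\w|-)+$", "mizu-bu\n")))         # True
print(WORD_OR_HYPHEN.fullmatch("mizu-bu\n") is not None)  # False
```

The last two lines show a wrinkle shared with Perl: $ also matches just before a trailing newline, so a line read from input still passes ^(\w|-)+$ unless you chomp it first or use \z in Perl.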

You can also use a character class, as long as the hyphen
is at the start or end of the class (or escaped as \-):
    ^[\w-]+$
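The hyphen rule can be checked the same way. In this Python sketch, three spellings of "word character or hyphen" behave identically, while a hyphen between two characters still forms a range:

```python
import re

# A hyphen is literal when it cannot form a range: first or last in the
# class, or escaped. All three classes mean "word character or hyphen".
patterns = [r"[\w-]+", r"[-\w]+", r"[\w\-]+"]

for p in patterns:
    assert re.fullmatch(p, "mizu-bu")
    assert not re.fullmatch(p, "m!zu-bu")

# Between two characters the hyphen builds a range: [a-z] means every
# lowercase ASCII letter, not the three characters "a", "-" and "z".
print(bool(re.fullmatch(r"[a-z]+", "mizu")))  # True
print(bool(re.fullmatch(r"[a-z]+", "-")))     # False
```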

If Perl's definition of a word character (alphanumeric
plus _) isn't what you want, you can use
    ^[a-zA-Z0-9-]+$
for example, or you can use the bizarre POSIX syntax:
    ^[[:alnum:]-]+$
You'll want "use locale" for that to be sensible.
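The difference between \w and an explicit ASCII class shows up with underscores and accented letters. Python's re has no POSIX [[:alnum:]] classes, so this sketch compares only the first two options. Note that in Python 3, \w on a str is Unicode-aware unless you pass re.ASCII; the names ASCII_WORD and WORD_OR_HYPHEN are hypothetical.

```python
import re

ASCII_WORD = re.compile(r"[a-zA-Z0-9-]+")  # ASCII letters, digits, hyphen
WORD_OR_HYPHEN = re.compile(r"[\w-]+")      # \w also allows "_" and, in
                                            # Python 3, non-ASCII letters

results = {}
for s in ["mizu-bu", "mizu_bu", "caf\u00e9"]:  # the last one is "café"
    results[s] = (bool(ASCII_WORD.fullmatch(s)), bool(WORD_OR_HYPHEN.fullmatch(s)))
    print(s, results[s])

# re.ASCII restricts \w to [a-zA-Z0-9_]
print(bool(re.fullmatch(r"[\w-]+", "caf\u00e9", flags=re.ASCII)))  # False
```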
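Once your strings are decoded into Perl characters (with "use utf8"
for literals in your source, or an :encoding(UTF-8) layer on input
handles), you can also use the Unicode properties:
    ^[\p{Letter}\p{Mark}\p{Number}-]+$
and this will allow, for example, Hindi words too (their vowel signs
are combining marks, hence \p{Mark}).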
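Python's built-in re module does not support \p{...}, so this sketch emulates a Unicode-property check with unicodedata.category, whose first letter gives the general category (L letter, M mark, N number). The helper is_hyphenated_word is hypothetical. Devanagari vowel signs are combining marks (category M), so a check meant to accept Hindi words has to allow marks as well as letters and numbers:

```python
import unicodedata


def is_hyphenated_word(s: str, allow_marks: bool = True) -> bool:
    r"""Rough stand-in for Perl's ^[\p{Letter}\p{Mark}\p{Number}-]+$.

    Accepts a non-empty string whose characters are all hyphens or have a
    Unicode general category of L (letter), N (number) or, optionally,
    M (combining mark).
    """
    if not s:
        return False
    allowed = {"L", "N", "M"} if allow_marks else {"L", "N"}
    return all(ch == "-" or unicodedata.category(ch)[0] in allowed for ch in s)


hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"  # "हिन्दी"

print(is_hyphenated_word("mizu-bu"))                 # True
print(is_hyphenated_word("m!zu-bu"))                 # False
print(is_hyphenated_word(hindi))                     # True
print(is_hyphenated_word(hindi, allow_marks=False))  # False: vowel signs are marks
```

Unlike \w, this rejects underscores, since "_" is connector punctuation (category Pc) rather than a letter.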

Hope this helps.

Liam (now living in Prince Edward County)

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org



More information about the toronto-pm mailing list