[Melbourne-pm] \N in regular expressions and unicode
Jacinta Richardson
jarich at perltraining.com.au
Sun Jan 9 23:21:09 PST 2011
G'day folk,
In Perl 5.12.0 a new metacharacter was added to Perl's regular expressions.
\N matches anything that isn't a newline, and it was added so that when you use
the /s switch (so that . also matches newlines), you still have something other
than [^\n] to give you the previous . behaviour.
This follows the existing mnemonics:
\s - any whitespace
\S - any non-whitespace
\w - any word character
\W - any non-word character
and thus:
\n - a newline
\N - not a newline.
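To see the /s behaviour described above in action, here is a minimal sketch. The sample string and variable names are made up for illustration; it assumes Perl 5.12 or later.

```perl
use strict;
use warnings;
use v5.12;

my $text = "ab\ncd";   # hypothetical two-line input

# Without /s, . stops at the newline.
my ($dot_plain) = $text =~ /(.+)/;

# With /s, . also matches the newline and runs to the end of the string.
my ($dot_s) = $text =~ /(.+)/s;

# \N is unaffected by /s and still stops at the newline.
my ($not_nl) = $text =~ /(\N+)/s;

printf "%-10s %d chars\n", 'plain .',  length($dot_plain);
printf "%-10s %d chars\n", '. with /s', length($dot_s);
printf "%-10s %d chars\n", '\N with /s', length($not_nl);
```

The first and third matches should both capture "ab", while the second captures the whole string including the newline.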
However, \N{some Unicode name} *also* lets you specify a Unicode character by
name, so braces after \N can mean either a quantifier or a character name. This
leaves us with a problem.
The following two snippets are equivalent:
my ($five_letters) = /(\w{5})/;

my $five = 5;
my ($five_letters) = /(\w{$five})/;
Although this is contrived, I can imagine situations where you might not know in
advance how many characters you wished to match.
Likewise, the following two snippets are equivalent:
my ($five_any) = /(.{5})/s;

my $five = 5;
my ($five_any) = /(.{$five})/s;
as you would expect.
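As a quick check of the equivalence claimed above, the following sketch runs the literal and the interpolated quantifier against the same input. The sample string and the variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $text = "abcdefgh";   # hypothetical input
my $n    = 5;            # count that might only be known at run time

# Literal quantifier.
my ($literal) = $text =~ /(\w{5})/;

# Quantifier built from a variable; $n is interpolated before the
# pattern is compiled, so the regex engine sees \w{5}.
my ($interpolated) = $text =~ /(\w{$n})/;

print "both forms captured the same text\n" if $literal eq $interpolated;
```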
We can match five non-newlines in 5.12.2 with:
use v5.12.2;
my ($five_non_newlines) = /(\N{5})/;
We can match a Unicode character by name with:
use utf8;
use charnames ':full';
my ($symbol) = /(\N{AC CURRENT})/;
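For anyone wanting to try the named-character form on its own, here is a self-contained sketch. The sample text is hypothetical; it relies on AC CURRENT being code point U+23E6.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical sample text containing U+23E6 (AC CURRENT).
my $text = "mains \x{23E6} 240V";

# The name is resolved when the pattern is compiled.
my ($symbol) = $text =~ /(\N{AC CURRENT})/;

printf "matched U+%04X\n", ord($symbol) if defined $symbol;
```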
What would you expect for the following though?
use strict;
use warnings;
use utf8;
use charnames ':full';
use v5.12.2;
my $var1 = 5;
my $var2 = "AC CURRENT";
say $1 if /(\N{$var1})/;
say $1 if /(\N{$var2})/;
All the best,
Jacinta
PS: I know what we do get:
Unknown charname '$var1' at utf8.pl line 10
Deprecated character(s) in \N{...} starting at '$var1' at utf8.pl line 10
...
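One possible way around those errors, offered as a sketch rather than the definitive answer: keep the braces away from \N when the count is a variable, and look the character name up at run time when the name is a variable. The sample text and the variables other than $var1 and $var2 are made up; charnames::vianame is the lookup used here, and the grouping trick assumes \N is only treated specially when a brace follows it directly.

```perl
use strict;
use warnings;
use v5.12;
use charnames ':full';

my $var1 = 5;
my $var2 = "AC CURRENT";
my $text = "12345678 \x{23E6}";   # hypothetical sample input

# Variable count, option 1: a negated class takes an ordinary quantifier.
my ($by_class) = $text =~ /([^\n]{$var1})/;

# Variable count, option 2: group \N so the brace no longer follows it directly.
my ($by_group) = $text =~ /((?:\N){$var1})/;

# Variable name: resolve the name to a code point at run time,
# then interpolate the resulting character into the pattern.
my $code = charnames::vianame($var2);
die "Unknown character name '$var2'\n" unless defined $code;
my $char = chr $code;
my ($symbol) = $text =~ /(\Q$char\E)/;

printf "%s / %s / U+%04X\n", $by_class, $by_group, ord($symbol);
```

Neither form is as tidy as \N{$var}, but both avoid asking the parser to guess what the braces mean.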
--
("`-''-/").___..--''"`-._ | Jacinta Richardson |
`6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia |
(_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 |
_..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au |
(il),-'' (li),' ((!.-' | www.perltraining.com.au |