utf8 and regular expressions.
Brendan Quinn
brendan at clueful.com.au
Wed Jan 30 23:32:54 CST 2002
I just tried it on perl 5.7.2
(revision 5.0 version 7 subversion 2 patch 12378,
which is quite old; mine is dated some time in October)
and got
Testing on abc%abc123
Found - ab
Found - ab
which is the output you expected, right?
So it looks like a bug in 5.6, I suppose.
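For anyone wanting to try other versions, here is a minimal sketch of the same check. The helper names first_hex_escape and first_hex_escape_explicit are made up for illustration, and I have not tested the second form on 5.6, so treat it as a possible workaround rather than a confirmed one.

```perl
#!/usr/bin/perl -w
use strict;
use utf8;

# Hypothetical helper (not from the original post): returns the two hex
# digits following the first '%' in $str, or undef if there is no match.
sub first_hex_escape {
    my ($str) = @_;
    return $str =~ /%([\dA-Fa-f]{2})/ ? $1 : undef;
}

# Same match with the {2} quantifier spelled out as two explicit classes.
# Assumption: this might sidestep the 5.6 behaviour; untested there.
sub first_hex_escape_explicit {
    my ($str) = @_;
    return $str =~ /%([\dA-Fa-f][\dA-Fa-f])/ ? $1 : undef;
}

my $test = "abc%abc123";
for my $got (first_hex_escape($test), first_hex_escape_explicit($test)) {
    # A correct regex engine captures exactly two characters here.
    my $status = length($got) == 2 ? "ok" : "not ok";
    print "$status - $got\n";
}
```

If either line prints "not ok", that perl has the same problem with the capture.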
Brendan.
Scott Penrose wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Since Perl Monks is down (well, partially) I thought I might post this
> here (melbourne-pm at pm.org)
>
> Here is a cute bit of code that has caused us hours of problems...
>
> #!/usr/bin/perl -w
>
> $test = $ARGV[0] || "abc%abc123";
>
> print "Testing on $test\n";
>
> use utf8;
> if ($test =~ /%([\dA-Fa-f]{2})/) {
> print "Found - $1\n";
> }
>
> no utf8;
> if ($test =~ /%([\dA-Fa-f]{2})/) {
> print "Found - $1\n";
> }
>
> The output of the above is....
>
> Testing on abc%abc123
> Found - abc123
> Found - ab
>
> Using perl 5.6.0 or perl 5.6.1 (I tried both).
>
> The problem, if you have not spotted it, is that we have asked for {2}
> characters but get more if in utf8 mode.
>
> "{n} Match exactly n times" - man perlre
> We also tried {2,2}
> "{n,m} Match at least n but not more than m times" - man perlre
>
> Using (use) utf8 matches all things in the character class, no matter how
> long the string.
> This makes decoding a URL - HELL !
>
> I don't think I am doing anything wrong, but maybe someone can point out
> a problem with the above?
>
> Otherwise, it is a UTF8 Bug in perl re engine.
> Does anyone have perl 5.7 installed they could test it on?
>
> Scott
> - ---
> Scott Penrose
> Open source and Linux Developer
> http://linux.dd.com.au/
> scottp at dd.com.au
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.6 (Darwin)
> Comment: For info see http://www.gnupg.org
>
> iD8DBQE8WNJODCFCcmAm26YRAm5qAJ44PXprwN6jID3GtKixlENp//VqqQCeILnM
> su4MTqPzhL56scRdNBHCBuA=
> =DsBt
> -----END PGP SIGNATURE-----
>
>
>
--
Brendan Quinn brendan at clueful.com.au
Clueful Consulting Pty Ltd Phone +61 4 0076 0077
GPO Box 2747EE within Australia: 0400 760 077
Melbourne, Australia http://www.clueful.com.au/brendan/