utf8 and regular expressions.

Brendan Quinn brendan at clueful.com.au
Wed Jan 30 23:32:54 CST 2002


Just tried it on perl 5.7.2:
revision 5.0 version 7 subversion 2 patch 12378
(which is quite an old snapshot; my build dates from some time in October)

and got

Testing on abc%abc123
Found - ab
Found - ab

which is the output you expected, right?
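
For anyone wanting to repeat the check on their own perl, here is a rough sketch that runs the same pattern under both pragmas and flags any capture that is not exactly two characters long. The sub names with_utf8 and without_utf8 are made up for illustration; the pattern and the default input are the ones from Scott's script.

```perl
#!/usr/bin/perl -w
use strict;

# Default input is the string from the original report.
my $test = @ARGV ? $ARGV[0] : "abc%abc123";

# Hypothetical helper: the pattern compiled with "use utf8" in effect.
# The pragma is lexically scoped, so it only covers this sub body.
sub with_utf8 {
    use utf8;
    my ($s) = @_;
    return $s =~ /%([\dA-Fa-f]{2})/ ? $1 : undef;
}

# Hypothetical helper: the same pattern with the pragma switched off.
sub without_utf8 {
    no utf8;
    my ($s) = @_;
    return $s =~ /%([\dA-Fa-f]{2})/ ? $1 : undef;
}

print "perl $]\n";
print "Testing on $test\n";

for my $case (["use utf8", \&with_utf8], ["no utf8", \&without_utf8]) {
    my ($label, $code) = @$case;
    my $got = $code->($test);
    if (!defined $got) {
        print "${label}: no match\n";
    } elsif (length($got) == 2) {
        print "${label}: Found - $got (ok)\n";
    } else {
        # {2} should never capture more than two characters.
        print "${label}: Found - $got (longer than {2} allows)\n";
    }
}
```

If both lines end in "(ok)", that perl handles {2} as perlre documents it.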

So it is a bug in 5.6, I suppose.
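
On the URL-decoding complaint in the quoted message: a minimal sketch of percent-decoding with the same {2} quantifier, assuming a perl where {2} behaves as documented (it has not been checked here against 5.6 with use utf8 in effect). The name url_decode is hypothetical, and [0-9] is spelled out in place of \d so that only ASCII digits count as hex.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: turn each %XX escape into the byte it names.
# Anything that is not "%" followed by exactly two hex digits is left alone.
sub url_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

print url_decode("a%20b%2Fc"), "\n";   # prints "a b/c"
```

This only handles %XX escapes; treating "+" as a space, as form submissions do, is left out on purpose.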

Brendan.

Scott Penrose wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Since Perl Monks is down (well, partially) I thought I might post this 
> here (melbourne-pm at pm.org)
> 
> Here is a cute bit of code that has caused us hours of problems...
> 
> #!/usr/bin/perl -w
> 
> $test = $ARGV[0] || "abc%abc123";
> 
> print "Testing on $test\n";
> 
> use utf8;
> if ($test =~ /%([\dA-Fa-f]{2})/) {
>         print "Found - $1\n";
> }
> 
> no utf8;
> if ($test =~ /%([\dA-Fa-f]{2})/) {
>         print "Found - $1\n";
> }
> 
> The output of the above is....
> 
>     Testing on abc%abc123
>     Found - abc123
>     Found - ab
> 
> Using perl 5.6.0 or perl 5.6.1 (I tried both).
> 
> The problem, if you have not spotted it, is that we asked for exactly {2} 
> characters but get more when in utf8 mode.
> 
>     "{n}    Match exactly n times" - man perlre
> We also tried {2,2}
>     "{n,m}  Match at least n but not more than m times" - man perlre
> 
> With "use utf8" in effect, the match takes every character in the 
> character class, no matter how long the string.
> This makes decoding a URL hell!
> 
> I don't think I am doing anything wrong, but maybe someone can point out 
> a problem with the above?
> 
> Otherwise, it is a UTF-8 bug in perl's regex engine.
> Does anyone have perl 5.7 installed they could test it on?
> 
> Scott
> - ---
> Scott Penrose
> Open source and Linux Developer
> http://linux.dd.com.au/
> scottp at dd.com.au
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.0.6 (Darwin)
> Comment: For info see http://www.gnupg.org
> 
> iD8DBQE8WNJODCFCcmAm26YRAm5qAJ44PXprwN6jID3GtKixlENp//VqqQCeILnM
> su4MTqPzhL56scRdNBHCBuA=
> =DsBt
> -----END PGP SIGNATURE-----
> 
> 
> 



-- 
Brendan Quinn                                   brendan at clueful.com.au
Clueful Consulting Pty Ltd                       Phone +61 4 0076 0077
GPO Box 2747EE                          within Australia: 0400 760 077
Melbourne, Australia                http://www.clueful.com.au/brendan/



