[tpm] Fwd: Perl and Unicode

Fri May 1 13:58:43 PDT 2009

On May 1, 2009, at 1:29 PM, Antonio Sun wrote:

> Hi Abram,
>
> When you talked about the difficulties that Perl has calculating string
> lengths, I didn't quite understand your explanation because I didn't catch
> the term that you used. Could you explain it in writing please?
>
> AFAIK, how Perl interprets string length depends on the encoding, e.g.,
>
> use encoding 'utf8';
> print length("骆驼"); # 2, because there are 2 Chinese characters
> # However,
> no encoding;
> print length("骆驼"); # 6, because the 2 Chinese characters take up 6 bytes
>
> I.e., Perl has the capability to return whatever string length you want.
> Am I missing anything?
>
> BTW,
>
> Does anyone know how to split a Unicode string into individual
> characters? E.g., from "骆驼" to '骆' and '驼'?

There has to be a better way than this:

   DB<7> @chars = map { chr } unpack('U*', "骆驼")

   DB<8> x @chars
0  '\x{9A86}'
1  '\x{9A7C}'
   DB<9> print "@chars"
骆 驼
   DB<10>
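
One simpler route might be split with an empty pattern, which breaks a
character string between every character. What follows is only a minimal
sketch, not a tested recipe for every setup: it assumes the script itself is
saved as UTF-8 (hence "use utf8"), and the variable names $string and @chars
are just for illustration. It also prints the length in characters next to
the length in bytes, which is the distinction asked about above.

```perl
use strict;
use warnings;
use utf8;                       # assumption: this source file is UTF-8 encoded
use Encode qw(encode);

binmode STDOUT, ':encoding(UTF-8)';   # encode wide characters on output

my $string = "骆驼";

# An empty pattern splits the string between every character (code point).
my @chars = split //, $string;

print scalar(@chars), "\n";                     # number of characters: 2
print length($string), "\n";                    # length in characters: 2
print length(encode('UTF-8', $string)), "\n";   # length in UTF-8 bytes: 6
print join(' ', @chars), "\n";                  # 骆 驼
```

Note that this splits on code points, so a base letter followed by a
combining mark would come out as two elements. If that matters, matching with
/\X/g instead should group them, since \X matches an extended grapheme
cluster.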

Mike

> Thanks
>
> Antonio

-- 

Mike Stok <mike at stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.
