[tpm] Manipulating utf8 strings with Perl

Liam R E Quin liam at holoweb.net
Tue May 8 12:36:42 PDT 2012


On Tue, 2012-05-08 at 11:21 -0400, Antonio Sun wrote:

> cat myfile.utf8 | od -t x1 | head -3
> 0000000 3c 00 73 00 3a 00 45 00 6e 00 76 00 65 00 6c 00
> 0000020 6f 00 70 00 65 00 20 00 78 00 6d 00 6c 00 6e 00
> 0000040 73 00 3a 00 73 00 3d 00 22 00 68 00 74 00 74 00
> 
> I.e., each character takes 2 bytes.

That's not UTF-8 - UTF-8 doesn't have NUL bytes in it unless you put them
there. It's UTF-16 (little-endian, since the 00 byte comes after each
character's byte in your dump).

Use iconv.
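
Something along these lines should do it. This is only a sketch: it assumes the file is little-endian UTF-16 with no byte-order mark, which is what the dump suggests (3c 00 rather than 00 3c, and no ff fe at the start). The temporary directory and the sample file names are made up for illustration.

```shell
# Work in a scratch directory (hypothetical; any writable location will do).
tmpdir=$(mktemp -d)

# Build a small UTF-16LE sample with no BOM: the characters "<s:",
# each followed by a 00 byte, like the start of the dump above.
printf '<\000s\000:\000' > "$tmpdir/sample.utf16"

# Convert to UTF-8. The source encoding is named explicitly as UTF-16LE
# because there is no BOM for iconv to detect the byte order from.
iconv -f UTF-16LE -t UTF-8 "$tmpdir/sample.utf16" > "$tmpdir/sample.utf8"

# Inspect the result: the 00 bytes should be gone.
od -t x1 "$tmpdir/sample.utf8"
```

For the real file the same call would be iconv -f UTF-16LE -t UTF-8 myfile.utf8, redirected to whatever output name you like. If the file turns out to start with a byte-order mark after all, plain UTF-16 as the source encoding is the usual choice instead.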

Liam


-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org


