[Edinburgh-pm] Send UTF8 info across Socket connection

Marco Fontani fontani at gmail.com
Thu Sep 9 05:42:48 PDT 2010


[did not copy the list in, apologies]

> I have perl scripts that are reading in/creating string content in UTF8
> encoding, and then sending that data across a socket connection (using
> IO::Socket::INET), in a $socket->send($string,0) format.

IO::Socket::INET's send() method just calls Perl's built-in send() behind the scenes.

Perldoc explains some of the intricacies of sending bytes versus
characters at http://perldoc.perl.org/functions/send.html, and there is
more good material at http://perldoc.perl.org/perlunicode.html

By default all sockets operate on bytes, but if, for example, the socket
has been changed using binmode() to operate with the :encoding(utf8)
I/O layer (see open, or the open pragma), the I/O will operate
on UTF-8 encoded Unicode characters, not bytes.

So, on a default socket, send() expects bytes. If you give it a string
of characters, it will not do the right thing.
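A small sketch of why the distinction matters (the sample string is only an illustration, not from the original question): "\x{e9}" and "\x{100}" each count as one character, but each takes two bytes once UTF-8 encoded, and bytes are what actually travel over the socket.

```perl
use strict;
use warnings;

my $chars = "caf\x{e9} \x{100}";   # illustrative character string
my $bytes = $chars;                # copy, because utf8::encode works in place
utf8::encode($bytes);              # characters -> UTF-8 bytes

printf "characters: %d\n", length $chars;   # 6
printf "bytes:      %d\n", length $bytes;   # 8
```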
To "get bytes from a string of utf8 characters", you can call
utf8::encode($str), provided $str really holds characters. Note that it
converts $str in place.
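A minimal sender-side sketch, assuming a plain TCP socket with no I/O layers. The helper name send_chars is hypothetical, and the loopback listener on 127.0.0.1 exists only so the example runs in a single process; in real use $sock would be your existing IO::Socket::INET connection.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: encode a character string to UTF-8 bytes and send it.
# @_ is copied into $chars, so the caller's string is not modified.
sub send_chars {
    my ($sock, $chars) = @_;
    utf8::encode($chars);             # characters -> UTF-8 bytes, in place
    return $sock->send($chars, 0);    # number of bytes sent, or undef on error
}

# Loopback pair in one process, for illustration only.
my $server = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,                   # let the OS pick a free port
    Proto     => 'tcp',
) or die "listen: $@";
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "connect: $@";
my $conn = $server->accept or die "accept: $!";

my $sent = send_chars($client, "\x{100}");
my $got  = '';
$conn->recv($got, 1024, 0);           # a tiny loopback message arrives whole here
printf "sent %d bytes, received %vX\n", $sent, $got;   # sent 2 bytes, received C4.80
```

Over a real network a TCP recv() may return only part of what was sent, so the receiver should collect all the bytes of a message before decoding them.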

The other side will then receive bytes, and if it needs characters it
will have to call utf8::decode($received);
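On the receiving side, a sketch assuming the complete message has already been read into a byte string. The helper name bytes_to_chars is hypothetical; it relies on utf8::decode returning false when its input is not valid UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: turn received bytes into a character string,
# refusing input that is not valid UTF-8.
sub bytes_to_chars {
    my ($chars) = @_;                 # copy, so the caller's buffer is untouched
    utf8::decode($chars)              # in place: UTF-8 bytes -> characters
        or die "received data is not valid UTF-8\n";
    return $chars;
}

my $text = bytes_to_chars("\xc4\x80");   # the two bytes from perldoc utf8
print length($text), "\n";               # 1 (a single character, U+0100)

# A truncated sequence is rejected:
eval { bytes_to_chars("\xc4") };
print "rejected: $@" if $@;
```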

From perldoc utf8:
    utf8::encode($string);  # "\x{100}" becomes "\xc4\x80", i.e. utf8 string to bytes
    utf8::decode($string);  # "\xc4\x80" becomes "\x{100}", i.e. bytes to utf8 string

Try the above, and let me know how it goes ;)

Unicode is easy*!!!

Just my 2 cents,
-marco-

* to get wrong

