Thanks, Murray and Marco.<br><br>Murray, I'd already checked out that very same link, but thanks. It is a handy one.<br>But I'd never used Devel::Peek before. I'll definitely be playing with that one.<br><br>Marco, thanks for the detailed explanation. It really helped me understand a few things in more depth.<br>
<br>You'll both love what it turned out to be... I was double-encoding the strings.<br>Since I receive data in many formats, I convert it all to UTF-8 and then send it through a socket to another server.<br>
Once at the other end, I was trying to encode the strings again. Sadly, this made them look a LOT like they would have had they never been cleaned up in the first place, hence my thinking that the encoding correction wasn't working.<br>
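For anyone who finds this thread later, here is a minimal sketch of the double-encoding effect. It is illustrative only: the sample character and the variable names are made up and this is not my real code.<br>

```perl
use strict;
use warnings;

# Hypothetical sample: one character, U+00E9 (e with acute accent).
my $chars = "\x{e9}";

# First encode (sender side): characters become UTF-8 bytes.
my $once = $chars;
utf8::encode($once);      # now the two bytes "\xc3\xa9"

# Second encode (the mistake at the receiving end): each byte is
# treated as a Latin-1 character and encoded again.
my $twice = $once;
utf8::encode($twice);     # now the four bytes "\xc3\x83\xc2\xa9"

# Print both strings as hex so the difference is visible.
printf "once:  %s\n", join ' ', map { sprintf '%02x', ord } split //, $once;
printf "twice: %s\n", join ' ', map { sprintf '%02x', ord } split //, $twice;
```

The second string is the kind of mojibake that looks just like data that was never cleaned up at all.<br>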
I had forgotten that I had tried to clean up the encoding at both ends...<br>UGH....<br><br>Many thanks for your help.<br>At least I DID still learn something.<br><br>--Alex<br><br><br><div class="gmail_quote">Marco Fontani <span dir="ltr">&lt;<a href="mailto:fontani@gmail.com" target="_blank">fontani@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
So send() expects bytes. If you're giving it characters, it will not<br>
do the right thing.<br>
To "get bytes from a string of utf8 characters", you can use<br>
utf8::encode($str) if you know $str is utf8.<br>
<br>
The other side will then receive bytes and if it needs characters it<br>
will have to call utf8::decode($received);<br>
<br>
From perldoc utf8:<br>
</blockquote><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"># utf8::encode($string); # "\x{100}" becomes "\xc4\x80"; that is, utf8 string to bytes<br>
# utf8::decode($string); # "\xc4\x80" becomes "\x{100}"; that is, bytes to utf8 string<br>
<br>
Try the above, and let me know how it goes ;)<br>
<br>
Unicode is easy*!!!<br>
<br>
Just my 2 cents,<br>
-marco-<br>
<br>
* to get wrong<br>
</blockquote></div><br>
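Marco's advice above (encode once before send(), decode once after receiving) could be sketched roughly like this. The helper names to_wire and from_wire are hypothetical, chosen only for illustration, and the socket itself is left out so the snippet runs on its own.<br>

```perl
use strict;
use warnings;

# Hypothetical helpers (names are illustrative, not from any module).

# Sender side: turn a character string into UTF-8 bytes for send().
sub to_wire {
    my ($chars) = @_;
    my $bytes = $chars;          # copy, because utf8::encode works in place
    utf8::encode($bytes);
    return $bytes;
}

# Receiver side: turn received UTF-8 bytes back into characters.
sub from_wire {
    my ($bytes) = @_;
    my $chars = $bytes;          # copy, because utf8::decode works in place
    utf8::decode($chars)         # returns false if the bytes are not valid UTF-8
        or die "received data is not valid UTF-8\n";
    return $chars;
}

my $original = "\x{100}";              # the example character from perldoc utf8
my $wire     = to_wire($original);     # "\xc4\x80", two bytes
my $received = from_wire($wire);       # "\x{100}" again, one character

print "bytes on the wire: ", length($wire), "\n";
print "round trip ok\n" if $received eq $original;
```

The point is that each boundary gets exactly one conversion: encode once on the way out, decode once on the way in, and nothing more at either end.<br>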