<div class="gmail_quote">On Mon, Oct 15, 2012 at 3:23 PM, Stuart Watt <span dir="ltr"><<a href="mailto:stuart@morungos.com" target="_blank">stuart@morungos.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div>It's not an uncommon problem, but it's a messy one. And it's basically an application decision. </div>
<div><br></div><div>The module you need is Encode, and what you probably need is</div><div><br></div><div>use Encode;</div><div>my $encoded = Encode::encode('utf8', $utf8_string);</div></div></blockquote><div><span style="font-family:courier new,monospace"><br>
Yes, I've been reading all this stuff, but it still doesn't make sense to me<br>(nor, I see, to many others... <a href="http://www.perlmonks.org/?node_id=906373" target="_blank">http://www.perlmonks.org/?node_id=906373</a>)<br>
<br>All of the responses I've read so far assume you are processing text strings<br>and not octet strings.<br><br>Reading 'perlunitut', I see:<br><br>
Encoding (as a verb) is the conversion from <i>text</i> to <i>binary</i>. To encode,
you have to supply the target encoding, for example iso-8859-1 or UTF-8.
Some encodings, like the iso-8859 ("latin") range, do not support the full
Unicode standard; characters that can't be represented are lost in the
conversion.<br><br>The scary part is that I don't really know what the original encoding is.<br>(In this case, the text causing me grief contains MS Windows file names.)<br><br>... another day has passed since I wrote the above ...<br>
<br>Trying to patch my original program directly had me wandering around in my attempts to fix the<br>problem, so I created a simple test program instead, and discovered that indeed,<br> utf8::encode($msg);<br>would address the problem [ just as described :-) ].<br>
Then, once I found the appropriate spot in my code, adding it there 'compensated' for the issue.<br><br></span><span style="font-family:courier new,monospace"><font>[ But what happens if I don't feed it a text string (wide or narrow)<br>
but my octet string instead? What comes out the other end?</font><br><br> I guess it passes it through transparently<br> (since it no longer contains a UTF-8 string)<br>]</span><br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>which converts the character string, which you can't print, into a sequence of UTF-8 bytes, which you can. That stops the print error. However, this applies to printing or writing over a network connection, and you might need a different encoding depending on your protocol. The Encode module can handle almost any encoding you like or need, and many that seem ridiculous. </div>
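<div><br></div><div>A minimal sketch of encoding at the output boundary, assuming a text string that contains one character outside Latin-1 (the string and variable names are made up for illustration). It encodes the same text as strict UTF-8 and as ISO-8859-1. Encode treats 'UTF-8' as the strict encoding and 'utf8' as Perl's more lenient variant. With the default CHECK, a character ISO-8859-1 cannot represent is replaced with '?', which is exactly the loss perlunitut warns about.</div>

```perl
use strict;
use warnings;
use Encode qw(encode);

# A text string: "caf" + e-acute (U+00E9) + space + U+263A (a smiley).
my $text = "caf\x{e9} \x{263A}";

# Strict UTF-8: every character survives, wide ones as multi-byte sequences.
my $utf8_octets = encode('UTF-8', $text);

# ISO-8859-1: U+263A has no Latin-1 byte, so with the default CHECK
# Encode substitutes '?' and the character is lost.
my $latin1_octets = encode('iso-8859-1', $text);

printf "%d characters -> %d UTF-8 bytes, %d Latin-1 bytes\n",
    length($text), length($utf8_octets), length($latin1_octets);
```

<div>Either result is a plain byte string that can be written to a socket; which one is right depends on what the other end of the protocol expects.</div>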
<div><br></div><div>In the case of UTF-8, and only because Perl uses UTF-8 internally, the result is essentially Perl's internal representation with its special UTF8 flag switched off, leaving only byte-sized characters; that is what stops Perl from giving wide character errors. But this is highly confusing special behaviour, and it's often worth testing your printing and network code with non-ASCII data to flush out these issues. </div>
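<div><br></div><div>The flag behaviour can be observed with the built-in utf8::is_utf8 (useful for debugging only; it should not drive program logic). This sketch, with made-up example strings, also bears on the bracketed question earlier in the thread: utf8::encode passes an octet string through unchanged only if it is pure ASCII. Every byte at 0x80 or above is treated as a Latin-1 character and encoded again, so encoding twice double-encodes the data.</div>

```perl
use strict;
use warnings;

# A text string with a character above U+00FF, so Perl sets the UTF8 flag.
# "na\x{ef}ve" plus U+2603 (snowman): 7 characters.
my $text = "na\x{ef}ve \x{2603}";
my $flag_before = utf8::is_utf8($text) ? 1 : 0;

# utf8::encode converts $octets (a copy of $text) in place to UTF-8
# octets and clears the flag.
my $octets = $text;
utf8::encode($octets);
my $flag_after = utf8::is_utf8($octets) ? 1 : 0;

# Encoding octets again: each byte 0x80-0xFF is treated as a Latin-1
# character and becomes two bytes, i.e. the data is double-encoded.
my $double = $octets;
utf8::encode($double);

printf "flag %d -> %d; length %d -> %d -> %d\n",
    $flag_before, $flag_after, length($text), length($octets), length($double);
```

<div>The 7 characters become 10 bytes after one encode, and 15 after a second one, because the five high bytes from the first pass each grow to two.</div>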
<div><br></div><div>The problems are worse if you don't know what your strings are to begin with. It's best to help your app by decoding everything into Perl text strings (Unicode characters internally) as soon as possible, assuming it isn't already. There is no reliable way to tell whether a piece of random data really is UTF-8 text, as that depends on how it is supposed to be interpreted. </div>
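<div><br></div><div>One way to decode as early as possible when the source encoding is unknown is to try strict UTF-8 first and fall back to a legacy encoding. The helper bytes_to_text below is hypothetical, and choosing cp1252 as the fallback is an assumption (it is a common encoding for text coming from Windows systems); substitute whatever your data source actually uses.</div>

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode incoming bytes as strict UTF-8 if they are
# valid UTF-8, otherwise fall back to cp1252 (an assumed legacy encoding).
sub bytes_to_text {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may consume its input when CHECK is set
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $text ? $text : decode('cp1252', $bytes);
}

my $from_utf8 = bytes_to_text("caf\xc3\xa9");    # valid UTF-8 bytes
my $from_1252 = bytes_to_text("caf\xe9");        # invalid as UTF-8

# Both become the same 4-character text string "caf" . "\x{e9}".
printf "%d and %d characters\n", length($from_utf8), length($from_1252);
```

<div>This only guesses: "caf\xc3\xa9" is also perfectly valid cp1252 (it would read as "cafÃ©"), so the ambiguity described above remains.</div>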
<div><br></div><div>--S</div><br><div><div><div><div>On 2012-10-15, at 1:59 PM, Fulko Hew wrote:</div><br></div></div><blockquote type="cite"><div><div><font face="courier new,monospace">I have a problem </font><font face="courier new,monospace"><font face="courier new,monospace">(so what else is new!) </font>that I haven't yet found a solution to ...<br>
<br>In my app, I receive strings, massage them, and 'push_write' them to an </font><font face="courier new,monospace"><font face="courier new,monospace">AnyEvent socket.<br>
<br></font></font><font face="courier new,monospace">Occasionally, my app receives a string containing Unicode (wide) characters,<br>so when the write happens, </font><font face="courier new,monospace">Perl (inside the AnyEvent module)<br>dies with the error:<br>
<br> Wide character in subroutine entry at ...<br><br>What I haven't figured out yet is how to coerce the character string into<br>an octet string (for the rest of its life, i.e., in subsequent modules)<br>so the warning/dying goes away.<br>
<br>TIA<br>Fulko</font><br>
</div></div></blockquote></div></div></blockquote></div>
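<div><br></div><div>The original error can be reproduced without AnyEvent: code that needs raw bytes must force the string down to one byte per character, and the built-in utf8::downgrade fails the same way on wide characters. That AnyEvent's write path fails for exactly this reason is an assumption here, and the file name is made up. The sketch keeps text strings inside the program and encodes once, right before the socket.</div>

```perl
use strict;
use warnings;
use Encode qw(encode);

# Example text containing CJK characters (made-up file name).
my $msg = "C:\\Temp\\\x{65e5}\x{672c}.txt";

# Forcing the text down to one byte per character fails on wide characters,
# much like XS code that needs raw bytes.
my $copy = $msg;
my $ok = eval { utf8::downgrade($copy); 1 };
my $error = $@;    # e.g. "Wide character in subroutine entry at ..."
print "downgrade of text failed: $error" unless $ok;

# Encode once, at the boundary; the resulting octets downgrade cleanly.
my $octets = encode('UTF-8', $msg);
my $ok_after = eval { my $c = $octets; utf8::downgrade($c); 1 };
print "encoded octets are safe to write\n" if $ok_after;

# In the AnyEvent program this would be (not run here):
#   $handle->push_write(encode('UTF-8', $msg));
```

<div>Encoding only at the write keeps the rest of the program working with characters; the receiving side then has to decode with the same encoding.</div>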