[Melbourne-pm] UTF-8 headaches

Kat Grant crashkat at gmail.com
Thu Nov 15 03:52:44 PST 2007


Hey Toby
I'll give it a shot in the morning.
Tnx :)

K

On 15/11/2007, at 6:08 PM, Toby Corkindale wrote:

> Kat Grant wrote:
>> Hi All
>>
>> We have a web front ended application, and (not unusually) some jobs
>> that run server side in the background.
>> My problem is that the same bit of code called from within the web
>> server handles utf8 characters just fine, but when called from a
>> standalone script, turns them into rubbish.
>>
>> The code is literally identical, and it's doing my head in.
>>
>> I've tried running with all the various possible values of -C but
>> nothing has helped.
>>
>> The code is pulling some UTF-8 data from a mysql database,
>> constructing a MIME::Lite message and sending it. The messages come
>> through fine when sent from within the webserver, but the characters
>> are trashed when sent from a standalone script.
>>
>> We use perl 5.8.8, MySQL 5, Apache 1.3, mod_perl on debian.
>
> I remember a Perl talk a couple of years ago or so about using Unicode
> and Perl and databases, and the conclusion could roughly be summed up
> as: DBD::Pg (PostgreSQL) Just Works(tm) and MySQL required a bunch of
> hoops to be jumped through.
>
> I noticed this via Google:
> http://www.simplicidade.org/notes/archives/2005/12/utf8_and_dbdmys.html
>
> So I wonder if that's what you're seeing: you're getting raw bytes
> that add up to UTF-8-encoded Unicode, but are not /marked/ as Unicode
> internally. When printed via the web, your browser might be smart
> enough to pick up that it's Unicode and display it as such, but on a
> terminal, the Perl I/O layer or the terminal may be (mistakenly)
> escaping the bytes, or trying to display them as ISO-8859-1 instead,
> hence the garbage.
>
> i.e. the web server may be the one at fault here, and your terminal is
> correctly displaying garbage.
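A minimal sketch of the distinction described above, using only the core Encode module that ships with Perl 5.8. The sample string is hypothetical and stands in for a value fetched through DBD::mysql without any UTF-8 handling. It shows that the undecoded value has the internal flag off and counts bytes, while the decoded value has the flag on and counts characters:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical value: "café" as two raw UTF-8 bytes for the "é"
# (0xC3 0xA9), with Perl's internal UTF8 flag off.
my $bytes = "caf\xc3\xa9";

# The same text after decoding: four characters, flag on.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, flagged %s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'yes' : 'no';
printf "chars: length %d, flagged %s\n",
    length($chars), utf8::is_utf8($chars) ? 'yes' : 'no';
```

Run as-is, this should print `bytes: length 5, flagged no` followed by `chars: length 4, flagged yes`.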
>
> Can you try this out?
>
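> # utf8::is_utf8() is built in, so "use utf8" isn't needed (that pragma
> # only marks this script's source as UTF-8). It reports Perl's internal
> # UTF8 flag, not whether the bytes are valid UTF-8. Print or log this:
> print "Data " . (utf8::is_utf8($string) ? 'is' : 'is not')
>     . " flagged as UTF-8 internally.\n";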
>
>
> If it comes back as NOT being utf8, try this:
>
> use Encode qw/decode/;
> my $decoded = decode("UTF-8", $string);   # bytes -> characters
> binmode STDOUT, ':encoding(UTF-8)';       # characters -> UTF-8 on output
> print $decoded;
>
>
> -toby
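A sketch of the usual "decode on input, encode on output" pattern, again using only core Encode. The value in `$raw` is hypothetical; in the real job it would be the column fetched from MySQL. This may also explain why the various -C settings made no difference: -C only sets I/O layers on Perl's own filehandles, and data fetched through DBI never passes through them.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw value standing in for a MySQL column:
# "naïve" as UTF-8 bytes (the "ï" is 0xC3 0xAF).
my $raw = "na\xc3\xafve";

# 1. Decode once, at the input boundary: bytes -> characters.
my $text = decode('UTF-8', $raw);

# 2. Work with characters; length() and regexes now see 5 characters.
my $message = "Re: $text";

# 3. Encode once, at the output boundary: characters -> bytes.
#    These bytes are what a mailer or socket should be handed.
my $body = encode('UTF-8', $message);

# For terminal output, declare an encoding layer instead of encoding by hand.
binmode STDOUT, ':encoding(UTF-8)';
print "$message\n";
```

For the MIME::Lite step, the assumption is that you would pass the encoded `$body` bytes as the message data and declare the charset, for example with MIME::Lite's `attr("content-type.charset" => "UTF-8")`; this has not been checked against the original setup.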
