[Melbourne-pm] UTF-8 headaches

Toby Corkindale toby.corkindale at rea-group.com
Wed Nov 14 23:08:45 PST 2007


Kat Grant wrote:
> Hi All
> 
> We have a web front ended application, and (not unusually) some jobs  
> that run server side in the background.
> My problem is that the same bit of code called from within the web  
> server handles utf8 characters in just fine, but when called from a  
> standalone script, turns them into rubbish.

> 
> The code is literally identical, and it's doing my head in.
> 
> I've tried running with all the various possible values of -C but  
> nothing has helped.
> 
> The code is pulling some UTF-8 data from a mysql database,  
> constructing a MIME::Lite message and sending it. The messages come  
> through fine when sent from within the webserver, but the characters  
> are trashed when sent from a stand alone script.
> 
> We use perl 5.8.8, MySQL 5, Apache 1.3, mod_perl on debian.

I remember a Perl talk a couple of years ago or so about using Unicode
and Perl and databases, and the conclusion could roughly be summed up
as: DBD::Pg (PostgreSQL) Just Works(tm) and MySQL required a bunch of
hoops to be jumped through.

A quick Google search turned up this:
http://www.simplicidade.org/notes/archives/2005/12/utf8_and_dbdmys.html

So I wonder if that's what you're seeing. You're getting raw bytes that
form valid UTF-8, but the string is not /marked/ as UTF-8 internally.
When printed via the web, your browser might be smart enough to detect
that it's UTF-8 and display it as such, but on a terminal, the Perl
I/O layer or the terminal may be (mistakenly) escaping the bytes, or
trying to display them as ISO-8859-1 instead, hence the garbage.

i.e. the webserver may be the one at fault here, and your terminal is
correctly displaying garbage.
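
To make the "bytes versus marked string" distinction concrete, here is a
minimal sketch using only core modules. The sample value is made up for
illustration (it is not from Kat's database); it stands in for what a
driver might hand back when it returns undecoded UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the UTF-8 bytes for "café", as a database driver
# might return them, with no character semantics attached.
my $bytes = "caf\xc3\xa9";

# Decoding turns the bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, flagged: %s\n",
    length($bytes), (utf8::is_utf8($bytes) ? 'yes' : 'no');
printf "chars: length %d, flagged: %s\n",
    length($chars), (utf8::is_utf8($chars) ? 'yes' : 'no');

# Encoding the character string gives back the original bytes.
my $round_trip = encode('UTF-8', $chars);
print "round trip ", ($round_trip eq $bytes ? "matches" : "differs"), "\n";
```

The byte string counts the two bytes of the accented letter separately,
while the decoded string counts it as one character. That mismatch is
the sort of thing that gets mangled further down the line.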

Can you try this out?

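# utf8::is_utf8() is built in; "use utf8" is not needed for it (that
# pragma only declares that the source file itself is UTF-8).
print "Data " . (utf8::is_utf8($string) ? 'is' : 'is not')
    . " flagged as UTF-8 internally.\n";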


If it comes back as NOT being utf8, try this:

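use Encode qw/decode/;
# Decode the raw bytes into a Perl character string.
my $upgraded = decode("UTF-8", $string);
# Encode on output, otherwise print warns about wide characters.
binmode(STDOUT, ":encoding(UTF-8)");
print $upgraded;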
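
If the same code has to cope with both cases (already decoded under the
webserver, raw bytes in the standalone script), the check and the decode
can be combined. This is only a sketch: ensure_decoded is a name I made
up, and relying on the internal flag is a pragmatic shortcut rather than
a guarantee about where the data came from.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a character string whether or not the
# input has already been decoded.
sub ensure_decoded {
    my ($string) = @_;
    return $string if utf8::is_utf8($string);   # already characters
    # Work on a copy: with a CHECK argument, decode may modify its source.
    my $copy = $string;
    # FB_CROAK makes decode die on malformed input instead of silently
    # substituting replacement characters.
    return decode('UTF-8', $copy, Encode::FB_CROAK);
}

my $raw     = "na\xc3\xafve";            # undecoded UTF-8 bytes
my $decoded = ensure_decoded($raw);
my $again   = ensure_decoded($decoded);  # second call changes nothing

binmode(STDOUT, ':encoding(UTF-8)');
print "$decoded\n";
```

Calling it twice is harmless, which avoids the double-decoding that
otherwise tends to creep in when only one of the two code paths needs
the fix.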


-toby

