[Melbourne-pm] UTF-8 headaches

Toby Corkindale toby.corkindale at rea-group.com
Wed Nov 14 23:08:45 PST 2007

Kat Grant wrote:
> Hi All
> We have a web front ended application, and (not unusually) some jobs  
> that run server side in the background.
> My problem is that the same bit of code called from within the web  
> server handles utf8 characters in just fine, but when called from a  
> standalone script, turns them into rubbish.

> The code is literally identical, and it's doing my head in.
> I've tried running with all the various possible values of -C but  
> nothing has helped.
> The code is pulling some UTF-8 data from a mysql database,  
> constructing a MIME::Lite message and sending it. The messages come  
> through fine when sent from within the webserver, but the characters  
> are trashed when sent from a stand alone script.
> We use perl 5.8.8, MySQL 5, Apache 1.3, mod_perl on debian.

I remember a Perl talk a couple of years ago or so about using Unicode
and Perl and databases, and the conclusion could roughly be summed up
as: DBD::Pg (PostgreSQL) Just Works(tm) and MySQL required a bunch of
hoops to be jumped through.

I notice from Google that:

So I wonder if that's what you're seeing. You're getting raw bytes
that add up to UTF-8, but are not /marked/ as Unicode internally.
When printed via the web, your browser might be smart enough to pick up
that it's UTF-8 and display it as such, but on a terminal, the Perl
I/O layer or the terminal may be (mistakenly) escaping the bytes, or
trying to display them as ISO-8859 instead, hence the garbage.

i.e. the webserver may be the one at fault here, and your terminal is
correctly displaying garbage.
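
To make the bytes-versus-characters distinction concrete, here is a minimal
sketch using only core modules. The sample value and the describe() helper
are made up for illustration; the byte string just stands in for whatever
comes back from the database undecoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the two bytes that encode U+00E9 (e-acute) in UTF-8.
my $bytes = "\xC3\xA9";

# Report how Perl sees a string: its length and its internal UTF-8 flag.
sub describe {
    my ($s) = @_;
    return sprintf "length=%d flagged=%s",
        length($s), (utf8::is_utf8($s) ? 'yes' : 'no');
}

# Decoding turns the two bytes into one character.
my $chars = decode('UTF-8', $bytes);

# Misreading the same bytes as ISO-8859-1 yields two characters, and
# encoding those for output gives the classic double-encoded garbage.
my $misread = decode('ISO-8859-1', $bytes);
my $garbage = encode('UTF-8', $misread);

print "raw:     ", describe($bytes), "\n";   # length=2 flagged=no
print "decoded: ", describe($chars), "\n";   # length=1 flagged=yes
print "double-encoded length: ", length($garbage), "\n";   # 4
```

If your standalone script shows the "raw" pattern where the webserver shows
the "decoded" one (or the other way around), that would point at the two
environments handing the data over differently.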

Can you try this out?

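# utf8::is_utf8() is always available; "use utf8" only affects source literals.
# Print (or log) whether the string carries Perl's internal UTF-8 flag:
print "Data " . (utf8::is_utf8($string) ? 'is' : 'is not')
    . " flagged as UTF-8 internally.\n";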

If it comes back as NOT being utf8, try this:

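use Encode qw/decode/;
# Turn the raw UTF-8 bytes into a Perl character string
my $upgraded = decode("UTF-8", $string);
# Encode on the way out, otherwise Perl warns about wide characters
binmode(STDOUT, ":encoding(UTF-8)");
print $upgraded;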
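
Note that utf8::is_utf8() only reports the internal flag, not whether the
bytes are well-formed UTF-8. If you want to check that as well, something
along these lines might do. It is only a sketch: decode_if_valid() is a
made-up helper name, and it relies on Encode's FB_CROAK check mode to die on
malformed input.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return the decoded character string, or undef if the input is not
# well-formed UTF-8.
sub decode_if_valid {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
}

# Hypothetical inputs: "\xC3\xA9" is valid UTF-8, while "\xE9t\xE9" is
# ISO-8859-1 text and not valid UTF-8.
for my $input ("\xC3\xA9", "\xE9t\xE9") {
    my $decoded = decode_if_valid($input);
    print defined $decoded ? "valid UTF-8\n" : "not valid UTF-8\n";
}
```

An undef result would suggest the data is in some other encoding, in which
case decoding it as UTF-8 is not the right fix.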

