[Melbourne-pm] UTF-8 headaches
toby.corkindale at rea-group.com
Wed Nov 14 23:08:45 PST 2007
Kat Grant wrote:
> Hi All
> We have a web front ended application, and (not unusually) some jobs
> that run server side in the background.
> My problem is that the same bit of code called from within the web
> server handles UTF-8 characters just fine, but when called from a
> standalone script, turns them into rubbish.
> The code is literally identical, and it's doing my head in.
> I've tried running with all the various possible values of -C but
> nothing has helped.
> The code is pulling some UTF-8 data from a mysql database,
> constructing a MIME::Lite message and sending it. The messages come
> through fine when sent from within the webserver, but the characters
> are trashed when sent from a standalone script.
> We use perl 5.8.8, MySQL 5, Apache 1.3, mod_perl on debian.
I remember a Perl talk a couple of years ago or so about using Unicode
and Perl and databases, and the conclusion could roughly be summed up
as: DBD::Pg (PostgreSQL) Just Works(tm) and MySQL required a bunch of
hoops to be jumped through.
I notice from Google that:
So I wonder if that's what you're seeing. You're getting raw bytes
that form valid UTF-8, but the strings are not /marked/ as Unicode
internally (Perl's UTF-8 flag is off).
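To make the flagged/unflagged distinction concrete, here's a small sketch using only Encode, which ships with 5.8.8. The string "café" and the variable names are purely illustrative, not taken from your application:

```perl
use strict;
use warnings;
use Encode qw(decode);

# "café" as a driver might hand it back: five raw bytes, with the
# "é" stored as the two-byte UTF-8 sequence C3 A9. The flag is off.
my $bytes = "caf\xc3\xa9";

# decode() interprets those bytes as UTF-8 and returns a character
# string of four characters, with the UTF-8 flag switched on.
my $chars = decode('UTF-8', $bytes);

printf "raw:     %d chars, flagged: %s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'yes' : 'no';
printf "decoded: %d chars, flagged: %s\n",
    length($chars), utf8::is_utf8($chars) ? 'yes' : 'no';
```

The raw version counts the two bytes of "é" separately, which is exactly the kind of string that gets mangled once something downstream treats it as Latin-1.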
When printed via the web, your browser might be smart enough to pick up
that it's UTF-8 and display it as such, but on a terminal, the Perl
I/O layer or the terminal may be (mistakenly) escaping the bytes, or
trying to display them as ISO-8859-1 instead, hence the garbage.
I.e. the webserver may be the one at fault here (by hiding the problem),
and your terminal is correctly displaying garbage.
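Going the other way, a character string has to be encoded again on output, which is where the I/O layer comes in. A sketch, assuming an in-memory filehandle as a stand-in for STDOUT, a log file or a mail body; in a real script you'd more likely do binmode(STDOUT, ':encoding(UTF-8)'):

```perl
use strict;
use warnings;
use Encode qw(decode);

my $chars = decode('UTF-8', "caf\xc3\xa9");   # four characters

# Print through a UTF-8 encoding layer into an in-memory buffer.
my $buffer = '';
open my $fh, '>:encoding(UTF-8)', \$buffer or die "open: $!";
print {$fh} $chars;
close $fh or die "close: $!";

# The layer turned the characters back into the original bytes.
printf "%d bytes: %s\n", length($buffer),
    join ' ', map { sprintf '%02X', ord } split //, $buffer;
```

Without such a layer, Perl decides on its own how to write the string out, which is one way the same data can look fine in one environment and broken in another.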
Can you try this out?
print "Data " . (utf8::is_utf8($string) ? 'is' : 'is not')
    . " flagged as UTF-8.\n";    # or write this to your log instead
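Bear in mind that utf8::is_utf8 only reports Perl's internal flag; an unflagged string may still hold perfectly valid UTF-8 bytes. If you want to tell those cases apart while debugging, something like this might help (describe_string is a made-up helper, not part of any module):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical debugging helper: report whether a scalar is a
# character string, or a byte string that is / isn't valid UTF-8.
sub describe_string {
    my ($s) = @_;    # private copy, so decode() below may consume it
    return 'character string (UTF-8 flag on)' if utf8::is_utf8($s);

    # Flag is off: try a strict decode that dies on malformed input.
    my $ok = eval { decode('UTF-8', $s, Encode::FB_CROAK()); 1 };
    return $ok ? 'byte string holding valid UTF-8'
               : 'byte string that is not valid UTF-8';
}

print describe_string("caf\xc3\xa9"), "\n";                  # UTF-8 bytes
print describe_string("caf\xe9"), "\n";                      # Latin-1 byte
print describe_string(decode('UTF-8', "caf\xc3\xa9")), "\n"; # decoded
```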
If it comes back as NOT flagged, try decoding it yourself:
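use Encode qw/decode/;
# Turn the raw UTF-8 bytes into a Perl character string
# ("UTF-8" is Encode's strict variant; "utf8" is the lax one).
my $decoded = decode("UTF-8", $string);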
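If the data sometimes arrives flagged and sometimes not, decoding unconditionally can corrupt text that is already a character string. A hedged sketch of a guard (ensure_characters is a hypothetical name); it assumes any unflagged input is UTF-8 bytes, which only holds if your MySQL columns really store UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper following the advice above: decode only when
# the UTF-8 flag is off. It ASSUMES unflagged input is UTF-8 bytes.
sub ensure_characters {
    my ($s) = @_;
    return $s if utf8::is_utf8($s);    # already a character string
    return decode('UTF-8', $s);        # raw bytes -> characters
}

my $from_db = "caf\xc3\xa9";                # stand-in for a DBI value
my $text    = ensure_characters($from_db);  # decoded: 4 characters
my $again   = ensure_characters($text);     # left alone, not decoded twice
printf "%d %d\n", length($text), length($again);
```

Calling it on the value straight out of DBI, before building the MIME::Lite message, keeps the "decode once at the edge" rule in one place.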