[Melbourne-pm] UTF-8 headaches
Toby Corkindale
toby.corkindale at rea-group.com
Wed Nov 14 23:08:45 PST 2007
Kat Grant wrote:
> Hi All
>
> We have a web front ended application, and (not unusually) some jobs
> that run server side in the background.
> My problem is that the same bit of code called from within the web
> server handles utf8 characters just fine, but when called from a
> standalone script, turns them into rubbish.
>
> The code is literally identical, and it's doing my head in.
>
> I've tried running with all the various possible values of -C but
> nothing has helped.
>
> The code is pulling some UTF-8 data from a mysql database,
> constructing a MIME::Lite message and sending it. The messages come
> through fine when sent from within the webserver, but the characters
> are trashed when sent from a stand alone script.
>
> We use perl 5.8.8, MySQL 5, Apache 1.3, mod_perl on debian.
I remember a Perl talk a couple of years ago or so about using Unicode
and Perl and databases, and the conclusion could roughly be summed up
as: DBD::Pg (PostgreSQL) Just Works(tm) and MySQL required a bunch of
hoops to be jumped through.
Google turns up this note, which looks to be about UTF-8 and DBD::mysql:
http://www.simplicidade.org/notes/archives/2005/12/utf8_and_dbdmys.html
So I wonder if that's what you're seeing: you're getting raw bytes that
happen to be valid UTF-8, but the string is not /marked/ as UTF-8
internally (Perl's UTF8 flag is off).
When it's served via the web, your browser might be smart enough to pick
up that the bytes are UTF-8 and display them as such, but in a standalone
script the Perl I/O layer or the terminal may be (mistakenly) re-encoding
the bytes, or trying to display them as ISO-8859-1 instead, hence the
garbage.
I.e. the webserver may be the one at fault here, and your terminal is
correctly displaying garbage.
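To make the "displayed as iso-8859" failure concrete, here is a minimal sketch with no database involved. The sample word and the variable names are made up for illustration; it only assumes the core Encode module.

```perl
use strict;
use warnings;
use Encode qw/decode/;

# "café" as raw UTF-8 bytes: the é is the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# Correct reading: decode the bytes as UTF-8.
my $right = decode('UTF-8', $bytes);

# Wrong reading: treat every byte as one ISO-8859-1 character.
my $wrong = decode('ISO-8859-1', $bytes);

# Encode characters as UTF-8 on the way out.
binmode(STDOUT, ':encoding(UTF-8)');
printf "read as UTF-8:      %s (%d chars)\n", $right, length $right;
printf "read as ISO-8859-1: %s (%d chars)\n", $wrong, length $wrong;
```

On a UTF-8 terminal the second line should show the familiar two-character rubbish where the é belongs, which is the kind of thing I suspect is ending up in your mail.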
Can you try this out?
    # utf8::is_utf8 is always available; "use utf8" only affects source literals.
    print "Data ", (utf8::is_utf8($string) ? 'is' : 'is not'),
        " flagged as UTF-8 internally.\n";
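If it comes back as NOT being flagged as UTF-8, try this:
    use Encode qw/decode/;
    my $upgraded = decode('UTF-8', $string);   # bytes -> character string
    binmode(STDOUT, ':encoding(UTF-8)');       # avoids "Wide character" warnings
    print $upgraded;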
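If you want to see what the check and the decode do before pointing them at real data, here is a rough self-contained sketch. The hard-coded $string is only a stand-in I've assumed for a value fetched from MySQL; everything else is core Perl plus Encode.

```perl
use strict;
use warnings;
use Encode qw/decode/;

# Stand-in for a database value: UTF-8 bytes with the UTF8 flag off.
my $string = "caf\xC3\xA9";

# Decoding turns the byte string into a character string.
my $upgraded = decode('UTF-8', $string);

# Report the internal flag and the length Perl sees for each version.
for my $pair ([before => $string], [after => $upgraded]) {
    my ($label, $value) = @$pair;
    printf "%-6s flag %s, length %d\n",
        $label, (utf8::is_utf8($value) ? 'on' : 'off'), length $value;
}
```

The undecoded string is counted in bytes and the decoded one in characters, so the two lengths differ whenever the data contains anything outside ASCII.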
-toby