[tpm] PostgreSQL INSERT/UTF-8 problem

Madison Kelly linux at alteeve.com
Wed Jul 9 04:52:02 PDT 2008


Thanks for the reply, Rob!

   I will play with Encode as soon as I get into the office. As for the 
source encoding, let me follow that up in my reply to Cees.
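
For reference, a minimal sketch of the Encode route Rob describes below. It assumes the incoming data is ISO-8859-1, which is still to be confirmed; the variable names are only for illustration. It writes the "é" as an escape so the result does not depend on how the script file itself is saved:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Résidence" as ISO-8859-1 bytes (assumed source encoding).
my $latin1_bytes = "R\xE9sidence";

# decode() turns the bytes into Perl characters.
my $chars = decode("iso-8859-1", $latin1_bytes);

# encode() turns the characters into UTF-8 bytes; "é" becomes 0xC3 0xA9.
my $utf8_bytes = encode("UTF-8", $chars);

# Print both byte strings as hex to see the difference.
printf "latin-1 bytes: %s\n", unpack("H*", $latin1_bytes);
printf "utf-8 bytes:   %s\n", unpack("H*", $utf8_bytes);
```

If the hex dump of the real data does not start out like the first line, the source is probably not ISO-8859-1 and the decode step needs a different encoding name.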

Madi

Rob Janes wrote:
> oops, first reply went to Madison alone ...
> 
> I think this is better ...
> 
> use Encode qw(from_to decode);
> 
> my $data = "Résidence";
> from_to($data, "iso-8859-1", "utf8"); ## assuming Résidence is encoded in 8859-1
> or
> my $data = decode("iso-8859-1", "Résidence");
> 
> both of these convert Résidence out of 8859-1: from_to rewrites $data 
> in place as utf8 bytes, while decode returns a perl character string.  
> However, depending on the original encoding of Résidence, what's stored 
> in the database may or may not be what you want.
> 
> In other words, the lack of an error message is not indicative of it 
> working.
> 
> -rob
> 
> On Tue, Jul 8, 2008 at 4:41 PM, Rob Janes <janes.rob at gmail.com> wrote:
> 
>     methinks your perl script is encoded in iso-8859-1, or a windows
>     code page.  just cause you can see the accent doesn't mean it's
>     right.  set your editor to utf-8.  or use character conversions.
> 
>     use utf8;  ## not sure about this, is pragma
>     $blob = 'Résidence';
>     utf8::encode($blob);  ## encodes $blob in place, returns nothing
> 
>     or
>     use Encode;
>     $blob = encode("utf8", 'Résidence' );
> 
>     Encode doesn't make any statements about the encoding of your
>     script, it might be the better way.
> 
>     iconv is another possibility.
> 
>     look at the man page for charnames.
> 
>     use charnames ":full";
>     print "R\N{LATIN SMALL LETTER E WITH ACUTE}sidence\n";
> 
>     -rob
> 
> 
>     On Tue, Jul 8, 2008 at 3:29 PM, Madison Kelly <linux at alteeve.com> wrote:
> 
>         Hi all, second question of the day!
> 
>          I've got a problem INSERTing a value into my DB. It's a French
>         character 'é', and my DB is set to UTF8, but the error is:
> 
>         INSERT INTO customer_data (cd_cust_id, cd_variable, cd_value,
>         added_user, added_date, modified_user, modified_date) VALUES (1,
>         'CustServiceTypeDisplay_F', 'Résidence', 1, now(), 1, now());
> 
>         DBD::Pg::db do failed: ERROR:  invalid byte sequence for
>         encoding "UTF8": 0xe97369
>         HINT:  This error can also happen if the byte sequence does not
>         match the encoding expected by the server, which is controlled
>         by "client_encoding".
> 
>          When I manually run the INSERT, it works, so I know the problem
>         is in perl somewhere. Now then, I set up my script with this:
> 
>         # Setup for UTF-8 mode.
>         binmode STDOUT, ":utf8:";
>         $ENV{'PERL_UNICODE'}=1;
> 
>          When I create my PgSQL connection, I use:
> 
>         $dbh=DBI->connect($db_connect_string, $$conf{db}{user},
>         $$conf{db}{pass},
>         {
>                RaiseError => 1,
>                AutoCommit => 1,
>                pg_enable_utf8 => 1
>         }
>         ) or die ...;
> 
>          I push a pile of queries into an array (referenced) and run
>         them like this:
> 
>         # Sanity checks stripped for the email
>         $dbh->begin_work;
>         foreach my $query (@{$sql})
>         {
>                print "Query: [$query]\n";
>                $dbh->do($query) or $error.=$DBI::errstr.", ";
>         }
>         $dbh->commit;
> 
>          Lastly, my database itself is set to UTF8:
> 
>         SET client_encoding = 'UTF8';
> 
>          I've tried knocking out the 'pg_enable_utf8 => 1' line in case
>         I was dealing with double-encoding, but that didn't help.
> 
>          Any tips/ideas?
> 
>         Thanks!
> 
>         Madi
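
The bytes in the error above, 0xe97369, are "é" as the single ISO-8859-1 byte 0xE9 followed by "s" and "i". In UTF-8, 0xE9 can only start a three-byte sequence, so the server rejects it. A rough way to catch this in perl before the query is sent might look like the following; is_valid_utf8 is a hypothetical helper name, not part of DBI or DBD::Pg:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if $bytes is a well-formed UTF-8 byte string.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops
    # decode() from modifying $bytes while it checks.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    return eval { decode("UTF-8", $bytes, $check); 1 } ? 1 : 0;
}

my $latin1 = "R\xE9sidence";      # 0xE9 0x73 0x69 ..., as in the error
my $utf8   = "R\xC3\xA9sidence";  # the same word as UTF-8 bytes

printf "latin-1 bytes valid as UTF-8? %s\n", is_valid_utf8($latin1) ? "yes" : "no";
printf "utf-8 bytes valid as UTF-8?   %s\n", is_valid_utf8($utf8)   ? "yes" : "no";
```

Running each query string through a check like this before $dbh->do would show which values still carry non-UTF-8 bytes.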



