[Chicago-talk] Malformed UTF-8 character
Andy_Bach at wiwb.uscourts.gov
Fri Dec 15 09:43:29 PST 2006
> > I've got data w/ x91 and x92 chars in it (which must be Excel curling
> > quotes) and trying to parse it I get a lot of:
> > Malformed UTF-8 character (unexpected continuation byte 0x91, with no
> > preceding start byte) in pattern match (m//) at
> > /opt/util/check_doc_table.pl line 155, <> line 1.
> Looks like you might be coming up against windows-1252 and perl is
> thinking it's Unicode. Try seeing if the utf8 flag is set using the
> Encode module. If it is, you might consider turning off the utf8 flag
> and [d]encoding to the proper format for your work.
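For reference, the flag check suggested above can be sketched roughly as follows. This is only an illustration: the sample `$line` is made up, and `cp1252` is assumed as the source encoding because 0x91/0x92 are the curly single quotes there, while in ISO-8859-1 those same bytes are C1 control characters.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical sample: raw bytes as they might arrive from an upload,
# containing Windows-1252 curly single quotes (0x91 and 0x92).
my $line = "It\x92s a \x91test\x92";

# is_utf8() reports whether Perl's internal utf8 flag is set on the scalar.
printf "utf8 flag before decode: %s\n", is_utf8($line) ? 'on' : 'off';

# Assumption: the bytes are cp1252. decode() turns them into Perl characters.
my $text = decode('cp1252', $line);

printf "utf8 flag after decode: %s\n", is_utf8($text) ? 'on' : 'off';
printf "third character is U+%04X\n", ord(substr($text, 2, 1));
```

If the flag is already on before any decoding, the bytes were probably marked as UTF-8 somewhere upstream (an I/O layer, for instance), and that is the place to look first.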
It's a CGI app (on Linux) and the data can be, without anyone knowing which,
Windows/DOS text, Linux text or even cut-and-paste from an HTML display. The
issue here is similar to what I'm running into - we have an Excel spreadsheet
w/ an 'export' macro that is supposed to produce a ".txt" file that folks can
then upload to their Linux box/db. The spreadsheet is supposed to give
non-Linux folks an easy way of editing - they edit, export and upload rather
than work on the Linux box. What happens is these chars get left in, unnoticed,
and the uploaded info fails in unexpected ways.
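A check for those stray characters at upload time might look like the sketch below. `find_suspect_bytes` is a hypothetical helper (it is not part of check_doc_table.pl), and it assumes the uploaded data is held as raw bytes. It flags anything in the 0x80-0x9F range, which is where Windows-1252 keeps its curly quotes and where ISO-8859-1 has only control characters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [line_number, hex_byte] pairs,
# one for every byte in the range 0x80-0x9F found in the raw data.
sub find_suspect_bytes {
    my ($data) = @_;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $data) {
        $n++;
        while ($line =~ /([\x80-\x9F])/g) {
            push @hits, [ $n, sprintf('0x%02X', ord $1) ];
        }
    }
    return @hits;
}

# Made-up sample upload: the second line carries 0x91 and 0x92.
my $upload = "plain line\nthe \x91export\x92 macro\n";
for my $hit (find_suspect_bytes($upload)) {
    print "line $hit->[0]: byte $hit->[1]\n";
}
# prints:
#   line 2: byte 0x91
#   line 2: byte 0x92
```

Reporting the line number back to the person uploading would at least make the failure noticed rather than unexpected.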
I wasn't doing any decoding, but:
my $utf_line = eval " decode(\'ISO-8859-1\', \$line, Encode::FB_WARN ) ";
#my $utf_line = decode('ISO-8859-1', $line , Encode::FB_WARN);
in various guises doesn't seem to help. W/o the eval I get:
Wide character in subroutine entry at
/usr/lib/perl5/5.8.0/i386-linux-thread-multi/Encode.pm line 154, <> line 160.
(I do have an older 'Encode' but can't upgrade.) W/ the eval and, if I'm
reading the docs right, w/ decode, I still can't get rid of the issue.
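One possible way around this, sketched under assumptions rather than tested on the box in question: read the input as raw bytes and decode it as `cp1252` instead of ISO-8859-1. The "Wide character" error is what decode() reports when its input already contains characters above 0xFF, which would fit Perl 5.8.0's habit of silently putting a utf8 layer on filehandles under a UTF-8 locale (dropped in 5.8.1). Whether that is the cause here is a guess. `clean_line` and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: takes one line of raw bytes, decodes it as cp1252
# (an assumption about the source) and swaps curly quotes for ASCII ones.
sub clean_line {
    my ($bytes) = @_;
    my $text = decode('cp1252', $bytes);
    $text =~ tr/\x{2018}\x{2019}/''/;   # curly single quotes -> '
    $text =~ tr/\x{201C}\x{201D}/""/;   # curly double quotes -> "
    return $text;
}

# An in-memory filehandle keeps the sketch self-contained. The ':raw'
# layer asks Perl to hand back bytes without applying any decoding layer.
my $sample = "the \x91export\x92 macro\n";
open my $fh, '<:raw', \$sample or die "open: $!";
while (my $raw = <$fh>) {
    print clean_line($raw);
}
close $fh;
```

For STDIN or an upload handle that is already open, `binmode($handle)` before reading should have the same effect as opening with ':raw'.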
a
Andy Bach
Systems Mangler
Internet: andy_bach at wiwb.uscourts.gov
VOICE: (608) 261-5738 FAX 264-5932
Seville Dar Daigo
Tousin Busses Inaro
Nojo Demistrux
Summit Cows In
Summit Dux