[Chicago-talk] Malformed UTF-8 character
Andy_Bach at wiwb.uscourts.gov
Thu Dec 14 15:15:04 PST 2006
I've got data with \x91 and \x92 characters in it (which must be Excel's curly
quotes), and when I try to parse it I get a lot of:
Malformed UTF-8 character (unexpected continuation byte 0x91, with no
preceding start byte) in pattern match (m//) at
/opt/util/check_doc_table.pl line 155, <> line 1.
shown here:
;SetSardField(<91>bk_cur_mo_income<92>,<92>meantest<92>);
I can't find a way to get rid of the error or the hex codes. I've tried:
#while ( s/(.{1,5})([[:^ascii:]])(.{0,5})/${1}XX${2}/g ) {
if (
    s/(\x91|\x92)/X/g
    or
    s/([[:^ascii:]])/X/g
) {
    #while ( /([[:^ascii:]])/g ) {
    #s/$1/X/;
    #s/([[:^ascii:]])/X/;
    warn( "Has non-ascii chars " . ord($1) . " replaced with 'XX'" )
        if $debug > 3;
    push( @errors, "Has non-ascii chars " . ord($1) . " replaced with 'XX'" );
}
and various combos of the same. The non-ascii class seems to work best, but
1) it complains about the s/// while doing that and 2) it actually seems
non-deterministic: sometimes it replaces anywhere from 0 to 4 of the hex chars:
Has non-ascii chars 146 replaced with 'XX'
Un-set Var entry XmeantestX
DPF SetSardField: Xbk_cur_mo_incomeX,XmeantestX
Entry: misc, metest7
Un-set Var entry ÂmeantestÂ
DPF SetSardField: Âbk_cur_mo_incomeÂ,ÂmeantestÂ
Has non-ascii chars 145 replaced with 'XX'
Un-set Var entry ÂmeantestÂ
DPF SetSardField: Xbk_cur_mo_incomeÂ,ÂmeantestÂ
That's 3 separate runs, using the up arrow to repeat the command. Yoikes.
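For reference, here is a minimal sketch of one way to sidestep the problem, assuming the data really is Windows-1252 (where 0x91 and 0x92 are the left and right single curly quotes). Instead of matching raw bytes, it decodes them into characters with the core Encode module and then swaps the curly quotes for a plain apostrophe. The helper name straighten_quotes is hypothetical and not part of the original script; whether this cures the warning in check_doc_table.pl depends on how that script reads its input, which is not shown here.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the original script): takes raw bytes that
# are ASSUMED to be Windows-1252 and returns the text with curly single
# quotes straightened, plus the number of quotes replaced.
sub straighten_quotes {
    my ($bytes) = @_;

    # Decode the raw octets into characters; under cp1252,
    # 0x91 becomes U+2018 and 0x92 becomes U+2019.
    my $text = decode( 'cp1252', $bytes );

    # tr/// returns the number of characters it translated; both curly
    # quotes map to the single apostrophe in the replacement list.
    my $count = ( $text =~ tr/\x{2018}\x{2019}/'/ );

    return ( $text, $count );
}

# Sample line from the post, with the two bytes written as escapes.
my $line = ";SetSardField(\x91bk_cur_mo_income\x92,\x92meantest\x92);";
my ( $clean, $n ) = straighten_quotes($line);
print "$clean\n";
print "Has $n curly quotes, replaced with an apostrophe\n";
```

If the input comes from a file, opening it with the ':encoding(cp1252)' layer should give the same decoded text without the explicit decode call, again assuming the file is cp1252.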
a
Andy Bach
Systems Mangler
Internet: andy_bach at wiwb.uscourts.gov
VOICE: (608) 261-5738 FAX 264-5932
Seville Dar Daigo
Tousin Busses Inaro
Nojo Demistrux
Summit Cows In
Summit Dux