[Chicago-talk] Malformed UTF-8 character

Andy_Bach at wiwb.uscourts.gov Andy_Bach at wiwb.uscourts.gov
Thu Dec 14 15:15:04 PST 2006


I've got data with \x91 and \x92 chars in it (which must be Excel's curly 
quotes), and when I try to parse it I get a lot of:
Malformed UTF-8 character (unexpected continuation byte 0x91, with no 
preceding start byte) in pattern match (m//) at 
/opt/util/check_doc_table.pl line 155, <> line 1.

shown here:
;SetSardField(<91>bk_cur_mo_income<92>,<92>meantest<92>);
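
For reference, a minimal sketch of what those two bytes turn into if the 
data is assumed to be Windows-1252 (cp1252), where 0x91 and 0x92 are the 
left and right curly single quotes. The cp1252 guess and the variable names 
$raw, $text and $clean are assumptions for illustration, not something 
confirmed about this data:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample line from above, with the raw bytes 0x91 and 0x92 written as escapes.
my $raw = ";SetSardField(\x91bk_cur_mo_income\x92,\x92meantest\x92);";

# Assumption: the bytes are cp1252, so 0x91 -> U+2018 and 0x92 -> U+2019.
my $text = decode('cp1252', $raw);

# Swap both curly quotes for a plain ASCII apostrophe.
(my $clean = $text) =~ s/[\x{2018}\x{2019}]/'/g;

print "$clean\n";
```

If the guess about cp1252 holds, the result contains only ASCII, so it 
prints safely whatever encoding the terminal expects.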

I can't find a way to get rid of the error or the hex codes. I've tried:
  #while ( s/(.{1,5})([[:^ascii:]])(.{0,5})/${1}XX${2}/g ) {
  if (
      s/(\x91|\x92)/X/g
  or
      s/([[:^ascii:]])/X/g
      ) {
  #while ( /([[:^ascii:]])/g ) {
     #s/$1/X/;
     #s/([[:^ascii:]])/X/;
     warn( "Has non-ascii chars " . ord($1) . " replaced with 'XX'" )
       if $debug > 3;
     push(@errors, "Has non-ascii chars " . ord($1) . " replaced with 'XX'" );
  }


and various combos of the same. The non-ascii class seems to work best, but 
1) it complains about the s/// while doing that and 2) it actually seems 
non-deterministic: it replaces anywhere from 0 to 4 of the hex chars:

 Has non-ascii chars 146 replaced with 'XX'
  Un-set Var entry XmeantestX
  DPF SetSardField: Xbk_cur_mo_incomeX,XmeantestX

Entry: misc, metest7
  Un-set Var entry ÂmeantestÂ
  DPF SetSardField: Âbk_cur_mo_incomeÂ,ÂmeantestÂ

  Has non-ascii chars 145 replaced with 'XX'
  Un-set Var entry ÂmeantestÂ
  DPF SetSardField: Xbk_cur_mo_incomeÂ,ÂmeantestÂ

that's 3 separate runs, using the up arrow to repeat the command.  Yoikes.
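
What I'm after, as a rough sketch: replace every byte outside the ASCII 
range and record the code of each one as it goes, rather than only the last 
$1 left over after s///g. This assumes the line is held as a plain byte 
string (no UTF-8 decoding on the input), which may not be true of the real 
script; $line and $fixed are made-up names for the example:

```perl
use strict;
use warnings;

# Sample line from above, with the raw bytes written as escapes.
my $line = ";SetSardField(\x91bk_cur_mo_income\x92,\x92meantest\x92);";
my @errors;

# /e runs the replacement as code, so every match gets logged,
# and the block's last value ('X') becomes the replacement text.
(my $fixed = $line) =~ s{([^\x00-\x7F])}{
    push @errors, sprintf("Has non-ascii char 0x%02X replaced with 'X'", ord($1));
    'X';
}ge;

print "$fixed\n";
print "  $_\n" for @errors;
```

On the sample line this should log four entries, one per quote byte, and 
give the same answer on every run.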

a

Andy Bach
Systems Mangler
Internet: andy_bach at wiwb.uscourts.gov
VOICE: (608) 261-5738  FAX 264-5932

Seville Dar Daigo
Tousin Busses Inaro
Nojo Demistrux
Summit Cows In
Summit Dux 

