[Za-pm] ascii high order character conversion

Tielman De Villiers tvilliers at Lastminute.com
Thu May 8 09:29:40 PDT 2008


It depends on how your input data is encoded.

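If it is UTF-8 encoded (accented Latin letters then take two bytes each)
and you have Perl 5.8 or above (Encode and Unicode::Normalize are core
modules from 5.8.0), use Unicode::Normalize. Decode the bytes into
characters first, then keep only the base letter of each decomposed
character:

use Encode qw/decode/;
use Unicode::Normalize qw/:all/;
...
$string = decode('UTF-8', $string);  # UTF-8 bytes -> characters
# keep the base letter of each decomposed character (e.g. e-acute -> e)
$string =~ s/([\x80-\xFF])/substr(decompose($1),0,1)/eg;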
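A common alternative to taking the first character of decompose() is to
normalize the whole string to NFD and then delete every combining mark,
which also covers accented letters outside the \x80-\xFF range. This is a
minimal sketch, assuming UTF-8 input; strip_accents() is a hypothetical
helper name and the sample string is made up. Letters that have no
decomposition, such as sharp s or o-slash, pass through unchanged.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: expects a decoded character string, not raw bytes.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);    # c-cedilla -> "c" + combining cedilla
    $decomposed =~ s/\p{Mn}//g;     # delete the combining (non-spacing) marks
    return $decomposed;
}

# Sample data: UTF-8 bytes for "Francois Muller" with c-cedilla and u-umlaut
my $bytes = "Fran\xC3\xA7ois M\xC3\xBCller";
my $plain = strip_accents(decode('UTF-8', $bytes));
print "$plain\n";    # prints: Francois Muller
```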

If it was produced by one Windows application or another, with special
characters such as the Euro symbol encoded in a single byte, then your
input is probably CP1252. I'm not sure about the available conversion
modules.
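For CP1252 input, the core Encode module (Perl 5.8 and later) ships a
cp1252 encoding, so the bytes can be decoded into Perl characters and then
re-encoded as UTF-8 or handed to the Unicode::Normalize step. A minimal
sketch; the sample string is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample CP1252 bytes: 0x80 is the Euro sign, 0xE9 is e-acute.
my $cp1252_bytes = "Price: \x80 5, caf\xE9";

my $text = decode('cp1252', $cp1252_bytes);  # bytes -> Perl characters
my $utf8 = encode('UTF-8', $text);           # characters -> UTF-8 bytes
print "$utf8\n";
```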

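Alternatively, just run your whole input file through the iconv tool from
the libiconv C library, naming the encoding your data is really in (DOS
data is typically code page 437 or 850 rather than ISO-8859-1; the file
names below are placeholders):

iconv --from-code=ISO-8859-1 --to-code=UTF-8 input.txt > output.txt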
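The same kind of conversion can be done inside Perl with Encode's
from_to(), which converts a byte string in place. This is a minimal
sketch, assuming the DOS data is in code page 437 and that CP1252 is the
"correct Windows character" set wanted; the sample record is made up.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Sample record in CP437 (assumed DOS code page): 0x81 = u-umlaut, 0x87 = c-cedilla
my $record = "M\x81ller, Fran\x87ois";

# Convert the bytes in place from CP437 to CP1252, much as iconv would;
# from_to returns undef if the conversion fails.
my $len = from_to($record, 'cp437', 'cp1252');
die "conversion failed\n" unless defined $len;

# $record now holds CP1252 bytes: 0xFC = u-umlaut, 0xE7 = c-cedilla
```

For whole files, an I/O layer does the decoding as the file is read, e.g.
open(my $in, '<:encoding(cp437)', $file).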


--tielman

-----Original Message-----
From: za-pm-bounces+tvilliers=lastminute.com at pm.org
[mailto:za-pm-bounces+tvilliers=lastminute.com at pm.org] On Behalf Of Anne Wainwright
Sent: 08 May 2008 17:20
To: za-pm at pm.org
Subject: [Za-pm] ascii high order character conversion

Hi.

I have data prepared in a DOS program that contains high-order
characters, such as European letters with umlauts, cedillas, and acute
and grave accents.

I wrote a DOS utility that converts all of these to plain unaccented
characters with a simple replacement operation, because when the data was
moved to Windows they did not display correctly, and this was the easiest
approach at the time. Now I have moved away from that route and want to
build the conversion into my Perl database conversion routine (which
converts from a proprietary format to delimited text).

Now I am wondering whether there is an easier way in Perl than writing an
s/// for each of the characters used. I looked in the Perl Cookbook and
browsed the CPAN modules, but nothing struck me as specific to the task
at hand.

Not that lines of s/// wouldn't do the job, but I wondered whether there
was a more concise way to convert either to the plain unaccented
character or to the correct Windows character.
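Instead of one s/// per character, a single s/// with the /e modifier can
look each high-order byte up in a hash. This is a minimal sketch, assuming
the DOS data is in code page 437 and is processed as raw bytes; %plain and
to_plain() are hypothetical names, and only a handful of table entries are
filled in. Unlike tr///, a hash entry can expand one character into
several, such as ae-ligature to "ae" or sharp s to "ss".

```perl
use strict;
use warnings;

# Hypothetical lookup table: CP437 byte -> ASCII replacement (partial).
my %plain = (
    "\x80" => 'C',  "\x81" => 'u',  "\x82" => 'e',  "\x84" => 'a',
    "\x87" => 'c',  "\x8A" => 'e',  "\x91" => 'ae', "\x94" => 'o',
    "\x99" => 'O',  "\x9A" => 'U',  "\xE1" => 'ss',
);

sub to_plain {
    my ($line) = @_;
    # One substitution covers every high-order byte: use the table entry
    # if there is one, otherwise leave the byte as it is.
    $line =~ s/([\x80-\xFF])/exists $plain{$1} ? $plain{$1} : $1/ge;
    return $line;
}

print to_plain("M\x81ller, Fran\x87ois, Stra\xE1e"), "\n";  # Muller, Francois, Strasse
```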

[Maybe I should study "perlebcdic - Considerations for running Perl on
EBCDIC platforms", found on CPAN, which looks like it might be a guide;
it suggests tr///. I'll absorb it this evening.]
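tr/// handles one-to-one replacements in a single statement: each
character in the left list becomes the character at the same position in
the right list. A minimal sketch, again assuming CP437 bytes, where 0x80
to 0x8A are C-cedilla, u-umlaut, e-acute, a-circumflex, a-umlaut,
a-grave, a-ring, c-cedilla, e-circumflex, e-umlaut and e-grave; the
sample line is made up.

```perl
use strict;
use warnings;

my $line = "Fran\x87ois, M\x81ller, caf\x82";   # sample CP437 bytes

# Copy the line, then map CP437 bytes 0x80..0x8A to plain ASCII letters
# in one pass (11 source bytes, 11 replacement letters).
(my $plain = $line) =~ tr/\x80-\x8A/Cueaaaaceee/;
print "$plain\n";   # Francois, Muller, cafe
```

tr/// cannot turn one character into two, so characters such as the
ae-ligature or sharp s still need an s/// replacement.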

I had hoped for a ready-made module from CPAN, but see nothing.

Any ideas on what must have been a common problem some years back would
be gratefully received.


Regards
Anne
----
Anne Wainwright

_______________________________________________
Za-pm mailing list
Za-pm at pm.org
http://mail.pm.org/mailman/listinfo/za-pm