[Athens-pm] iso-8859-7 to nummeric format conversion
Philip Lees
pjlees at ics.forth.gr
Fri Oct 31 07:10:17 CST 2003
On 31 Oct 2003 at 14:01, Mark Pors wrote:
> Not very pretty, but will work
Yikes! That's horrid. Mixing program and data like that is not a good
idea.
Try the attached version (untested - I don't have anything to try it
on). Note that this way you can save the conversion table in a
separate file if you like and read it into the hash at run time.
Philip
--
Philip Lees
ICS-FORTH, Science and Technology Park of Crete
Vassilika Vouton, P.O. Box 1385, GR 711 10 Heraklion, Crete, GREECE
tel.: +30-2810-391680, fax: +30-2810-391601, e-mail:
pjlees at ics.forth.gr
'The aim of high technology should be to simplify, not complicate' -
Hans Christian
von Baeyer
-------------- next part --------------
#!/usr/bin/perl
use strict;
use warnings;
my %UTF = ( '\xCE'=>'xCE\x9E', # GREEK CAPITAL LETTER XI
'\xC2'=>'xCE\x92', # GREEK CAPITAL LETTER BETA
'\xCA'=>'xCE\x9A', # GREEK CAPITAL LETTER KAPPA
'\xBC'=>'xCE\x8C', # GREEK CAPITAL LETTER OMICRON WITH TONOS
'\xBD'=>'xC2\xBD', # VULGAR FRACTION ONE HALF
'\xA0'=>'xC2\xA0', # NO-BREAK SPACE
'\xA1'=>'xCA\xBD', # MODIFIER LETTER REVERSED COMMA
'\xA2'=>'xCA\xBC', # MODIFIER LETTER APOSTROPHE
'\xA3'=>'xC2\xA3', # POUND SIGN
'\xA6'=>'xC2\xA6', # BROKEN BAR
'\xA7'=>'xC2\xA7', # SECTION SIGN
'\xA8'=>'xC2\xA8', # DIAERESIS
'\xA9'=>'xC2\xA9', # COPYRIGHT SIGN
'\xAB'=>'xC2\xAB', # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xAC'=>'xC2\xAC', # NOT SIGN
'\xAD'=>'xC2\xAD', # SOFT HYPHEN
'\xB2'=>'xC2\xB2', # SUPERSCRIPT TWO
'\xE2'=>'xCE\xB2', # GREEK SMALL LETTER BETA
'\xAF'=>'xE2\x40\x55', # HORIZONTAL BAR
'\xB0'=>'xC2\xB0', # DEGREE SIGN
'\xB1'=>'xC2\xB1', # PLUS-MINUS SIGN
'\xB3'=>'xC2\xB3', # SUPERSCRIPT THREE
'\xB4'=>'xCE\x84', # GREEK TONOS
'\xB5'=>'xCE\x85', # GREEK DIALYTIKA TONOS
'\xB6'=>'xCE\x86', # GREEK CAPITAL LETTER ALPHA WITH TONOS
'\xB7'=>'xC2\xB7', # MIDDLE DOT
'\xB8'=>'xCE\x88', # GREEK CAPITAL LETTER EPSILON WITH TONOS
'\xB9'=>'xCE\x89', # GREEK CAPITAL LETTER ETA WITH TONOS
'\xBA'=>'xCE\x8A', # GREEK CAPITAL LETTER IOTA WITH TONOS
'\xBB'=>'xC2\xBB', # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
'\xBE'=>'xCE\x8E', # GREEK CAPITAL LETTER UPSILON WITH TONOS
'\xBF'=>'xCE\x8F', # GREEK CAPITAL LETTER OMEGA WITH TONOS
'\xC0'=>'xCE\x90', # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
'\xC1'=>'xCE\x91', # GREEK CAPITAL LETTER ALPHA
'\xC3'=>'xCE\x93', # GREEK CAPITAL LETTER GAMMA
'\xC4'=>'xCE\x94', # GREEK CAPITAL LETTER DELTA
'\xC5'=>'xCE\x95', # GREEK CAPITAL LETTER EPSILON
'\xC6'=>'xCE\x96', # GREEK CAPITAL LETTER ZETA
'\xC7'=>'xCE\x97', # GREEK CAPITAL LETTER ETA
'\xC8'=>'xCE\x98', # GREEK CAPITAL LETTER THETA
'\xC9'=>'xCE\x99', # GREEK CAPITAL LETTER IOTA
'\xCB'=>'xCE\x9B', # GREEK CAPITAL LETTER LAMDA
'\xCC'=>'xCE\x9C', # GREEK CAPITAL LETTER MU
'\xCD'=>'xCE\x9D', # GREEK CAPITAL LETTER NU
'\xCF'=>'xCE\x9F', # GREEK CAPITAL LETTER OMICRON
'\xD0'=>'xCE\xA0', # GREEK CAPITAL LETTER PI
'\xD1'=>'xCE\xA1', # GREEK CAPITAL LETTER RHO
'\xD3'=>'xCE\xA3', # GREEK CAPITAL LETTER SIGMA
'\xD4'=>'xCE\xA4', # GREEK CAPITAL LETTER TAU
'\xD5'=>'xCE\xA5', # GREEK CAPITAL LETTER UPSILON
'\xD6'=>'xCE\xA6', # GREEK CAPITAL LETTER PHI
'\xD7'=>'xCE\xA7', # GREEK CAPITAL LETTER CHI
'\xD8'=>'xCE\xA8', # GREEK CAPITAL LETTER PSI
'\xD9'=>'xCE\xA9', # GREEK CAPITAL LETTER OMEGA
'\xDA'=>'xCE\xAA', # GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
'\xDB'=>'xCE\xAB', # GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
'\xDC'=>'xCE\xAC', # GREEK SMALL LETTER ALPHA WITH TONOS
'\xDD'=>'xCE\xAD', # GREEK SMALL LETTER EPSILON WITH TONOS
'\xDE'=>'xCE\xAE', # GREEK SMALL LETTER ETA WITH TONOS
'\xDF'=>'xCE\xAF', # GREEK SMALL LETTER IOTA WITH TONOS
'\xE0'=>'xCE\xB0', # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
'\xE1'=>'xCE\xB1', # GREEK SMALL LETTER ALPHA
'\xE3'=>'xCE\xB3', # GREEK SMALL LETTER GAMMA
'\xE4'=>'xCE\xB4', # GREEK SMALL LETTER DELTA
'\xE5'=>'xCE\xB5', # GREEK SMALL LETTER EPSILON
'\xE6'=>'xCE\xB6', # GREEK SMALL LETTER ZETA
'\xE7'=>'xCE\xB7', # GREEK SMALL LETTER ETA
'\xE8'=>'xCE\xB8', # GREEK SMALL LETTER THETA
'\xE9'=>'xCE\xB9', # GREEK SMALL LETTER IOTA
'\xEA'=>'xCE\xBA', # GREEK SMALL LETTER KAPPA
'\xEB'=>'xCE\xBB', # GREEK SMALL LETTER LAMDA
'\xEC'=>'xCE\xBC', # GREEK SMALL LETTER MU
'\xED'=>'xCE\xBD', # GREEK SMALL LETTER NU
'\xEE'=>'xCE\xBE', # GREEK SMALL LETTER XI
'\xEF'=>'xCE\xBF', # GREEK SMALL LETTER OMICRON
'\xF0'=>'xCF\xC0', # GREEK SMALL LETTER PI
'\xF1'=>'xCF\xC1', # GREEK SMALL LETTER RHO
'\xF2'=>'xCF\xC2', # GREEK SMALL LETTER FINAL SIGMA
'\xF3'=>'xCF\xC3', # GREEK SMALL LETTER SIGMA
'\xF4'=>'xCF\xC4', # GREEK SMALL LETTER TAU
'\xF5'=>'xCF\xC5', # GREEK SMALL LETTER UPSILON
'\xF6'=>'xCF\xC6', # GREEK SMALL LETTER PHI
'\xF7'=>'xCF\xC7', # GREEK SMALL LETTER CHI
'\xF8'=>'xCF\xC8', # GREEK SMALL LETTER PSI
'\xF9'=>'xCF\xC9', # GREEK SMALL LETTER OMEGA
'\xFA'=>'xCF\xCA', # GREEK SMALL LETTER IOTA WITH DIALYTIKA
'\xFB'=>'xCF\xCB', # GREEK SMALL LETTER UPSILON WITH DIALYTIKA
'\xFC'=>'xCF\xCC', # GREEK SMALL LETTER OMICRON WITH TONOS
'\xFD'=>'xCF\xCD', # GREEK SMALL LETTER UPSILON WITH TONOS
'\xFE'=>'xCF\xCE', # GREEK SMALL LETTER OMEGA WITH TONOS
);
s/(\\x\w\w)/$UTF{$1}/ge, print while <>;
More information about the Athens-pm
mailing list