[Pdx-pm] unicode vs western text encoding
Thomas Keller
kellert at ohsu.edu
Thu Jul 3 13:30:40 PDT 2008
Hi,
We get some emails that we need to parse. They come from a web form,
so I don't know why some arrive with Unicode characters and others in
a plain Western text encoding. Each submitted form arrives as a
structured email message, which I've written a parser to process. The
parser has trouble when a message contains Unicode characters.
Does anyone have a suggestion for, first, detecting Unicode in a text
file and, second, stripping out the weird stuff? I know I can just
use the translate function. Is that the "best" way? I'd have to know
ahead of time all the characters that I want to allow, which seems
really anti-best-practices.
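To make the two steps concrete, here is a rough sketch of one possible approach, written in Python rather than Perl purely for illustration. It assumes that "unicode" here means UTF-8 bytes and that "western" means ISO-8859-1 (Latin-1); neither is confirmed by the messages themselves. The function names decode_message and to_ascii are made up for this sketch. The idea is to decode first and only then strip, so no list of allowed characters is needed:

```python
import unicodedata

def decode_message(raw):
    """Guess the encoding of a raw message body (bytes).

    Returns (text, encoding_name). Assumption: anything that is not
    pure ASCII and not valid UTF-8 is treated as Latin-1.
    """
    try:
        return raw.decode("ascii"), "ascii"
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails.
        return raw.decode("latin-1"), "latin-1"

def to_ascii(text):
    """Reduce text to plain ASCII.

    NFKD normalization splits accented letters into a base letter plus
    combining marks; encoding with errors="ignore" then drops the marks
    and any other character that has no ASCII form.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

if __name__ == "__main__":
    for raw in ("café".encode("utf-8"), "café".encode("latin-1"), b"plain"):
        text, encoding = decode_message(raw)
        print(encoding, to_ascii(text))
```

Note that characters with no ASCII decomposition, such as curly quotes, are silently dropped by to_ascii, which may or may not be acceptable for the form data.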
thanks,
Tom
MMI Shared Resource Facility
4-2442
kellert at ohsu.edu
BSc 6339b