[Pdx-pm] unicode vs western text encoding

Thomas Keller kellert at ohsu.edu
Thu Jul 3 13:30:40 PDT 2008

We receive submissions from a web form as structured email messages,
and I've written a parser to process them. I don't know why, but some
arrive with Unicode characters and others in a plain Western text
encoding. The parser has trouble when a message contains Unicode
characters. Does anyone have a suggestion for, first, detecting Unicode
in a text file and, second, stripping out the weird stuff? I know I can
just use the translate function. Is that the "best" way? I'd have to know
ahead of time all the characters that I want to allow, which seems
really anti-best-practices.
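
A minimal sketch of one way to approach both steps, written in Python
purely for illustration, since the message does not say which language
the parser uses. It assumes the mail body is available as raw bytes and
that "western" means Windows-1252. The function names decode_message,
has_non_ascii, and to_ascii are hypothetical. Strict UTF-8 decoding
serves as the detection step, because Windows-1252 text with accented
characters almost never happens to be valid UTF-8. Unicode normalization
followed by an ASCII encode then avoids listing every allowed character
by hand.

```python
import unicodedata


def decode_message(raw):
    """Return (text, encoding_used) for a raw message body given as bytes."""
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is Windows-1252.
        # errors="replace" covers the few byte values cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace"), "cp1252"


def has_non_ascii(text):
    """True if the decoded text contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in text)


def to_ascii(text):
    """Reduce text to ASCII without a hand-written table of allowed characters."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything that is still non-ASCII removes the combining marks
    # and any character that has no ASCII base form.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")
```

Note that to_ascii silently drops characters with no ASCII base form,
such as curly quotes, rather than transliterating them. Whether that
loss is acceptable depends on what the parser needs from the form fields.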


MMI Shared Resource Facility
kellert at ohsu.edu
BSc 6339b
