SPUG: International characters from input form

Joshua ben Jore twists at gmail.com
Wed Jan 17 10:07:59 PST 2007

On 1/16/07, Gary Hawkins <ghawk at eskimo.com> wrote:
> There's this webpage form where the user supplies some text, might be English,
> Swedish, Chinese, Portuguese, Russian, Hebrew, Arabic ...
> To Perl, is it (or can it be made to be) clearcut which language the user is
> sending to my program, truly without question, or are there any opportunities
> for confusion or possible crossover in unicode-land.
> I do not want the user to have to tell me which language they are using, I want
> that to be determined programmatically, and hope to hear that someone has
> sorted all of that out already (Larry Wall and company or the folks at Apache)
> with no grey areas.
> On this:
> SERVER_SOFTWARE = Apache/1.3.34 (Unix) mod_layout/3.2
> ... I tried printing back %ENV with normal English text and received this:
> HTTP_ACCEPT_LANGUAGE = en-us,ja;q=0.5
> ... then tried inputting Japanese text instead and to my dismay saw the same
> thing:
> HTTP_ACCEPT_LANGUAGE = en-us,ja;q=0.5
> I would have been real happy to see this instead:
> HTTP_ACCEPT_LANGUAGE = ja,en-us;q=0.5
> Maybe there is a way I can tell from the following that they (I) used Japanese?
> I see it does appear to correctly reflect my 4 keyboard strokes for each field,
> but how am I to know it isn't Swahili?
> QUERY_STRING = Name1=%B6%C1%C4%C1&Name2=%BD%C1%BD%B2

Whatever it was you typed wasn't passed to your program as Unicode code
points. Those, at least, could be interpreted unambiguously as Unicode
and treated that way.


which is just my typical perly handle written down in the Unicode for
braille and URI encoded. Since you really just sent some octets,
perhaps you also sent an HTTP header which indicated the character set
or encoding?
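
To illustrate why bare octets are ambiguous, here is a minimal sketch using
the core Encode module. It unescapes the Name1 value from the QUERY_STRING
above and tries to decode the resulting octets under a few candidate
encodings. The helper names (unescape_octets, try_decode) and the list of
candidates are my own assumptions for illustration, not anything the browser
or Apache tells you. The point is only that more than one encoding may accept
the same octets, so without a declared charset you are guessing.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn %XX escapes (and '+') in a query-string value back into raw octets.
# Hypothetical helper, named here for illustration only.
sub unescape_octets {
    my ($value) = @_;
    $value =~ tr/+/ /;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

# Decode octets under a candidate encoding.
# Returns the decoded text, or undef if the octets are not valid in it.
sub try_decode {
    my ($encoding, $octets) = @_;
    my $copy = $octets;    # decode() may modify its argument when CHECK is set
    return eval { decode($encoding, $copy, Encode::FB_CROAK) };
}

my $octets = unescape_octets('%B6%C1%C4%C1');    # Name1 from the QUERY_STRING

# The candidate list is an assumption; extend it with whatever you expect.
for my $encoding (qw(UTF-8 euc-jp euc-kr iso-8859-1)) {
    my $text = try_decode($encoding, $octets);
    printf "%-10s %s\n", $encoding,
        defined $text
            ? join(' ', map { sprintf 'U+%04X', ord($_) } split //, $text)
            : 'not valid in this encoding';
}
```

If several candidates come back with code points, the octets alone cannot
tell you which one the user meant. That is why a charset declared by the
page or the request matters more than any after-the-fact guess.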

