SPUG: International characters from input form

Wed Jan 17 10:07:59 PST 2007

On 1/16/07, Gary Hawkins <ghawk at eskimo.com> wrote:
> There's this webpage form where the user supplies some text, might be English,
> Swedish, Chinese, Portuguese, Russian, Hebrew, Arabic ...
>
> To Perl, is it (or can it be made to be) clear-cut which language the user is
> sending to my program, truly without question, or are there any opportunities
> for confusion or possible crossover in Unicode-land?
>
> I do not want the user to have to tell me which language they are using, I want
> that to be determined programmatically, and hope to hear that someone has
> sorted all of that out already (Larry Wall and company or the folks at Apache)
> with no grey areas.
>
> On this:
>
> SERVER_SOFTWARE = Apache/1.3.34 (Unix) mod_layout/3.2
>
> ... I tried printing back %ENV with normal English text and received this:
>
> HTTP_ACCEPT_LANGUAGE = en-us,ja;q=0.5
>
> ... then tried inputting Japanese text instead and to my dismay saw the same
> thing:
>
> HTTP_ACCEPT_LANGUAGE = en-us,ja;q=0.5
>
> I would have been real happy to see this instead:
>
> HTTP_ACCEPT_LANGUAGE = ja,en-us;q=0.5
>
> Maybe there is a way I can tell from the following that they (I) used Japanese?
> I see it does appear to correctly reflect my 4 keyboard strokes for each field,
> but how am I to know it isn't Swahili?
>
> QUERY_STRING = Name1=%B6%C1%C4%C1&Name2=%BD%C1%BD%B2

Whatever it was you typed wasn't passed as Unicode code points. Code
points, at least, could be interpreted unambiguously as Unicode and
treated that way, for example:

%u2819%u280a%u2815%u281e%u2801%u2807%u2811%u2827%u280a;

which is just my typical perly handle written in the Unicode braille
patterns and URI-encoded. Since you really just sent some octets,
perhaps you also sent an HTTP header that indicated the character set
or encoding?
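
To make the difference concrete, here is a minimal sketch and not a
definitive implementation. It assumes the core Encode and charnames
modules; the helper names (decode_u_escapes, percent_decode, try_decode,
charset_of) are made up for illustration. Note that the %uXXXX form is a
non-standard escape, so it is handled separately from ordinary %XX
escapes. The sketch shows that the %u string names code points directly,
while the octets from the quoted QUERY_STRING are not valid UTF-8 and
decode to different characters depending on which encoding is assumed.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();

# Decode non-standard %uXXXX escapes into a string of Unicode characters.
sub decode_u_escapes {
    my ($text) = @_;
    $text =~ s/%u([0-9A-Fa-f]{4})/chr(hex($1))/ge;
    return $text;
}

# Turn ordinary %XX escapes into raw octets; no character set is implied.
sub percent_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Decode octets under a named encoding; return undef if they are not
# valid in that encoding.
sub try_decode {
    my ($encoding, $octets) = @_;
    my $copy = $octets;    # decode() may modify its input when CHECK is set
    my $chars = eval { decode($encoding, $copy, Encode::FB_CROAK()) };
    return $chars;
}

# Pull a charset parameter out of a Content-Type value, if there is one.
sub charset_of {
    my ($content_type) = @_;
    return undef unless defined $content_type;
    return $content_type =~ /;\s*charset\s*=\s*"?([^\s;"]+)/i ? lc($1) : undef;
}

# The %u string: every character is an unambiguous code point.
my $handle = decode_u_escapes(
    '%u2819%u280a%u2815%u281e%u2801%u2807%u2811%u2827%u280a');
for my $char (split //, $handle) {
    printf "U+%04X %s\n", ord($char), charnames::viacode(ord($char));
}

# The first field of the quoted QUERY_STRING: just octets, so the result
# depends entirely on the encoding we guess.
my $octets = percent_decode('%B6%C1%C4%C1');
for my $encoding (qw(UTF-8 euc-jp shiftjis iso-8859-1)) {
    my $chars = try_decode($encoding, $octets);
    my $shown = defined $chars
        ? join(' ', map { sprintf 'U+%04X', ord($_) } split //, $chars)
        : 'not valid in this encoding';
    printf "%-10s %s\n", $encoding, $shown;
}

# A request header might carry a charset; for a GET form it usually won't.
my $charset = charset_of($ENV{CONTENT_TYPE});
print 'charset from CONTENT_TYPE: ', defined $charset ? $charset : '(none)', "\n";
```

Even a successful decode only tells you the encoding was plausible, not
which language was typed: the same four octets come out as two
characters under euc-jp and four under shiftjis or iso-8859-1.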

Josh