SPUG: International characters from input form

Gary Hawkins ghawk at eskimo.com
Tue Jan 16 08:58:29 PST 2007

There's this webpage form where the user supplies some text, which might be
English, Swedish, Chinese, Portuguese, Russian, Hebrew, Arabic ...

To Perl, is it (or can it be made to be) clear-cut which language the user is
sending to my program, truly without question, or are there any opportunities
for confusion or possible crossover in Unicode-land?

I do not want the user to have to tell me which language they are using. I want
that to be determined programmatically, and I hope to hear that someone has
sorted all of that out already (Larry Wall and company or the folks at Apache)
with no grey areas.
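
To make concrete what "determined programmatically" could mean, here is a
minimal Python sketch. It assumes the input has already been decoded to Unicode
text, and it only guesses the script (writing system) from each letter's Unicode
character name, not the language. The function name guess_scripts is made up
for illustration. The grey areas show up at once: English, Swedish and
Portuguese all come back as LATIN, and Chinese and Japanese share the CJK
ideographs.

```python
import unicodedata
from collections import Counter

def guess_scripts(text):
    """Count the first word of each letter's Unicode name, e.g. 'CYRILLIC'."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts

print(guess_scripts("Привет"))    # every letter is CYRILLIC
print(guess_scripts("日本語"))     # every character is CJK (shared with Chinese)
print(guess_scripts("svenska"))   # LATIN, same as English or Portuguese
```

So the script is recoverable from Unicode text, but telling languages apart
within one script would need something more, such as word or letter statistics.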

On this:

SERVER_SOFTWARE = Apache/1.3.34 (Unix) mod_layout/3.2

... I tried printing back %ENV with normal English text and received this:

HTTP_ACCEPT_LANGUAGE = en-us,ja;q=0.5

... then tried inputting Japanese text instead and to my dismay saw the same

HTTP_ACCEPT_LANGUAGE = en-us,ja;q=0.5

I would have been real happy to see this instead:

HTTP_ACCEPT_LANGUAGE = ja,en-us;q=0.5

Maybe there is a way I can tell from the following that they (I) used Japanese?
I see it does appear to correctly reflect my 4 keyboard strokes for each field,
but how am I to know it isn't Swahili?

QUERY_STRING = Name1=%B6%C1%C4%C1&Name2=%BD%C1%BD%B2
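
The percent-escapes above are raw bytes, and the query string carries no label
saying which character encoding produced them (browsers typically submit a form
in the encoding of the page it came from). A minimal Python sketch, assuming
nothing beyond the bytes shown above, tries a handful of candidate encodings on
the Name1 value. The helper name try_decodings and the list of candidates are
illustrative choices only.

```python
from urllib.parse import unquote_to_bytes

# Value of Name1 from the query string above, as raw bytes.
raw = unquote_to_bytes("%B6%C1%C4%C1")

def try_decodings(data, encodings):
    """Map each encoding name to the decoded text, or None if decoding fails."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

candidates = ["shift_jis", "euc_jp", "latin-1", "koi8_r", "tis_620", "utf-8"]
for enc, text in try_decodings(raw, candidates).items():
    print(enc, repr(text))
```

Of these candidates only UTF-8 rejects the bytes. The same four bytes decode
without error as four half-width katakana (Shift_JIS), two kanji (EUC-JP), four
Latin-1 characters, KOI8-R characters, or four Thai letters (TIS-620), so the
bytes alone do not settle the question. Perl's Encode module can do the same
kind of trial decoding.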

Does "en-us,ja" indicate the two languages installed on the user's system?  If
so, that would make sense and provide a clue (I have both English and Japanese
keyboard inputs set up).  What is 'q'?  Now, if they happen to have Thai and
Vietnamese, how am I to know which one they are using?  Maybe Thai is %AF thru
%D7 and Vietnamese is %D8 thru %FF or some such thing?


Gary Hawkins
