[San-Diego-pm] Cold shower in UTF-8

Joel Fentin joel at fentin.com
Sat Oct 26 06:02:40 PDT 2013


On 10/25/2013 7:19 PM, Brian Manning wrote:
> On Fri, Oct 25, 2013 at 4:22 PM, Joel Fentin <joel at fentin.com> wrote:
>> I did some web sites long ago. Their owner moved them to Network Solutions.
>> Network Solutions suddenly and without prior notice changed the MySQL
>> character encoding to UTF-8. There are fields in the database which are
>> displayed on webpages. I have some cleanup to do.
>>
>> Is there an industry standard for putting CR &/or LF into such a database
>> text field? Or does everyone roll his own?
>
> A SQL UPDATE using the output of a SELECT * from your existing tables
> should work, I think.
>
> You may also be able to drop then recreate the tables using the same
> encoding you used before.  That would be up to NetSol.
>
>> Are there industry standards for áéíñóúÁÉÍÑÓÚ¡¿?
>
> Yes, they're called ISO standards and/or Unicode standards, depending
> on what the encoding of your existing text is.  You could use 'iconv'
> or 'enca/enconv' to detect your source encodings and/or convert them
> to UTF-8.  You could also use *cough*PERL*cough*, but it's
> probably easier/quicker/faster to use existing tools built for this
> purpose than to roll your own in *cough*PERL*cough*.
>
> Thanks,
>
> Brian
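Brian's Perl route can be sketched with the core Encode module. This assumes the old text really was ISO-8859-1 (Latin-1), which the post does not say. If the old MySQL character set was "latin1", MySQL treats that as Windows-1252, so decoding with 'cp1252' may be the better fit. `latin1_to_utf8` and `hex_bytes` are hypothetical helper names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the old column held ISO-8859-1 (Latin-1) bytes.
# decode() turns those bytes into Perl characters; encode() writes the
# same characters back out as UTF-8 bytes.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('ISO-8859-1', $bytes);
    return encode('UTF-8', $chars);
}

# Hypothetical helper: show each byte of a string in hex.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

my $old = "Espa\xF1a";             # "España" in Latin-1: ñ is one byte, F1
my $new = latin1_to_utf8($old);    # in UTF-8, ñ becomes two bytes, C3 B1

print hex_bytes($old), "\n";   # prints: 45 73 70 61 F1 61
print hex_bytes($new), "\n";   # prints: 45 73 70 61 C3 B1 61
```

The iconv equivalent for a dumped file would be along the lines of `iconv -f ISO-8859-1 -t UTF-8 old.txt > new.txt` (file names made up). Decoding with the wrong source encoding produces wrong characters without any error, so it is worth spot-checking a few rows with known accented text first.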

Either you don't understand my problem, or I don't understand you, 
or both. But I appreciate your and Russ's efforts.

Before the MySQL conversion, the operator would type the following 
into a text area:

line1 + [enter key] + line2 + [enter key] + line3

When they were done, they would click an OK button.
I ran what they typed through the following code before putting it 
into the database:
$Value =~ s/\r//g;  #snuff chr 13 (may screw up db file)
$Value =~ s/\n/¶/g; #convert chr 10 to ¶

In this case I arbitrarily chose ¶ to represent LF.
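A runnable version of that storage step, as a sketch. It assumes the script file is saved as UTF-8. With `use utf8`, the ¶ literal is then a single character (U+00B6) instead of the two raw bytes C2 B6, which starts to matter once the column is UTF-8. `\r` replaces `\15`: Perl reads `\15` as octal 015 (chr 13) only because the pattern has fewer than 15 capture groups. `encode_for_db` is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;   # the ¶ literal below is one character, not two raw bytes

# Hypothetical helper wrapping the two substitutions from the post.
sub encode_for_db {
    my ($value) = @_;
    $value =~ s/\r//g;     # drop chr 13 (CR)
    $value =~ s/\n/¶/g;    # chr 10 (LF) becomes the marker ¶ (U+00B6)
    return $value;
}

# Browsers typically submit textarea line breaks as CR LF.
my $typed  = "line1\r\nline2\r\nline3";
my $stored = encode_for_db($typed);
print length($stored), "\n";   # prints: 17 (characters, not bytes)
```

Before the string goes to MySQL it would still need to be encoded to UTF-8 bytes, either with Encode::encode or through the database driver's UTF-8 setting.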

To display it on a webpage later, I took what was in 
the database and ran it through this:
$Value =~ s/¶/<br>/g;

The displayed result looked like this:
line1
line2
line3
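On the CR/LF question: nobody in the thread names a formal standard, but a common practice is to leave the line breaks in the column as ordinary LF characters and turn them into HTML only on output, escaping HTML special characters first. A minimal sketch with a hypothetical `text_to_html` helper. `encode_entities` from the HTML::Entities module could stand in for the hand-written escapes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the column keeps plain LF line breaks, and the
# text becomes HTML only when it is displayed.
sub text_to_html {
    my ($value) = @_;
    $value =~ s/\r\n?/\n/g;      # normalise CR LF or a lone CR to LF
    # Escape HTML metacharacters first so typed text can't inject markup.
    $value =~ s/&/&amp;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    $value =~ s/"/&quot;/g;
    $value =~ s/\n/<br>\n/g;     # then mark each line break
    return $value;
}

print text_to_html("line1\r\nline2\r\nline3"), "\n";
# prints:
# line1<br>
# line2<br>
# line3
```

Because no marker character is involved, this behaves the same whatever encoding the column uses.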

======================

If I attempt this now, I can do the same thing, but would have to 
replace the display code (above) with:
$Value =~ s/¶/<br \/>/g;

This is because ¶ is greater than chr 127 (in UTF-8 it is stored as two bytes, 0xC2 0xB6).
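The "greater than chr 127" point can be checked directly. ¶ is one character but two bytes in UTF-8, and a substitution only lines up when the pattern and the data are in the same form. A sketch using the core Encode module. The sample string stands in for a value read back from the database and is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $pilcrow = "\x{B6}";                     # ¶ as one character, U+00B6
my $bytes   = encode('UTF-8', $pilcrow);    # what a UTF-8 column stores

printf "characters: %d, bytes: %d (%s)\n",
    length($pilcrow), length($bytes),
    join(' ', map { sprintf('%02X', ord($_)) } split //, $bytes);
# prints: characters: 1, bytes: 2 (C2 B6)

# Made-up raw UTF-8 bytes, standing in for a value read from MySQL.
my $from_db = "line1\xC2\xB6line2";

# Matching on the raw bytes hits only the second byte of ¶ and leaves
# a stray 0xC2 behind (often displayed as "Â").
(my $naive = $from_db) =~ s/\x{B6}/<br>/g;

# Decoding first turns the two bytes back into one ¶ character.
(my $html = decode('UTF-8', $from_db)) =~ s/\x{B6}/<br>/g;
print "$html\n";   # prints: line1<br>line2
```

The usual rule of thumb is to decode right after reading from the database and encode right before writing or printing, so everything in between works on characters.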

======================

Rather than roll my own, I'd prefer to go with a standard. I confess 
that when I go to http://en.wikipedia.org/wiki/UTF-8
I don't quite grasp either the Description or the codepage layout. They 
give an example of €, and I can't follow it. Worse, I don't know how 
much of this I need to know and how much I don't.
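The € example, worked through: U+20AC is above 0x7FF, so it uses the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx. Its 16 bits, 0010 0000 1010 1100, split 4 + 6 + 6 as 0010 | 000010 | 101100 and fill the x slots to give 11100010 10000010 10101100, i.e. E2 82 AC. The sketch below does the same bit-shuffling by hand with a hypothetical `utf8_bytes` helper and compares it with the core Encode module. It is for illustration only and does not reject invalid code points such as surrogates or values above 0x10FFFF:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hand-encode one code point using the byte patterns from the table.
sub utf8_bytes {
    my ($cp) = @_;
    if ($cp < 0x80) {                      # 1 byte:  0xxxxxxx
        return ($cp);
    }
    if ($cp < 0x800) {                     # 2 bytes: 110xxxxx 10xxxxxx
        return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (0xE0 | ($cp >> 12),
                0x80 | (($cp >> 6) & 0x3F),
                0x80 | ($cp & 0x3F));
    }
    return (0xF0 | ($cp >> 18),            # 4 bytes: 11110xxx then three 10xxxxxx
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F));
}

# A, ¶, ñ and € as test code points.
for my $cp (0x41, 0xB6, 0xF1, 0x20AC) {
    my @mine = utf8_bytes($cp);
    my @lib  = map { ord($_) } split //, encode('UTF-8', chr($cp));
    printf "U+%04X -> %s (matches Encode: %s)\n", $cp,
        join(' ', map { sprintf('%02X', $_) } @mine),
        ("@mine" eq "@lib" ? 'yes' : 'no');
}
# prints:
# U+0041 -> 41 (matches Encode: yes)
# U+00B6 -> C2 B6 (matches Encode: yes)
# U+00F1 -> C3 B1 (matches Encode: yes)
# U+20AC -> E2 82 AC (matches Encode: yes)
```

For everyday work only a little of this is needed. Anything at or below chr 127 is one byte and unchanged; everything above takes two to four bytes. Perl code should therefore decode text on the way in and encode it on the way out rather than matching raw bytes.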

-- 
Joel Fentin       tel: 760-749-8863
Biz Website:      http://fentin.com
Personal Website: http://fentin.com/me
