[San-Diego-pm] Cold shower in UTF-8

Brian Manning elspicyjack at gmail.com
Sat Oct 26 10:43:30 PDT 2013


On Sat, Oct 26, 2013 at 6:02 AM, Joel Fentin <joel at fentin.com> wrote:
> Either you don't understand my problem or I don't understand you or both.
> But I appreciate your and Russ's efforts.

It must be me.

> Before the MySQL conversion, the operator would type the following into a
> text area:
>
> line1 + [enter key] + line2 + [enter key] + line3
>
> When they were done, they would click an OK button.
> I ran what they typed thru the following code before putting it into the
> database:
> $Value =~ s/\15//g; #snuff chr 13 (may screw up db file)
> $Value =~ s/\n/¶/g; #convert chr 10 to ¶
>
> In this case I arbitrarily chose ¶ to represent LF.

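Which, stored as the single Latin-1 byte 0xB6, is not a legal UTF-8
sequence.  The character itself (U+00B6) is fine in UTF-8, but there it
takes two bytes: 0xC2 0xB6.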
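
To make the byte-level picture concrete, here is a minimal sketch using
the core Encode module.  The helper name utf8_hex is mine, purely for
illustration, and I'm assuming your old data held the pilcrow as the
single Latin-1 byte 0xB6.  It prints the UTF-8 bytes for ¶ and for the
€ from the Wikipedia page, then checks whether a lone 0xB6 byte decodes
as strict UTF-8:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: hex dump of the UTF-8 encoding of one character.
sub utf8_hex {
    my ($char) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, encode('UTF-8', $char);
}

# A lone Latin-1 byte 0xB6 is a stray continuation byte in UTF-8,
# so a strict decode with FB_CROAK dies and the eval returns false.
my $latin1_byte  = "\xB6";
my $lone_byte_ok = eval { decode('UTF-8', $latin1_byte, Encode::FB_CROAK); 1 } ? 1 : 0;

print "pilcrow U+00B6 as UTF-8: ", utf8_hex("\x{B6}"),   "\n";   # C2 B6
print "euro    U+20AC as UTF-8: ", utf8_hex("\x{20AC}"), "\n";   # E2 82 AC
print "lone 0xB6 valid UTF-8?   ", ($lone_byte_ok ? "yes" : "no"), "\n";   # no
```

If the source file or the database connection switched from Latin-1 to
UTF-8 somewhere along the way, that one-byte versus two-byte difference
would explain why the old substitution stopped matching.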

> To later access this for display on a webpage, I took what was in the
> database and ran it through this:
> $Value =~ s/¶/<br>/g;
>
> The displayed result looked like this:
> line1
> line2
> line3
>
> ======================
>
> If I attempt this now, I can do the same thing, but would have to replace
> the display code (above) with:
> $Value =~ s/¶/<br \/>/g;
>
> This because ¶ is greater than chr 127.
>
> Rather than roll my own, I'd rather go with a standard. I confess, when I go
> to http://en.wikipedia.org/wiki/UTF-8
> I don't quite grasp the Description nor the codepage layout. They give an
> example of €. I can't follow it. Worse, I don't know how much I need to know
> and how much I don't.

Can you use a different separator, such as the pipe character '|'
(decimal 124/0x7c), or ASCII NUL (0x0)?  Both are valid UTF-8.  Any
character from 0x00 through 0x7f (127 decimal) inclusive in the ASCII
table is also valid UTF-8, encoded as the same single byte.  It sounds
like that's all you want to deal with at the moment.
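
Something along these lines is what I have in mind.  It's only a sketch:
the sub names to_db and to_html are made up, and it assumes the operator
never types a literal '|' into the text area (if they might, pick another
ASCII character or escape it first).

```perl
use strict;
use warnings;

# Separator is plain ASCII, so it is the same single byte in UTF-8.
my $SEP = '|';

# Hypothetical helper: prepare text-area input for storage.
sub to_db {
    my ($value) = @_;
    $value =~ s/\r//g;        # drop CR (chr 13), same as s/\15//g
    $value =~ s/\n/$SEP/g;    # store LF (chr 10) as the separator
    return $value;
}

# Hypothetical helper: turn the stored value back into HTML line breaks.
sub to_html {
    my ($value) = @_;
    $value =~ s/\Q$SEP\E/<br>/g;   # \Q..\E keeps '|' literal, not alternation
    return $value;
}

my $stored = to_db("line1\r\nline2\r\nline3");
print "$stored\n";             # line1|line2|line3
print to_html($stored), "\n";  # line1<br>line2<br>line3
```

Since every byte involved is below 0x80, this behaves the same whether
the data is treated as Latin-1 or as UTF-8.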

Thanks,

Brian

