[San-Diego-pm] Cold shower in UTF-8
elspicyjack at gmail.com
Sat Oct 26 10:43:30 PDT 2013
On Sat, Oct 26, 2013 at 6:02 AM, Joel Fentin <joel at fentin.com> wrote:
> Either you don't understand my problem or I don't understand you or both.
> But I appreciate your and Russ's efforts.
It must be me.
> Before the MySQL conversion, the operator would type the following into a
> text area:
> line1 + [enter key] + line2 + [enter key] + line3
> When they were done, they would click an OK button.
> I ran what they typed thru the following code before putting it into the
> database:
> $Value =~ s/\15//g; #snuff chr 13 (may screw up db file)
> $Value =~ s/\n/¶/g; #convert chr 10 to ¶
> In this case I arbitrarily chose ¶ to represent LF.
Stored as the single Latin-1 byte 0xB6, it is not valid UTF-8. In UTF-8,
¶ (U+00B6) is the two-byte sequence 0xC2 0xB6.
> To later access this for display on a webpage, I took what was in the
> database and ran it through this:
> $Value =~ s/¶/<br>/g;
> The displayed result looked like this:
> If I attempt this now, I can do the same thing, but would have to replace
> the display code (above) with:
> $Value =~ s/Â¶/<br \/>/g;
> This is because ¶ is greater than chr 127.
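That Â¶ is what the two UTF-8 bytes for ¶ look like when something reads them back as Latin-1, one byte per character. Here is a minimal sketch using the core Encode module. It assumes nothing about your database and only shows the byte-level mix-up:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $pilcrow = "\x{B6}";                      # ¶ as a character: U+00B6, decimal 182
my $bytes   = encode('UTF-8', $pilcrow);     # what a UTF-8 column actually stores
printf "UTF-8 bytes: %vX\n", $bytes;         # UTF-8 bytes: C2.B6

# Decoding those same bytes as Latin-1 (one byte = one character)
# gives two characters: U+00C2 (Â) followed by U+00B6 (¶).
my $misread = decode('ISO-8859-1', $bytes);
printf "chars: %d, first is U+%04X\n", length $misread, ord $misread;
# chars: 2, first is U+00C2
```

So your s/Â¶/.../ is matching the UTF-8 encoding of ¶ seen through Latin-1 glasses, not a different character.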
> Rather than roll my own, I'd rather go with a standard. I confess, when I go
> to http://en.wikipedia.org/wiki/UTF-8
> I don't quite grasp either the Description or the codepage layout. They give an
> example of €. I can't follow it. Worse, I don't know how much I need to know
> and how much I don't.
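For what it's worth, the € example works like this. € is U+20AC, which falls in the three-byte row of the table (U+0800 to U+FFFF), with the pattern 1110xxxx 10xxxxxx 10xxxxxx. Write 0x20AC in binary and split it 4/6/6: 0010 | 000010 | 101100. Drop those bits into the x slots and you get 11100010 10000010 10101100, i.e. E2 82 AC. Below is a hand-rolled sketch of that table in Perl, for illustration only; it skips validity checks such as rejecting surrogates or values above U+10FFFF, and real code should just call Encode::encode('UTF-8', ...):

```perl
use strict;
use warnings;

# Encode one code point into UTF-8 bytes, following the Wikipedia table.
sub utf8_bytes_for {
    my ($cp) = @_;
    return ($cp) if $cp <= 0x7F;                        # 0xxxxxxx (plain ASCII)
    return (0xC0 | ($cp >> 6),                          # 110xxxxx
            0x80 | ($cp & 0x3F)) if $cp <= 0x7FF;       # 10xxxxxx
    return (0xE0 | ($cp >> 12),                         # 1110xxxx
            0x80 | (($cp >> 6) & 0x3F),                 # 10xxxxxx
            0x80 | ($cp & 0x3F)) if $cp <= 0xFFFF;      # 10xxxxxx
    return (0xF0 | ($cp >> 18),                         # 11110xxx
            0x80 | (($cp >> 12) & 0x3F),                # 10xxxxxx
            0x80 | (($cp >> 6) & 0x3F),                 # 10xxxxxx
            0x80 | ($cp & 0x3F));                       # 10xxxxxx
}

# '|', '¶' and '€'
for my $cp (0x7C, 0xB6, 0x20AC) {
    printf "U+%04X -> %s\n", $cp,
        join ' ', map { sprintf '%02X', $_ } utf8_bytes_for($cp);
}
# U+007C -> 7C
# U+00B6 -> C2 B6
# U+20AC -> E2 82 AC
```

The practical takeaway is the first line of the sub: anything at or below 0x7F is stored as itself, one byte, so you never have to think about the rest of the table.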
Can you use a different separator, such as the pipe character '|'
(decimal 124/0x7C), or ASCII NUL (0x00), both of which are valid
UTF-8? Every character from 0x00 to 0x7F (0 to 127 decimal, inclusive),
i.e. the whole ASCII table, is valid UTF-8 and is stored as the same
single byte. It sounds like that's all you want to deal with at the
moment.
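A minimal sketch of your original store/display steps with '|' swapped in for ¶. The sub names store_text and display_text are made up for illustration; call them wherever you currently do the substitutions. One assumption to flag: a '|' the operator actually types would be ambiguous, so this sketch turns it into a space before converting the line feeds.

```perl
use strict;
use warnings;

# Hypothetical helper: run on the textarea value before the INSERT.
sub store_text {
    my ($value) = @_;
    $value =~ s/\r//g;     # snuff chr 13 (CR), as before
    $value =~ s/\|/ /g;    # assumption: a literal '|' from the operator becomes a space
    $value =~ s/\n/|/g;    # chr 10 (LF) -> '|', a single ASCII byte
    return $value;
}

# Hypothetical helper: run on the value after the SELECT, for the web page.
sub display_text {
    my ($value) = @_;
    $value =~ s/\|/<br \/>/g;   # '|' -> line break
    return $value;
}

my $typed  = "line1\r\nline2\r\nline3";
my $stored = store_text($typed);      # "line1|line2|line3"
print display_text($stored), "\n";    # line1<br />line2<br />line3
```

Because '|' is below 0x7F, it comes back out of the database as the same byte whether the column is Latin-1 or UTF-8, so the display regex never needs an Â in it.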