From rkleeman at energoncube.net Wed Oct 16 13:43:20 2013
From: rkleeman at energoncube.net (Bob Kleemann)
Date: Wed, 16 Oct 2013 13:43:20 -0700
Subject: [San-Diego-pm] Meeting Thursday!
Message-ID:

Perl Mongers,

Our normal monthly meeting is this Thursday, October 17th. We'll be
meeting again at the Ansir Innovation Center on Convoy St, starting
around 7 PM. Bring your questions, ideas, and thoughts, and we'll
discuss them all, along with any presentations that might emerge (I'm
hoping to have at least a discussion on Perl and UTF-8, if it doesn't
turn into a full-blown presentation).

Also, please remember this note from the fine folks at the Ansir
Innovation Center:

Please park on either Convoy Street, Engineer Road, or Brinell Street
as we have a small parking lot that we share with other businesses.
Avoid parking in other shopping centers as you may get towed. Look for
the green door marked Suite 210.

I look forward to seeing you all on Thursday evening!

-- Bob

From joel at fentin.com Fri Oct 25 16:22:57 2013
From: joel at fentin.com (Joel Fentin)
Date: Fri, 25 Oct 2013 16:22:57 -0700
Subject: [San-Diego-pm] Cold shower in UTF-8
Message-ID: <526AFD51.2000204@fentin.com>

I did some web sites long ago. Their owner moved them to Network
Solutions. Network Solutions suddenly and without prior notice changed
the MySQL character encoding to UTF-8. There are fields in the database
which are displayed on webpages. I have some cleanup to do.

Is there an industry standard for putting CR &/or LF into such a
database text field? Or does everyone roll his own?

Are there any industry standards for ??????????????

--
Joel Fentin       tel: 760-749-8863
Biz Website:      http://fentin.com
Personal Website: http://fentin.com/me

From elspicyjack at gmail.com Fri Oct 25 19:19:46 2013
From: elspicyjack at gmail.com (Brian Manning)
Date: Fri, 25 Oct 2013 19:19:46 -0700
Subject: [San-Diego-pm] Cold shower in UTF-8
In-Reply-To: <526AFD51.2000204@fentin.com>
References: <526AFD51.2000204@fentin.com>
Message-ID:

On Fri, Oct 25, 2013 at 4:22 PM, Joel Fentin wrote:
> I did some web sites long ago. Their owner moved them to Network Solutions.
> Network Solutions suddenly and without prior notice changed the MySQL
> character encoding to UTF-8. There are fields in the database which are
> displayed on webpages. I have some cleanup to do.
>
> Is there an industry standard for putting CR &/or LF into such a database
> text field? Or does everyone roll his own?

A SQL UPDATE using the output of a SELECT * from your existing tables
should work I should think.

You may also be able to drop then recreate the tables using the same
encoding you used before. That would be up to NetSol.

> Are there any industry standards for ??????????????

Yes, they're called ISO standards and/or Unicode standards, depending
on what the encoding of your existing text is. You could use 'iconv'
or 'enca/enconv' to detect your source encodings and/or convert them
to UTF-8. You could also use *cough*PERL*cough*, but it's
probably easier/quicker/faster to use existing tools built for this
purpose than to roll your own in *cough*PERL*cough*.

Thanks,

Brian

From joel at fentin.com Sat Oct 26 06:02:40 2013
From: joel at fentin.com (Joel Fentin)
Date: Sat, 26 Oct 2013 06:02:40 -0700
Subject: [San-Diego-pm] Cold shower in UTF-8
In-Reply-To:
References: <526AFD51.2000204@fentin.com>
Message-ID: <526BBD70.50809@fentin.com>

On 10/25/2013 7:19 PM, Brian Manning wrote:
> On Fri, Oct 25, 2013 at 4:22 PM, Joel Fentin wrote:
>> I did some web sites long ago. Their owner moved them to Network Solutions.
>> Network Solutions suddenly and without prior notice changed the MySQL
>> character encoding to UTF-8. There are fields in the database which are
>> displayed on webpages. I have some cleanup to do.
>>
>> Is there an industry standard for putting CR &/or LF into such a database
>> text field? Or does everyone roll his own?
>
> A SQL UPDATE using the output of a SELECT * from your existing tables
> should work I should think.
>
> You may also be able to drop then recreate the tables using the same
> encoding you used before. That would be up to NetSol.
>
>> Are there any industry standards for ??????????????
>
> Yes, they're called ISO standards and/or Unicode standards, depending
> on what the encoding of your existing text is. You could use 'iconv'
> or 'enca/enconv' to detect your source encodings and/or convert them
> to UTF-8. You could also use *cough*PERL*cough*, but it's
> probably easier/quicker/faster to use existing tools built for this
> purpose than to roll your own in *cough*PERL*cough*.
>
> Thanks,
>
> Brian

Either you don't understand my problem or I don't understand you or
both. But I appreciate your and Russ's efforts.

Before the MySQL conversion, the operator would type the following into
a text area:

line1 + [enter key] + line2 + [enter key] + line3

When they were done, they would click an OK button.
I ran what they typed thru the following code before putting it into
the database:
    $Value =~ s/\15//g;    #snuff chr 13 (may screw up db file)
    $Value =~ s/\n/?/g;    #convert chr 10 to ?

In this case I arbitrarily chose ? to represent LF.

To later access this for display on a webpage, I took what was in the
database and ran it through this:
    $Value =~ s/?/<br>/g;

The displayed result looked like this:
line1
line2
line3

======================

If I attempt this now, I can do the same thing, but would have to
replace the display code (above) with:
    $Value =~ s/??/<br>/g;

This is because ? is greater than chr 127.

======================

Rather than roll my own, I'd rather go with a standard. I confess, when
I go to http://en.wikipedia.org/wiki/UTF-8
I don't quite grasp the Description nor the codepage layout. They give
an example of ?. I can't follow it. Worse, I don't know how much I need
to know and how much I don't.

--
Joel Fentin       tel: 760-749-8863
Biz Website:      http://fentin.com
Personal Website: http://fentin.com/me

From elspicyjack at gmail.com Sat Oct 26 10:43:30 2013
From: elspicyjack at gmail.com (Brian Manning)
Date: Sat, 26 Oct 2013 10:43:30 -0700
Subject: [San-Diego-pm] Cold shower in UTF-8
In-Reply-To: <526BBD70.50809@fentin.com>
References: <526AFD51.2000204@fentin.com> <526BBD70.50809@fentin.com>
Message-ID:

On Sat, Oct 26, 2013 at 6:02 AM, Joel Fentin wrote:
> Either you don't understand my problem or I don't understand you or both.
> But I appreciate your and Russ's efforts.

It must be me.

> Before the MySQL conversion, the operator would type the following into a
> text area:
>
> line1 + [enter key] + line2 + [enter key] + line3
>
> When they were done, they would click an OK button.
> I ran what they typed thru the following code before putting it into the
> database:
> $Value =~ s/\15//g; #snuff chr 13 (may screw up db file)
> $Value =~ s/\n/?/g; #convert chr 10 to ?
>
> In this case I arbitrarily chose ? to represent LF.

Which is not a legal UTF-8 character.

> To later access this for display on a webpage, I took what was in the
> database and ran it through this:
> $Value =~ s/?/<br>/g;
>
> The displayed result looked like this:
> line1
> line2
> line3
>
> ======================
>
> If I attempt this now, I can do the same thing, but would have to replace
> the display code (above) with:
> $Value =~ s/??/<br>/g;
>
> This is because ? is greater than chr 127.
>
> Rather than roll my own, I'd rather go with a standard. I confess, when I go
> to http://en.wikipedia.org/wiki/UTF-8
> I don't quite grasp the Description nor the codepage layout. They give an
> example of ?. I can't follow it. Worse, I don't know how much I need to know
> and how much I don't.

Can you use a different separator, such as the pipe character '|'
(decimal 124/0x7c), or use ASCII NUL (0x0), both of which are valid
UTF-8? Any character up to and including 0x7f (127 decimal) in the
ASCII table is also valid UTF-8. It sounds like that's all you want to
deal with at the moment.

Thanks,

Brian

From tim.bollman at gmail.com Sat Oct 26 11:57:11 2013
From: tim.bollman at gmail.com (Tim Bollman)
Date: Sat, 26 Oct 2013 11:57:11 -0700
Subject: [San-Diego-pm] Cold shower in UTF-8
In-Reply-To:
References: <526AFD51.2000204@fentin.com> <526BBD70.50809@fentin.com>
Message-ID:

On Sat, Oct 26, 2013 at 10:43 AM, Brian Manning wrote:
> On Sat, Oct 26, 2013 at 6:02 AM, Joel Fentin wrote:
>> Either you don't understand my problem or I don't understand you or both.
>> But I appreciate your and Russ's efforts.
>
> It must be me.
>
>> Before the MySQL conversion, the operator would type the following into a
>> text area:
>>
>> line1 + [enter key] + line2 + [enter key] + line3
>>
>> When they were done, they would click an OK button.
>> I ran what they typed thru the following code before putting it into the
>> database:
>> $Value =~ s/\15//g; #snuff chr 13 (may screw up db file)
>> $Value =~ s/\n/?/g; #convert chr 10 to ?
>>
>> In this case I arbitrarily chose ? to represent LF.
>
> Which is not a legal UTF-8 character.
>
>> To later access this for display on a webpage, I took what was in the
>> database and ran it through this:
>> $Value =~ s/?/<br>/g;
>>
>> The displayed result looked like this:
>> line1
>> line2
>> line3
>>
>> ======================
>>
>> If I attempt this now, I can do the same thing, but would have to replace
>> the display code (above) with:
>> $Value =~ s/??/<br>/g;
>>
>> This is because ? is greater than chr 127.
>>
>> Rather than roll my own, I'd rather go with a standard. I confess, when I go
>> to http://en.wikipedia.org/wiki/UTF-8
>> I don't quite grasp the Description nor the codepage layout. They give an
>> example of ?. I can't follow it. Worse, I don't know how much I need to know
>> and how much I don't.
>
> Can you use a different separator, such as the pipe character '|'
> (decimal 124/0x7c), or use ASCII NUL (0x0), both of which are valid
> UTF-8? Any character up to and including 0x7f (127 decimal) in the
> ASCII table is also valid UTF-8. It sounds like that's all you want to
> deal with at the moment.

I'd recommend staying away from ASCII NUL as much as you can. Use 0x1F
(unit separator) or something instead. Equally unused in real text,
but plays well with C. I suppose it hurts compatibility with Cobol
(and I think some Fortran IO libraries actually use all the separators
too), but I don't see that as a bad thing.

>
> Thanks,
>
> Brian
> _______________________________________________
> San-Diego-pm mailing list
> San-Diego-pm at pm.org
> http://mail.pm.org/mailman/listinfo/san-diego-pm

From rlssdpm at schnapp.org Sat Oct 26 16:10:06 2013
From: rlssdpm at schnapp.org (Russ Schnapp)
Date: Sat, 26 Oct 2013 16:10:06 -0700
Subject: [San-Diego-pm] Cold shower in UTF-8
In-Reply-To:
References: <526AFD51.2000204@fentin.com> <526BBD70.50809@fentin.com>
Message-ID: <526C4BCE.6050506@schnapp.org>

On 10/26/2013 11:57 AM, Tim Bollman wrote:
> On Sat, Oct 26, 2013 at 10:43 AM, Brian Manning wrote:
>> On Sat, Oct 26, 2013 at 6:02 AM, Joel Fentin wrote:
>>> Either you don't understand my problem or I don't understand you or both.
>>> But I appreciate your and Russ's efforts.
>>
>> It must be me.
>>
>>> Before the MySQL conversion, the operator would type the following into a
>>> text area:
>>>
>>> line1 + [enter key] + line2 + [enter key] + line3
>>>
>>> When they were done, they would click an OK button.
>>> I ran what they typed thru the following code before putting it into the
>>> database:
>>> $Value =~ s/\15//g; #snuff chr 13 (may screw up db file)

I don't understand why you're doing this. How could a CR character
possibly "screw up" the db file? You're storing a string into a text
column. You ought to be able to incorporate anything you like in the
string. If, for some reason, you do encounter problems using a text
column, try using a blob.

>>> $Value =~ s/\n/?/g; #convert chr 10 to ?
>>>
>>> In this case I arbitrarily chose ? to represent LF.
>>
>> Which is not a legal UTF-8 character.
>>
>>> To later access this for display on a webpage, I took what was in the
>>> database and ran it through this:
>>> $Value =~ s/?/<br>/g;
>>>
>>> The displayed result looked like this:
>>> line1
>>> line2
>>> line3
>>>
>>> ======================
>>>
>>> If I attempt this now, I can do the same thing, but would have to replace
>>> the display code (above) with:
>>> $Value =~ s/??/<br>/g;
>>>
>>> This is because ? is greater than chr 127.
>>>
>>> Rather than roll my own, I'd rather go with a standard. I confess, when I go
>>> to http://en.wikipedia.org/wiki/UTF-8
>>> I don't quite grasp the Description nor the codepage layout. They give an
>>> example of ?. I can't follow it. Worse, I don't know how much I need to know
>>> and how much I don't.
>>
>> Can you use a different separator, such as the pipe character '|'
>> (decimal 124/0x7c), or use ASCII NUL (0x0), both of which are valid
>> UTF-8? Any character up to and including 0x7f (127 decimal) in the
>> ASCII table is also valid UTF-8. It sounds like that's all you want to
>> deal with at the moment.
>
> I'd recommend staying away from ASCII NUL as much as you can. Use 0x1F
> (unit separator) or something instead. Equally unused in real text,
> but plays well with C. I suppose it hurts compatibility with Cobol
> (and I think some Fortran IO libraries actually use all the separators
> too), but I don't see that as a bad thing.
>
>>
>> Thanks,
>>
>> Brian

From thierryv at abac.com Mon Oct 28 14:38:18 2013
From: thierryv at abac.com (Thierry de Villeneuve)
Date: Mon, 28 Oct 2013 22:38:18 +0100
Subject: [San-Diego-pm] Cold shower in UTF-8
In-Reply-To: <526AFD51.2000204@fentin.com>
References: <526AFD51.2000204@fentin.com>
Message-ID: <3CD0BED5-BB6E-44B6-8D69-4200EB907294@abac.com>

(clicked "send" by mistake.
Not complete)

Hello Joël,

There are several things to remember.

- First, the MySQL instance may be set up with a specific default
charset. It's defined with the "default-character-set" setup parameter.
This default applies to client connections and to table creation
scripts that do not specify a charset.

[mysqld]
character-set-server=utf8
collation-server=utf8_unicode_ci

[client]
default-character-set=utf8

You can query the settings in effect with:

mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+

And figure out which charsets are implemented in this MySQL build:

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode               | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode              | utf16_general_ci    |      4 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode              | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+

- Then, when the tables are created, the DBA must specify the charset
and the collation for each table, to override any instance-level setup.

SET NAMES utf8;

CREATE TABLE `someTable` (
  `qId` INT unsigned NOT NULL AUTO_INCREMENT,
  `qFileName` VARCHAR(128) NOT NULL DEFAULT '',
  PRIMARY KEY (`qId`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

- Finally, the client connection must specify the character encoding to
be used for the data transfers.

Java ...

jdbc:mysql://somehost.domain.com:3306/mydb?useUnicode=true&characterEncoding=UTF-8

Perl ...

use DBI;
use Sys::SigAction qw( set_sig_handler );

eval {
    my $hsig = set_sig_handler( 'ALRM',
        sub { my $canceled = 1; die; },
        { mask => [ qw( INT ALRM ) ], safe => 0 } );
    eval {
        alarm ($timeout);
        $l_dbh = DBI->connect($dsn, "$args->{DBuser}", "$args->{DBpswd}", {
            PrintError             => 0,  ### Don't report errors via warn( )
            RaiseError             => 1,  ### Do report errors via die( )
            AutoCommit             => 0,  ### Do not commit automatically Inserts and Updates
            ShowErrorStatement     => 1,  ### Do show the statement in error
            mysql_auto_reconnect   => 0,  ### Complex issue when using table locking
            mysql_multi_statements => 1,  ### To help on calling Stored Procedures.
            ### Do not enable server-side prepared statements
            mysql_server_prepare   => 0,  ### Do NEVER USE 1 ! It's broken: Signed integers are returned as Unsigned integer !!!
            mysql_enable_utf8      => 1   ### Override instance defaults
        } );
    };
    alarm (0);
    die "$@" if $@;
};

Now, if you happen to be no longer capable of reading former table
data, it's most likely that one of the default settings of the instance
has been changed, that your client connection has no charset defined
for translations, and that your tables were created without specifying
a charset. There is not really such a thing as "NS has changed the
MySQL character encoding". Of what?

I would recommend you identify what the instance default setup is, as
in the first point above. Then identify how the tables were created,
with a

SHOW CREATE TABLE `someTable`;

Then set the client connection charset accordingly.

No clue is given on how the "move to NS" happened. Most probably using
a poorly crafted mysqldump script.
You may have to run a mysqldump --no-data script to extract and fix the
schema, then a mysqldump --skip-set-charset --no-create-db
--no-create-info script to extract the raw data, using a text editor to
fix the charset. Finally, insert the data back into a new DB with a
repaired schema.

CREATE TABLE `someTableA` (
  `qId` INT unsigned NOT NULL AUTO_INCREMENT,
  `qSomeName` VARCHAR(128) NOT NULL DEFAULT '',
  PRIMARY KEY (`qId`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;

CREATE TABLE `someTableB` (
  `qId` INT unsigned NOT NULL AUTO_INCREMENT,
  `qSomeName` VARCHAR(128) NOT NULL DEFAULT '',
  PRIMARY KEY (`qId`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

SHOW CREATE TABLE `someTableA`;
SHOW CREATE TABLE `someTableB`;

I hope this has helped you.

Thierry

On Oct 26, 2013, at 1:22 AM, Joel Fentin wrote:

> I did some web sites long ago. Their owner moved them to Network Solutions. Network Solutions suddenly and without prior notice changed the MySQL character encoding to UTF-8. There are fields in the database which are displayed on webpages. I have some cleanup to do.
>
> Is there an industry standard for putting CR &/or LF into such a database text field? Or does everyone roll his own?
>
> Are there any industry standards for ??????????????
>
> --
> Joel Fentin tel: 760-749-8863
> Biz Website: http://fentin.com
> Personal Website: http://fentin.com/me

From joel at fentin.com Thu Oct 31 11:38:50 2013
From: joel at fentin.com (Joel Fentin)
Date: Thu, 31 Oct 2013 11:38:50 -0700
Subject: [San-Diego-pm] [SPAM]- Re: Cold shower in UTF-8
In-Reply-To: <3CD0BED5-BB6E-44B6-8D69-4200EB907294@abac.com>
References: <526AFD51.2000204@fentin.com> <3CD0BED5-BB6E-44B6-8D69-4200EB907294@abac.com>
Message-ID: <5272A3BA.2070300@fentin.com>

Thank you all for your answers:

Brian said: You may also be able to drop then recreate the tables using
the same encoding you used before. That would be up to NetSol.

Thierry wrote and perhaps said something similar, at least from what I
understood of what he said. Again he talked of the table structure.

I doubt NetSol would let me touch anything. Their motto seems to be:
Let's make things harder to do.

Brian again said: Can you use a different separator, such as the pipe
character '|'

That comes back to rolling my own. Then there still exist the issues
with ??????????????

Tim said: I'd recommend staying away from ascii NUL as much as you can.
Use 0x1F (unit separator) or something instead. Equally unused in real
text, but plays well with C.......

Again that smacks of rolling my own.

Russ said: I don't understand why you're doing this. How could a CR
character possibly "screw up" the db file?

It's been quite a while since I set this up originally, but I recall
that there was a severe problem putting chr 10 &/or chr 13 into the
database. That's why I converted it to ?.

Someone who has been lurking in this group suggested:
http://search.cpan.org/~rjbs/perl-5.18.1/lib/utf8.pm

Since I see the problem as one of converting chrs < 32 and > 127 into
something the database will accept and then converting them back again,
this comes closest to what I want.

=============

I have some experimenting to do. Again, thank you all.

--
Joel Fentin       tel: 760-749-8863
Biz Website:      http://fentin.com
Personal Website: http://fentin.com/me
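
For readers following up on this thread, here is a minimal Perl sketch
of one common way to avoid a private stand-in character for line
breaks. It is not code from any of the posters. It assumes the form
data arrives as UTF-8 bytes, that line feeds are stored unchanged in
the text column, and that they are turned into <br> only at display
time. The helper names normalize_newlines and to_html are made up for
this illustration; decode and encode come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: collapse CRLF and bare CR into a single LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Hypothetical helper: escape HTML metacharacters first, then turn
# each LF into a <br> tag for display on a web page.
sub to_html {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    $text =~ s/\n/<br>\n/g;
    return $text;
}

# Assumed input: raw UTF-8 bytes as a form post might deliver them,
# here "cafe" with an acute accent, using CRLF line ends.
my $raw   = "caf\xC3\xA9\r\nline2\r\nline3";
my $chars = decode('UTF-8', $raw);        # bytes -> Perl characters
my $clean = normalize_newlines($chars);   # this is what would be stored

# One character above chr 127 occupies two bytes once encoded as UTF-8.
my $e_acute = "\x{E9}";
printf "characters: %d, UTF-8 bytes: %d\n",
    length($e_acute), length(encode('UTF-8', $e_acute));

# Encode back to bytes only when writing the page out.
print encode('UTF-8', to_html($clean)), "\n";
```

With DBI, $clean would typically be passed as a bound placeholder value
so that the driver takes care of quoting, which may remove the need to
strip CR or LF before storing. The printf line also illustrates Joel's
observation: a single character above chr 127 becomes two bytes in
UTF-8, which is probably why the display code suddenly had to match two
characters instead of one.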