From tigerpeng2001 at yahoo.com  Thu Jan 26 12:58:53 2012
From: tigerpeng2001 at yahoo.com (tiger peng)
Date: Thu, 26 Jan 2012 12:58:53 -0800 (PST)
Subject: [Chicago-talk] Perl script run slower for utf8
Message-ID: <1327611533.58999.YahooMailNeo@web120502.mail.ne1.yahoo.com>

Hello all,

I just wrote a Perl script for parsing large CSV files (with Text::CSV).
When I enable Unicode (with the three "use" lines below uncommented), it
takes twice as long. Is that normal? Is there any way to speed it up?

Thanks,

#use utf8;
#use encoding "utf-8";
#use open ':encoding(utf8)';

From andrew at cleverdomain.org  Thu Jan 26 17:01:13 2012
From: andrew at cleverdomain.org (Andrew Rodland)
Date: Thu, 26 Jan 2012 20:01:13 -0500
Subject: [Chicago-talk] Perl script run slower for utf8
In-Reply-To: <1327611533.58999.YahooMailNeo@web120502.mail.ne1.yahoo.com>
References: <1327611533.58999.YahooMailNeo@web120502.mail.ne1.yahoo.com>
Message-ID:

Sometimes you can either do the wrong thing quickly, or do the right
thing slowly. This is one of those times. Unicode support slows down a
lot of matching operations because character class matching isn't just a
matter of looking at bits in 256-entry bitmaps anymore.

I would, however, check whether you have Text::CSV_XS installed, as it's
faster than the pure-Perl Text::CSV, and its speed is probably less
affected by Unicode.

On Thu, Jan 26, 2012 at 3:58 PM, tiger peng wrote:

> Hello all,
>
> I just wrote a Perl script for parsing large CSV files (with Text::CSV).
> When I enable Unicode (with the three "use" lines below uncommented), it
> takes twice as long. Is that normal? Is there any way to speed it up?
>
> Thanks,
>
> #use utf8;
> #use encoding "utf-8";
> #use open ':encoding(utf8)';
>
> _______________________________________________
> Chicago-talk mailing list
> Chicago-talk at pm.org
> http://mail.pm.org/mailman/listinfo/chicago-talk

From tigerpeng2001 at yahoo.com  Fri Jan 27 07:11:40 2012
From: tigerpeng2001 at yahoo.com (tiger peng)
Date: Fri, 27 Jan 2012 07:11:40 -0800 (PST)
Subject: [Chicago-talk] Perl script run slower for utf8
In-Reply-To:
References: <1327611533.58999.YahooMailNeo@web120502.mail.ne1.yahoo.com>
Message-ID: <1327677100.65108.YahooMailNeo@web120502.mail.ne1.yahoo.com>

Thanks for shedding light on this. Text::CSV_XS is in use. (Someone's
baby Perl script for parsing the files takes 15+ times longer when utf8
is switched on.)

I will try:
    1) Eliminating as much regex as possible from the script;
    2) Sacrificing readability by eliminating the calls to subs.

If you think there is anything else that can help its performance,
please let me know.

Thanks,

________________________________
From: Andrew Rodland
To: Chicago.pm chatter
Sent: Thursday, January 26, 2012 7:01 PM
Subject: Re: [Chicago-talk] Perl script run slower for utf8

Sometimes you can either do the wrong thing quickly, or do the right
thing slowly. This is one of those times. Unicode support slows down a
lot of matching operations because character class matching isn't just a
matter of looking at bits in 256-entry bitmaps anymore.

I would, however, check whether you have Text::CSV_XS installed, as it's
faster than the pure-Perl Text::CSV, and its speed is probably less
affected by Unicode.

On Thu, Jan 26, 2012 at 3:58 PM, tiger peng wrote:

> Hello all,
>
> I just wrote a Perl script for parsing large CSV files (with Text::CSV).
> When I enable Unicode (with the three "use" lines below uncommented), it
> takes twice as long. Is that normal? Is there any way to speed it up?
>
> Thanks,
>
> #use utf8;
> #use encoding "utf-8";
> #use open ':encoding(utf8)';
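
Andrew's point about character class matching can be tried out with a small benchmark. What follows is only a sketch under stated assumptions, not the script from the thread: the sample data and the helper name count_words are made up for illustration, and only the core modules Encode and Benchmark are used. It runs the same \w+ match over a UTF-8 byte string and over the decoded character string. How large the timing gap is, if any, depends on the Perl version and the data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Benchmark qw(cmpthese);

# Hypothetical sample data (not from the thread): CSV-like text with a few
# non-ASCII letters, held both as raw UTF-8 bytes and as decoded characters.
my $bytes = encode('UTF-8', join(',', ("caf\x{e9}", "na\x{ef}ve", '12345') x 20));
my $chars = decode('UTF-8', $bytes);

# Count runs of "word" characters. On a byte string \w only matches ASCII
# word characters; on a decoded string it follows Unicode rules.
sub count_words {
    my ($text) = @_;
    my $count = () = $text =~ /\w+/g;
    return $count;
}

printf "bytes: %d words, chars: %d words\n",
    count_words($bytes), count_words($chars);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    bytes => sub { count_words($bytes) },
    chars => sub { count_words($chars) },
});
```

On this sample the two calls disagree: the byte string yields 80 words and the decoded string 60, because the bytes of the accented letters are not word characters, so "naïve" is split in two. That is the "wrong thing quickly" side of the trade-off; the decoded string gives the right answer at whatever extra matching cost the table reports.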