[Chicago-talk] Perl script run slower for utf8

tiger peng tigerpeng2001 at yahoo.com
Fri Jan 27 07:11:40 PST 2012

Thanks for shedding light on this. Text::CSV_XS is already in use. (Someone's baby-Perl script for parsing the files takes 15+ times longer when utf8 is switched on.)

I will try:
    1) Eliminating as many regexes as possible from the script;
    2) Sacrificing readability by eliminating the calls to subs.

If you can think of anything else that could help its performance, please let me know.
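
One more idea that may be worth trying, sketched here under stated assumptions: since UTF-8 keeps commas and newlines as plain ASCII bytes, the file can be read and split as raw bytes, and only the columns that need character semantics get decoded. The sample data and the choice of column are made up for illustration, and the plain split assumes no quoted fields or embedded commas. With Text::CSV_XS the same idea would mean parsing with binary => 1 and decoding the fields afterwards.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: UTF-8 encoded bytes, as they would arrive from a
# file opened without any :encoding layer.
my $raw = "1,caf\xC3\xA9,Chicago\n2,na\xC3\xAFve,Paris\n";

my @names;
for my $line (split /\n/, $raw) {
    # Naive split on bytes: assumes no quoted fields or embedded commas.
    my @fields = split /,/, $line;

    # Decode only the column that needs character semantics.
    push @names, decode('UTF-8', $fields[1]);
}

# Prints: 2 names, first has 4 characters
printf "%d names, first has %d characters\n", scalar @names, length $names[0];
```

Whether this helps depends on how many columns really need decoding; if every field is used as text, the saving is probably small.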


From: Andrew Rodland <andrew at cleverdomain.org>
To: Chicago.pm chatter <chicago-talk at pm.org> 
Sent: Thursday, January 26, 2012 7:01 PM
Subject: Re: [Chicago-talk] Perl script run slower for utf8

Sometimes you can either do the wrong thing quickly, or do the right thing slowly. This is one of those times. Unicode support slows down a lot of matching operations because character class matching isn't just a matter of looking at bits in 256-entry bitmaps anymore.

I would, however, check whether you have Text::CSV_XS installed, as it's faster than the pure-Perl Text::CSV, and its speed is probably less affected by Unicode.
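
A rough way to see the effect is to time the same regex against a byte string and against a copy that Perl stores internally as UTF-8. This is only a sketch: the sample data and the count_words helper are invented for illustration, and the size of the gap varies a lot between Perl versions, so treat the numbers as a hint rather than a measurement of your script.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: one CSV-like line of 200 fields.
my $bytes = join ',', ('field value 12345') x 200;

# Same content, but upgraded to Perl's internal UTF-8 representation.
my $chars = $bytes;
utf8::upgrade($chars);

# Count \w+ matches; \w is a character class, so it takes the
# Unicode-aware path when the string is flagged as UTF-8.
sub count_words {
    my ($s) = @_;
    my $n = () = $s =~ /\w+/g;
    return $n;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    bytes => sub { count_words($bytes) },
    utf8  => sub { count_words($chars) },
});
```

Both variants count the same number of words; only the internal representation of the string differs.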

On Thu, Jan 26, 2012 at 3:58 PM, tiger peng <tigerpeng2001 at yahoo.com> wrote:

>Hello all,
>I just wrote a Perl script for parsing large CSV files (with Text::CSV). When I enable Unicode (with the three use lines below uncommented), it takes twice as long. Is that normal? Is there any way to speed it up?
>#use utf8;
>#use encoding "utf-8";
>#use open ':encoding(utf8)';
>Chicago-talk mailing list
>Chicago-talk at pm.org
