From lamp.purl at gmail.com Thu Aug 7 03:22:48 2008
From: lamp.purl at gmail.com (purl lamp)
Date: Thu, 7 Aug 2008 18:22:48 +0800
Subject: [PerlChina] ActiveState offers more CPAN modules
In-Reply-To: <367185060807282003n76429b63qa8c4ebc3d5e06ffc@mail.gmail.com>
References: <488E7910.6010304@gmail.com>
	<367185060807282003n76429b63qa8c4ebc3d5e06ffc@mail.gmail.com>
Message-ID: <367185060808070322r6d3714c4pa50751d6feb324d9@mail.gmail.com>

Run

  ppm repo add http://ppm.activestate.com/beta/MSWin32-x86/5.10/1000/package.xml

and you can then install the handy pod2pdf script with

  ppm install pod2pdf

That's a really nice way for Windows users to get documentation printed or shared.

2008/7/29 purl lamp
> Thank God this finally happened :D
> Thanks, Qiang, for passing the news along; I'll be using it soon.
> I hope it goes well and that they keep it up long enough; that would be a blessing!
>
> 2008/7/29 shijialee at gmail.com :
> > This is still the ppm beta, but it looks like ActivePerl will support many
> > more Perl modules. There are currently 7375 packages.
> >
> > http://ppm.activestate.com/beta/
> >
> > Qiang
> > _______________________________________________
> > China-pm mailing list
> > China-pm at pm.org
> > http://mail.pm.org/mailman/listinfo/china-pm

From wanliyou at gmail.com Tue Aug 12 19:35:00 2008
From: wanliyou at gmail.com (Question)
Date: Wed, 13 Aug 2008 10:35:00 +0800
Subject: [PerlChina] Chinese text in XML
Message-ID: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>

Hi,
I want to parse an asx (XML) file with Perl, but I've found that XML::DOM doesn't support Chinese.
The file looks roughly like this:
视频

I googled around without finding anything, and matching it myself with a regex is too much work.
Does anyone have any ideas?
-- 
>: ~
From beckheng at sina.com Tue Aug 12 19:43:53 2008
From: beckheng at sina.com (Beck Lam)
Date: Wed, 13 Aug 2008 10:43:53 +0800
Subject: [PerlChina] Re: Chinese text in XML
In-Reply-To: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
Message-ID: <000601c8fcee$6ffbd450$5e04a8c0@bsdperler>

Use XML::Parser.

From fayland at gmail.com Tue Aug 12 19:47:44 2008
From: fayland at gmail.com (Fayland Lam)
Date: Wed, 13 Aug 2008 10:47:44 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
Message-ID: <48A24B50.9000301@gmail.com>

Question wrote:
> Hi,
> I want to parse an asx (XML) file with Perl, but I've found that XML::DOM doesn't support Chinese.

http://wiki.perlchina.org/index.php/Perl-XML-FAQ
http://wiki.perlchina.org/index.php/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8BPerl_XML%EF%BC%9A%E6%8E%A5%E5%8F%A3%E7%AF%87
(Quick Start to Perl XML: Interfaces)

In general, you're best off parsing UTF-8 XML files.
If you need to parse a GB2312 file, download
http://www.fayland.org/misc/backup/gb2312.enc
then find where XML::Parser is installed:

bash-3.00$ perldoc -l XML::Parser
/usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi/XML/Parser.pm

and put the downloaded file in
/usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi/XML/Parser/Encodings/
and try again.

Thanks.
-- 
Fayland Lam // http://www.fayland.org/
Foorum based on Catalyst // http://www.foorumbbs.com/

From wanliyou at gmail.com Tue Aug 12 20:13:06 2008
From: wanliyou at gmail.com (Question)
Date: Wed, 13 Aug 2008 11:13:06 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <48A24B50.9000301@gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
	<48A24B50.9000301@gmail.com>
Message-ID: <41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>

Thanks, everyone. The file is an asx file like this:

CCTV视频 XMLFDFFD FREE CCTV视频 XMLFDFFD FREE

I also added the following at the front myself:

CCTV视频

and adding CDATA didn't work either.

use XML::Parser;
my $parser = XML::Parser->new();
$parser->parsefile("test\\test.asx");    # dies with an error on the Chinese text

-- 
>: ~
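Fayland's recipe locates XML::Parser's Encodings/ directory from the path that `perldoc -l XML::Parser` prints. A minimal sketch of that step in Perl follows; `encodings_dir_for` is a hypothetical helper (not part of XML::Parser) that only manipulates the path string, and it assumes the .enc files live in a directory named after Parser.pm, as in the paths quoted in the thread.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: given the path reported by `perldoc -l XML::Parser`
# (or $INC{'XML/Parser.pm'}), return the directory that should hold .enc files.
sub encodings_dir_for {
    my ($parser_pm) = @_;
    (my $base = $parser_pm) =~ s/\.pm\z//;    # .../XML/Parser.pm -> .../XML/Parser
    return File::Spec->catdir($base, 'Encodings');
}

# Usage with the path from the thread. With XML::Parser installed you could
# instead `require XML::Parser;` and pass $INC{'XML/Parser.pm'}.
my $dir = encodings_dir_for(
    '/usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi/XML/Parser.pm');
print "$dir\n";
```

Once gb2312.enc is in that directory, XML::Parser should be able to load it by name, either from an encoding="gb2312" declaration in the document or through its ProtocolEncoding option; that is worth confirming against the installed XML::Parser version.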
From wanliyou at gmail.com Tue Aug 12 20:35:20 2008
From: wanliyou at gmail.com (Question)
Date: Wed, 13 Aug 2008 11:35:20 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
	<48A24B50.9000301@gmail.com>
	<41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>
Message-ID: <41143d8f0808122035h4bbb61a9n1185b98526e62ecb@mail.gmail.com>

Now I'm using this:

use XML::Simple;
my $xs1 = XML::Simple->new;
my $doc = $xs1->XMLin($file);

Reading the file in works, but the Chinese comes out wrong when I print it,
and binmode(STDOUT, ":utf8") is even worse...

-- 
>: ~

From wanliyou at gmail.com Tue Aug 12 23:53:22 2008
From: wanliyou at gmail.com (Question)
Date: Wed, 13 Aug 2008 14:53:22 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <41143d8f0808122035h4bbb61a9n1185b98526e62ecb@mail.gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
	<48A24B50.9000301@gmail.com>
	<41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>
	<41143d8f0808122035h4bbb61a9n1185b98526e62ecb@mail.gmail.com>
Message-ID: <41143d8f0808122353k286aff7ekc7f1847f18fecd97@mail.gmail.com>

Please, everyone, don't hold back your advice. Do I really have to hand-roll a regexp?
DOM not supporting Chinese... that's baffling.

-- 
>: ~
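One likely reason for the garbled output, assuming the Windows console uses the GBK code page: XML::Parser, and XML::Simple on top of it, hands back Perl character strings, so the bytes that reach the terminal are decided by STDOUT's output layer, and a ":utf8" layer writes UTF-8 that a GBK console shows as mojibake. The sketch below writes the same two characters through a GBK layer and a UTF-8 layer into in-memory handles (standing in for STDOUT) so the difference is visible; `$title` is a made-up stand-in for a parsed value.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for a value returned by XML::Simple / XML::Parser:
# a Perl character string holding "视频" (U+89C6 U+9891).
my $title = "\x{89C6}\x{9891}";

# Output layer for a GBK console (an assumption about the terminal).
# An in-memory handle stands in for STDOUT so the bytes can be inspected.
open(my $gbk_out, '>:encoding(gbk)', \my $gbk_bytes) or die "open: $!";
print {$gbk_out} $title;
close($gbk_out);

# The same characters through a UTF-8 layer, which is what ':utf8' emits here.
open(my $utf8_out, '>:encoding(UTF-8)', \my $utf8_bytes) or die "open: $!";
print {$utf8_out} $title;
close($utf8_out);

# Encoding by hand with Encode gives the same bytes as the GBK layer.
my $manual_gbk = encode('gbk', $title);

# GBK uses 2 bytes per character here, UTF-8 uses 3.
printf "gbk: %d bytes, utf-8: %d bytes, manual matches layer: %s\n",
    length($gbk_bytes), length($utf8_bytes),
    ($manual_gbk eq $gbk_bytes ? 'yes' : 'no');
```

On a real GBK console the equivalent would be `binmode(STDOUT, ':encoding(gbk)')` before printing; on a UTF-8 terminal, ':encoding(UTF-8)' is the right choice.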
From fayland at gmail.com Tue Aug 12 23:57:51 2008
From: fayland at gmail.com (Fayland Lam)
Date: Wed, 13 Aug 2008 14:57:51 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <41143d8f0808122353k286aff7ekc7f1847f18fecd97@mail.gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
	<48A24B50.9000301@gmail.com>
	<41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>
	<41143d8f0808122035h4bbb61a9n1185b98526e62ecb@mail.gmail.com>
	<41143d8f0808122353k286aff7ekc7f1847f18fecd97@mail.gmail.com>
Message-ID: <48A285EF.9010905@gmail.com>

Question wrote:
> Please, everyone, don't hold back your advice. Do I really have to hand-roll a regexp?
> DOM not supporting Chinese... that's baffling.
>
binmode STDOUT won't do it; you need to convert the original $file to UTF-8,
otherwise you'll have to convert it to UTF-8 with Encode.

-- 
Fayland Lam // http://www.fayland.org/
Foorum based on Catalyst // http://www.foorumbbs.com/

From wanliyou at gmail.com Wed Aug 13 00:24:32 2008
From: wanliyou at gmail.com (Question)
Date: Wed, 13 Aug 2008 15:24:32 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <48A285EF.9010905@gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
	<48A24B50.9000301@gmail.com>
	<41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>
	<41143d8f0808122035h4bbb61a9n1185b98526e62ecb@mail.gmail.com>
	<41143d8f0808122353k286aff7ekc7f1847f18fecd97@mail.gmail.com>
	<48A285EF.9010905@gmail.com>
Message-ID: <41143d8f0808130024s60b176ccs42c2bd926063386e@mail.gmail.com>

2008/8/13 Fayland Lam
>
> binmode STDOUT won't do it; you need to convert the original $file to UTF-8,
> otherwise you'll have to convert it to UTF-8 with Encode.
>

XML::Encoding / Unicode::String?
So many people use Perl for web work; could someone give an example?
So you mean I should read the data myself, encode it as UTF-8, and then call
parse/parsestring instead of parsefile?

-- 
>: ~

From fayland at gmail.com Wed Aug 13 00:28:53 2008
From: fayland at gmail.com (Fayland Lam)
Date: Wed, 13 Aug 2008 15:28:53 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <41143d8f0808130024s60b176ccs42c2bd926063386e@mail.gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
	<48A24B50.9000301@gmail.com>
	<41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>
	<41143d8f0808122035h4bbb61a9n1185b98526e62ecb@mail.gmail.com>
	<41143d8f0808122353k286aff7ekc7f1847f18fecd97@mail.gmail.com>
	<48A285EF.9010905@gmail.com>
	<41143d8f0808130024s60b176ccs42c2bd926063386e@mail.gmail.com>
Message-ID: <48A28D35.7070005@gmail.com>

Question wrote:
> So you mean I should read the data myself, encode it as UTF-8, and then call
> parse/parsestring instead of parsefile?
open(my $fh, '<', $file) or die "Cannot open $file: $!";
my $string = do { local $/; <$fh> };    # slurp the whole file; $/ is restored afterwards
close($fh);

use Encode qw/from_to/;
from_to($string, 'gb2312', 'utf8');     # convert the GB2312 bytes to UTF-8 in place

use XML::Simple;
my $xs1 = XML::Simple->new;
my $doc = $xs1->XMLin($string);         # a string containing '<' is parsed as XML

Give it a try.

-- 
Fayland Lam // http://www.fayland.org/
Foorum based on Catalyst // http://www.foorumbbs.com/

From wanliyou at gmail.com Wed Aug 13 00:50:58 2008
From: wanliyou at gmail.com (Question)
Date: Wed, 13 Aug 2008 15:50:58 +0800
Subject: [PerlChina] Chinese text in XML
In-Reply-To: <48A28D35.7070005@gmail.com>
References: <41143d8f0808121935k3215669dndf5c89946f136088@mail.gmail.com>
	<48A24B50.9000301@gmail.com>
	<41143d8f0808122013l46a79916ub123bf6617dbc7c2@mail.gmail.com>
	<41143d8f0808122035h4bbb61a9n1185b98526e62ecb@mail.gmail.com>
	<41143d8f0808122353k286aff7ekc7f1847f18fecd97@mail.gmail.com>
	<48A285EF.9010905@gmail.com>
	<41143d8f0808130024s60b176ccs42c2bd926063386e@mail.gmail.com>
	<48A28D35.7070005@gmail.com>
Message-ID: <41143d8f0808130050i7451d6f2x7c3fee1c7c108e6b@mail.gmail.com>

use Encode qw/from_to/;
from_to($string, 'gb2312', 'utf8');

I tried it, and XML::Parser works now too. Thanks a lot.

Running from_to again on a file that is already UTF-8 causes problems, though.
How can I tell whether $string is UTF-8? Just let it parse once, and encode it
only if that fails?

Also, I wrapped the Chinese characters in an extra layer; shouldn't DOM/Parser
leave whatever is inside it alone?

-- 
>: ~
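For the last question (avoiding a second from_to on data that is already UTF-8), one common heuristic is to check whether the raw bytes decode as strict UTF-8 and convert from GB2312 only when they don't. This is a sketch, not something proposed in the thread: `guess_encoding` and `ensure_utf8_bytes` are hypothetical helpers, and the GB2312 fallback is an assumption that fits these files. Short GB2312 strings can occasionally happen to be valid UTF-8, so reading the XML declaration's encoding attribute, when the file has one, is more reliable.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: call the bytes UTF-8 if they decode cleanly as
# strict UTF-8, otherwise assume GB2312 (an assumption for these files).
sub guess_encoding {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK flag may modify its input
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 'UTF-8' : 'gb2312';
}

# Hypothetical helper: return UTF-8 bytes, converting only when needed,
# so data that is already UTF-8 is never converted a second time.
sub ensure_utf8_bytes {
    my ($bytes) = @_;
    my $enc = guess_encoding($bytes);
    return $bytes if $enc eq 'UTF-8';
    return encode('UTF-8', decode($enc, $bytes));
}

# Usage: GB2312 bytes get converted; UTF-8 bytes pass through untouched.
my $gb_bytes   = "\xCC\xE1\xB9\xA9";                  # two GB2312 characters
my $utf8_bytes = encode('UTF-8', "\x{89C6}\x{9891}"); # "视频" as UTF-8
printf "%s -> %d UTF-8 bytes\n", guess_encoding($gb_bytes),
    length(ensure_utf8_bytes($gb_bytes));
printf "%s -> unchanged: %s\n", guess_encoding($utf8_bytes),
    (ensure_utf8_bytes($utf8_bytes) eq $utf8_bytes ? 'yes' : 'no');
```

The result can then go to XMLin or XML::Parser's parse method as a string, as in Fayland's example, without the risk of double conversion.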