[PerlChina] [Question] Garbled text when extracting TXT from web pages with Perl

Dongxu Ma dongxu.ma at gmail.com
Sat Apr 14 23:38:42 PDT 2007


Yes, your input stream is encoded as GB2312, so it needs an explicit decode step.
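
A rough sketch of the two options from my earlier mail, in case it helps. The sub names read_decoded and read_with_layer are made up for illustration, and I am assuming the pages really are GB2312 (which Encode treats as an alias for euc-cn):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Option 1: read the raw bytes, then decode each line by hand.
sub read_decoded {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    my @lines = map { decode('GB2312', $_) } <$in>;
    close $in;
    return @lines;
}

# Option 2: let a PerlIO layer decode while reading.
sub read_with_layer {
    my ($path) = @_;
    open my $in, '<:encoding(GB2312)', $path or die "Cannot open $path: $!";
    my @lines = <$in>;
    close $in;
    return @lines;
}

# Usage (p0.html is the file from this thread):
#   my @lines = read_with_layer('p0.html');
#   binmode STDOUT, ':encoding(UTF-8)';   # pick the encoding your viewer expects
#   print @lines;
```

Either way you end up with Perl character strings, so the output handle needs an encoding as well; otherwise Perl will probably warn about wide characters when you print.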

On 07-4-15, zongzi <honghunter at gmail.com> wrote:
>
> So I have to force an explicit transcoding step?
>
> On 07-4-13, Dongxu Ma<dongxu.ma at gmail.com> wrote:
> > Running `iconv -f GB2312 -t UTF8 p0.html' showed me the Chinese text
> > inside that HTML, which means that when your script reads the HTML, it
> > needs to decode it from GB2312, with something like:
> >
> > 1. Encode::decode("GB2312", <INPUT>)
> > 2. binmode INPUT, ":encoding(GB2312)"
> >
> > On 07-4-13, zongzi <honghunter at gmail.com> wrote:
> > > The editor I use is UltraEdit32.
> > >
> > > The pages all declare <meta http-equiv="Content-type" content="text/html;
> > > charset=gb2312"/>. Is any further conversion still needed?
> > >
> > >
> > > On 07-4-13, Beckheng Lam<beckheng at perlchina.org> wrote:
> > > > Could this be related to GBK or UTF-8?
> > > >
> > > > 缘起和合 wrote:
> > > > Which editor did this? It really is messy. Try Vim.
> > > >
> > > > On 4/12/07, zongzi <honghunter at gmail.com> wrote:
> > > > >
> > > >
> > 为了方便放在PDA上面看小说,我用wget把网页(是新浪读书频道)下载到本机,然后用perl去把其中的正文提取出来。
> > > > >
> > > > > 发现弄出来的txt文档中,有好多乱码(用记事本打开看的时候)。
> > > > >
> > > > > 请教大家怎么才能解决?
> > > > >
> > > > >
> > > > > 附件是我的代码,写的非常乱。真是不好意思了。
> > > > >
> > > > > --
> > > > > 这是一个有钱人的世界,与我的世界截然不同!
> > > > >
> > > > > _______________________________________________
> > > > > China-pm mailing list
> > > > > China-pm at pm.org
> > > > > http://mail.pm.org/mailman/listinfo/china-pm
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > ------======Nerazzurri======------
> > > > ________________________________
> > > >
> > > >
> > >
> > >
> >
> >
> >
> > --
> > cheers,
> > -dongxu
> > __END__
> > http://search.cpan.org/~dongxu
> >
>
>




-- 
cheers,
-dongxu
__END__
http://search.cpan.org/~dongxu