[kansaipm] HTML-Parser
Kiyoka Nishiyama
kiyoka at sa.uno.ne.jp
Fri Mar 24 10:05:51 CST 2000
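This is kiyoka.
Good evening, everyone.
I downloaded HTML-Parser-3.07 from CPAN and tried it out.
It convinced me that you should not write things like parsers yourself
and should just use CPAN instead.
Below is a filter (still unfinished) that I plan to include in the next WhatsNew.
With this filter, tags and text can be split onto separate lines.
In other words,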
<A HREF="mailto:kiyoka at sa.uno.ne.jp"> email: Kiyoka Nishiyama </A>
becomes
<A HREF="mailto:kiyoka at sa.uno.ne.jp">
email: Kiyoka Nishiyama
</A>
So, even when two versions are compared with diff, a difference never
falls in the middle of a tag, and the output also suits Perl's line-oriented processing.
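As a rough illustration of that line-oriented processing, here is a minimal sketch that classifies each line of the filter's output as a start tag, an end tag, other markup, or text. The sample lines and the name classify_line are only assumptions for this example and are not part of WhatsNew.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample of the filter's output: one tag or one text run per line.
my @lines = (
    '<A HREF="mailto:kiyoka at sa.uno.ne.jp">',
    'email: Kiyoka Nishiyama',
    '</A>',
);

# Classify one line of filtered output (classify_line is an illustrative name).
sub classify_line {
    my ($line) = @_;
    return 'end'   if $line =~ m{^</};     # closing tag such as </A>
    return 'other' if $line =~ m{^<[!?]};  # comment, declaration, or processing instruction
    return 'start' if $line =~ m{^<};      # any other tag is treated as a start tag
    return 'text';                         # everything else is plain text
}

for my $line (@lines) {
    printf "%-5s %s\n", classify_line($line), $line;
}
```

Because every tag sits on a line of its own, one anchored regular expression per line is enough, and no pattern has to span several lines.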
With this, I think I can manage to resolve about four of the items on my TODO list.
For your reference.
------------------------------ start ------------------------------
#!/usr/bin/perl -w
#
# "What's New" is a display tool that shows the differences between two versions of a website.
# Copyright (C) 1999,2000 Kiyoka Nishiyama
# $Date: 2000/03/20 14:47:25 $
#
# This file is part of WhatsNew
#
# WhatsNew is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# WhatsNew is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with WhatsNew; see the file COPYING.
#
#
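require 5.003;
use English;
use HTML::Parser ();
use strict 'vars';

# Handler for start and end tags: print the tag's original text on its own line.
# $deeper is '+1' for a start tag and '-1' for an end tag; $pos is the tokenpos
# value for a start tag and undef for an end tag. Only $text is used so far.
sub tag {
    my( $tagname, $deeper, $pos, $text ) = @_;
    print $text, "\n";
}

# Print a declaration as-is (defined, but not registered as a handler below).
sub decl { print shift; }

# Print any other piece of the document on its own line.
sub text { print shift, "\n"; }

# No text_h is registered, so plain text reaches text() through default_h.
HTML::Parser->new(api_version   => 3,
                  start_h       => [\&tag,  "tagname, '+1', tokenpos, text"],
                  end_h         => [\&tag,  "tagname, '-1', undef, text"],
                  process_h     => [\&text, "text"],
                  comment_h     => [\&text, "text"],
                  declaration_h => [\&text, "text"],
                  default_h     => [\&text, "text"],
                 )
    ->parse_file(shift) || die "Can't open file: $!\n";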
------------------------------ end ------------------------------
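To try the same idea on a string instead of a file, something like the following sketch may help. It assumes a reasonably recent HTML::Parser 3.x that provides unbroken_text. The helper name split_html is hypothetical, and trimming whitespace and dropping empty pieces is an addition of this sketch, not something the script above does.

```perl
#!/usr/bin/perl -w
use strict;
use HTML::Parser ();

# split_html is a hypothetical helper name, not part of WhatsNew.
# It returns the pieces of an HTML string, one tag or text run per element.
sub split_html {
    my ($html) = @_;
    my @lines;

    # Collect every event's original text, trimmed; skip empty pieces.
    # (The trimming is an assumption of this sketch.)
    my $collect = sub {
        my ($text) = @_;
        $text =~ s/^\s+//;
        $text =~ s/\s+$//;
        push @lines, $text if length $text;
    };

    # With only default_h registered, every event goes to $collect.
    my $p = HTML::Parser->new(api_version => 3,
                              default_h   => [$collect, "text"]);
    $p->unbroken_text(1);   # ask for each text run in a single piece
    $p->parse($html);
    $p->eof;                # flush anything still buffered
    return @lines;
}

print "$_\n" for split_html(
    '<A HREF="mailto:kiyoka at sa.uno.ne.jp"> email: Kiyoka Nishiyama </A>');
```

Run on the one-line example from the top of this mail, it should print the tag, the text, and the closing tag on three separate lines.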
regards,
+---
Kiyoka Nishiyama <kiyoka at sa.uno.ne.jp>
http://www.netfort.gr.jp/~kiyoka/