[kansaipm] HTML-Parser

Kiyoka Nishiyama kiyoka at sa.uno.ne.jp
Fri Mar 24 10:05:51 CST 2000


This is kiyoka.
Good evening, everyone.

I downloaded HTML-Parser-3.07 from CPAN and tried it out.
It convinced me that, rather than writing a parser yourself,
you should just use what CPAN already offers.

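Here is a filter (still unfinished) that I plan to include in the next WhatsNew.
It splits a document so that each tag and each run of text sits on its own line.
That is,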

<A HREF="mailto:kiyoka at sa.uno.ne.jp"> email: Kiyoka  Nishiyama </A>

becomes

<A HREF="mailto:kiyoka at sa.uno.ne.jp">
 email: Kiyoka  Nishiyama 
</A>

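As a result, a comparison made with diff never breaks in the middle of a tag,
and the output also suits Perl's line-oriented processing.
With this, I should be able to resolve about four of the items on the TODO list.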
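
To illustrate the point about line-oriented processing, here is a minimal
sketch that classifies each line of the filter's output. The helper names
classify_line and count_kinds are hypothetical and not part of WhatsNew, and
the sketch assumes that a text line never begins with "<".

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: classify one line of the filter's output.
sub classify_line {
    my ($line) = @_;
    return 'end'   if $line =~ m{^</};        # closing tag such as </A>
    return 'other' if $line =~ /^<[!?]/;      # comment, declaration, or processing instruction
    return 'start' if $line =~ /^<[A-Za-z]/;  # opening tag such as <A HREF="...">
    return 'text';                            # anything else is treated as text
}

# Hypothetical helper: count how many lines of each kind appear in a list.
sub count_kinds {
    my %count;
    $count{ classify_line($_) }++ for @_;
    return %count;
}

# Sample input: the three output lines shown earlier in this message.
my @lines = (
    '<A HREF="mailto:kiyoka at sa.uno.ne.jp">',
    ' email: Kiyoka  Nishiyama ',
    '</A>',
);
my %count = count_kinds(@lines);
print "$_: $count{$_}\n" for sort keys %count;
```

Because every line is a complete tag or a complete run of text, a single
regular expression per line is enough to tell them apart.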

For your reference.

------------------------------ start ------------------------------
#!/usr/bin/perl -w
#
# "What's New" is a display tool that shows the differences between two versions of a website.
#   Copyright (C) 1999,2000 Kiyoka Nishiyama
#     $Date: 2000/03/20 14:47:25 $
#
# This file is part of WhatsNew
#
# WhatsNew is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
# 
# WhatsNew is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with WhatsNew; see the file COPYING.
#
#
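require 5.003;
use HTML::Parser ();
use strict 'vars';

# Print a start or end tag on its own line.
# $tagname, $deeper ('+1' or '-1') and $pos are passed in but not used yet.
sub tag {
    my( $tagname, $deeper, $pos, $text ) = @_;
    print $text, "\n";
}

# Not attached to any handler yet.
sub decl { print shift; }

# Print any other piece of the document on its own line.
# Events that carry no text (newer HTML::Parser versions may report the
# start and end of the document to the default handler) are skipped.
sub text {
    my( $text ) = @_;
    print $text, "\n" if defined $text && length $text;
}

my $file = shift;
defined $file || die "Usage: $0 file.html\n";

# Tags go to tag(); everything else, including plain text via default_h,
# goes to text().
HTML::Parser->new(api_version   => 3,
                  start_h       => [\&tag,  "tagname, '+1', tokenpos, text"],
                  end_h         => [\&tag,  "tagname, '-1', undef,    text"],
                  process_h     => [\&text, "text"],
                  comment_h     => [\&text, "text"],
                  declaration_h => [\&text, "text"],
                  default_h     => [\&text, "text"],
                 )
    ->parse_file($file) || die "Can't open file: $!\n";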

------------------------------  end  ------------------------------
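
As a small variation for experimenting, the same idea can be applied to a
string instead of a file. This is only a sketch: split_html is a hypothetical
helper that is not part of WhatsNew, and it routes every event through
default_h instead of using separate tag and text handlers.

```perl
#!/usr/bin/perl -w
use strict;
use HTML::Parser ();

# Hypothetical helper: split an HTML string into a list with one tag or
# one run of text per element.
sub split_html {
    my ($html) = @_;
    my @pieces;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Every event (tags, text, comments, declarations) uses one handler.
        default_h   => [ sub {
                             my ($text) = @_;
                             # Skip events that carry no document text.
                             push @pieces, $text if defined $text && length $text;
                         }, "text" ],
    );
    $parser->parse($html);
    $parser->eof;    # flush any text that is still buffered
    return @pieces;
}

my $sample = '<A HREF="mailto:kiyoka at sa.uno.ne.jp"> email: Kiyoka  Nishiyama </A>';
print "$_\n" for split_html($sample);
```

Joining the returned pieces with an empty string gives back the original
input, so the split itself loses nothing.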

regards,
+---
 Kiyoka Nishiyama <kiyoka at sa.uno.ne.jp>
 http://www.netfort.gr.jp/~kiyoka/


