[Melbourne-pm] Scraping Media Wiki
alfiejohn at gmail.com
Tue Jan 12 20:01:46 PST 2010
On Wed, Jan 13, 2010 at 2:49 PM, <scottp at dd.com.au> wrote:
> I think it is a three-part answer:
> * WWW::Mechanize or even just LWP to get the page
> * The XML format may give you some benefits, such as the date modified
> * Then parse the content. There are a number of wiki text parsers on CPAN,
> none of them great, but most OK. Converting to HTML may be your best bet; at
> least then the data is in HTML table format.
> There are some MediaWiki API classes too, but I have not used them:
> * WWW::Mediawiki::Client
> * MediaWiki::API
I agree with WWW::Mechanize. But if you don't manage to get any of the wiki
parsers going and your data is consistent, you could try Template::Extract.
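
A rough sketch of the Template::Extract idea, in case it helps. It assumes the
page has already been fetched (for example with WWW::Mechanize) and that the
data sits in a plain, consistently formatted HTML table. The sample HTML and
the field names (rows, name, role) are made up for illustration; the template
has to mirror the real page's markup exactly, whitespace included.

```perl
use strict;
use warnings;
use Template::Extract;

# Hypothetical HTML standing in for a rendered wiki table. In practice this
# would be the content of the fetched page.
my $document = <<'HTML';
<table>
<tr><td>Alice</td><td>Organiser</td></tr>
<tr><td>Bob</td><td>Speaker</td></tr>
</table>
HTML

# A Template Toolkit style template describing the same markup. Each
# [% var %] captures the text found at that position; FOREACH collects
# one hash per repeated row.
my $template = <<'TT';
<table>
[% FOREACH rows %]<tr><td>[% name %]</td><td>[% role %]</td></tr>
[% END %]</table>
TT

# extract() returns a hash reference on success, or undef if the template
# does not match the document.
my $data = Template::Extract->new->extract($template, $document);

if ($data) {
    for my $row (@{ $data->{rows} || [] }) {
        print "$row->{name}: $row->{role}\n";
    }
}
else {
    warn "Template did not match the document\n";
}
```

This only works while the markup stays regular. If rows vary in attributes or
spacing, the template stops matching and a proper parser is the safer route.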