SPUG: table scraper server? HTML -> XML

Fred Morris m3047 at inwa.net
Mon Jul 19 07:37:45 CDT 2004


Before I reinvent the wheel, is anybody out there running a publicly
accessible table scraper for web pages? I'm looking for something that
takes a URL as a CGI parameter and turns something like this:

<body>
<p>hello world</p>
<table>
 <tr><td>one</td><td>1</td></tr>
 <tr><td>two</td><td>2</td></tr>
</table>
<p>really difficult stuff!</p>
<table>
 <tr><td>apple</td><td>green</td></tr>
 <tr><td>lemon</td><td>yellow</td></tr>
</table>
<p>and so forth</p>
</body>

into something like this:

<table1>
  <col1>one</col1><col2>1</col2>
</table1>
<table1>
  <col1>two</col1><col2>2</col2>
</table1>
<table2>
  <col1>apple</col1><col2>green</col2>
</table2>
<table2>
  <col1>lemon</col1><col2>yellow</col2>
</table2>

Again, my primary question is about a *server*. I haven't checked CPAN yet
for modules that would be useful if I end up rolling my own.

(BTW, if I do roll my own, there's a fair chance it will be publicly
accessible to some extent.)

--

Fred Morris
m3047 at inwa.net




