[Chicago-talk] data mining

David Mertens dcmertens.perl at gmail.com
Thu Sep 22 06:40:30 PDT 2011


Richard, you said:

> Writing a perl program that uses regex to search the web is probably beyond my skills.

Can you elaborate on that? To what would you apply such a regex? Were
you thinking about doing an all-out web crawl, then parsing the output
to find relevant information about a given city? Are you looking for
an existing database that you can tap? (If the latter, Don's
suggestions look pretty good. See p3rl.org/Geo::Coder::SimpleGeo or
p3rl.org/Net::HTTP::Factual.)

If you want an all-out web crawl, generating your data from the web
from scratch, I can imagine putting something together with
WWW::Mechanize to do the crawl. Determining which page has
relevant (and authoritative) information about a city, and then
extracting that information, can get very complicated. What
are your resources? What is your timeline? What is your expertise in
data mining?
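If you do go the crawl route, the traversal itself is the easy part. Here is a minimal breadth-first crawl sketch, assuming WWW::Mechanize is installed. The subroutine names `crawl` and `mech_fetcher` and their arguments are hypothetical, made up for illustration. The sketch leaves out the hard parts mentioned above (deciding which pages are relevant and authoritative, and extracting data from them). It also does not check robots.txt or throttle its requests, which a real crawl should do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that fetches a URL with
# WWW::Mechanize and returns the absolute http(s) links on that page.
# WWW::Mechanize is loaded only when this default fetcher is used.
sub mech_fetcher {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new( autocheck => 0 );   # don't die on HTTP errors
    return sub {
        my ($url) = @_;
        my $res = $mech->get($url);
        return () unless $res->is_success && $mech->is_html;
        return map { $_->url_abs->as_string }
               $mech->find_all_links( url_abs_regex => qr{^https?://}i );
    };
}

# Hypothetical breadth-first crawl. The fetcher is pluggable so the
# traversal can be tested offline with a stub instead of the network.
sub crawl {
    my (%args)    = @_;
    my $fetch     = $args{fetch} || mech_fetcher();
    my $max_pages = $args{max_pages} || 20;
    my @queue     = @{ $args{start} };
    my ( %seen, @visited );

    while ( @queue && @visited < $max_pages ) {
        my $url = shift @queue;
        next if $seen{$url}++;                  # skip pages already visited
        push @visited, $url;
        push @queue, grep { !$seen{$_} } $fetch->($url);
    }
    return @visited;                            # URLs in the order visited
}
```

Something like `my @pages = crawl( start => ['http://www.example.com/'], max_pages => 50 );` would return the visited URLs in breadth-first order. Keeping the fetcher separate also gives you a natural place to add the per-page "is this about my city?" test later.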

David

