karic at lclark.edu
Mon Feb 18 18:13:04 CST 2002
A little endorsement of Perl, from the inventor of the 100+ terabyte
Wayback Machine (archive.org):
Koman: What are the crawlers written in?
Kahle: Combinations of C and Perl. Almost everything we can, we do in
Perl -- for ease of portability, maintainability, flexibility. Because
there's so much horsepower we don't really require a tight system. The
crawlers record pages into 100MB files in a standard archive file
format, and then store it on one of the storage machines. Those are
just normal PCs with four IDE hard drives, and it just writes along
until it's filled up and then it goes to the next one. It goes through
a couple of these machines a day: hundreds of gigabytes a day. The
total gathering speed when everything is moving is about 10 terabytes
a month, or half a Library of Congress a month.
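
As a rough sanity check on the figures quoted above, the monthly rate can be
converted into a daily rate and a count of 100MB archive files. This is only
a back-of-the-envelope sketch: the decimal units and the 30-day month are my
assumptions, not something stated in the interview, and the variable names
are made up for illustration.

```python
# Back-of-the-envelope check of the quoted crawl figures.
# Assumptions (not from the interview): decimal units, 30-day month.
TB = 10**12
GB = 10**9
MB = 10**6

monthly_bytes = 10 * TB          # "about 10 terabytes a month"
days_per_month = 30              # assumed
archive_file_bytes = 100 * MB    # "100MB files"

daily_bytes = monthly_bytes / days_per_month
daily_gb = daily_bytes / GB
files_per_day = daily_bytes / archive_file_bytes

# "half a Library of Congress a month" implies this size for the whole thing
implied_loc_tb = 10 / 0.5

print(f"{daily_gb:.0f} GB per day")
print(f"{files_per_day:.0f} archive files per day")
print(f"{implied_loc_tb:.0f} TB implied Library of Congress")
```

Under those assumptions this comes out to roughly 333 GB a day, which fits
the "hundreds of gigabytes a day" in the quote, spread over a few thousand
100MB files.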