[sf-perl] Testing a web crawler
Michael Friedman
friedman at highwire.stanford.edu
Wed Jan 2 15:02:03 PST 2008
Unfortunately for you (and fortunately for me), I didn't write the code that
pulls the citation reference data out of the input files, so I can't speak
to either of the modules you mention. The input we receive is well-tagged
XML, so the producer of the files separates the parts out for us
anyway. :-)
All I had to do was match the data from reference to citation, and vice
versa, within the database -- which is hard enough.
However, I thank you for the reference to the modules! I know some folks
who'd love to have a more generic way to grab citation information from
flat text...
Has anyone else attacked this problem?
-- Mike
On Wed, 2 Jan 2008, Christian Storm wrote:
>
> On Dec 30, 2007, at 2:47 PM, Michael Friedman wrote:
>> I haven't done work on search engines, but I do work with a journal
>> reference <-> journal citation matching algorithm that has to perform
>> similar discrimination between "good" and "not quite good enough"
>> matches.
>
> Do you use any of the Citation::Biblio or ParaTools::Citation modules to
> do the citation parsing/matching? I was interested in doing this also
> but didn't know if these were my best bet.
---------------------------------------------------------------------
Michael Friedman <mfriedman at stanford.edu>
---------------------------------------------------------------------