[sf-perl] Testing a web crawler

Michael Friedman friedman at highwire.stanford.edu
Wed Jan 2 15:02:03 PST 2008


Unfortunately for you (and fortunately for me), I didn't write the code
that pulls the citation reference data out of the input files, so I can't
speak to either of the modules you mention. The input we receive is
well-tagged XML, so the producer of the files has already separated the
parts out for us. :-)

All I had to do was match the data from reference to citation, and vice
versa, within the database -- which is hard enough.

However, I thank you for the reference to the modules! I know some folks 
who'd love to have a more generic way to grab citation information from 
flat text...

Has anyone else attacked this problem?

-- Mike

  On Wed, 2 Jan 2008, Christian Storm wrote:

>
> On Dec 30, 2007, at 2:47 PM, Michael Friedman wrote:
>> I haven't done work on search engines, but I do work with a journal
>> reference <-> journal citation matching algorithm that has to perform
>> similar discrimination between "good" and "not quite good enough"
>> matches.
>
> Do you use any of the Citation::Biblio or ParaTools::Citation modules to
> do the citation parsing/matching?  I was interested in doing this also
> but didn't know if these were my best bet.
> _______________________________________________
> SanFrancisco-pm mailing list
> SanFrancisco-pm at pm.org
> http://mail.pm.org/mailman/listinfo/sanfrancisco-pm
>

---------------------------------------------------------------------
Michael Friedman                             <mfriedman at stanford.edu>
---------------------------------------------------------------------


