[mplspm]: finding plagiarism

Jay Jacobs jay at lach.net
Wed Mar 13 08:40:18 CST 2002


It sounds like it may help to look at the Lingua::* modules.  They do a
number of text/English processing, comparison, and conversion tasks.
Also, you may want to look at String::Approx for doing approximate
(Levenshtein-style) matching and/or the Soundex stuff (bit of a stretch).

I think Levenshtein matching could work on complete phrases or sentences
that could be pulled out with Lingua::EN::Sentence.
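
As a rough sketch of that idea, something like the following might do
for a first experiment.  It is only an illustration under a few
assumptions: it uses core Perl only, the regex in split_sentences() is a
naive stand-in for Lingua::EN::Sentence, the hand-rolled levenshtein()
stands in for whatever String::Approx would give you, and all of the sub
names and the 0.2 threshold are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Dynamic-programming Levenshtein (edit) distance between two strings.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);        # distances from the empty prefix
    for my $i (1 .. @s) {
        my @cur = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min($prev[$j] + 1,           # deletion
                           $cur[$j - 1] + 1,        # insertion
                           $prev[$j - 1] + $cost);  # substitution
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Lowercase, drop punctuation, and collapse whitespace before comparing.
sub normalize {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/[^a-z0-9\s]//g;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Naive splitter (assumption): break after ., ! or ? followed by space.
sub split_sentences {
    my ($text) = @_;
    return grep { /\S/ } split /(?<=[.!?])\s+/, $text;
}

# Return [submission sentence, source sentence, ratio] for every pair
# whose edit distance, relative to the longer sentence, is <= $threshold.
sub similar_sentences {
    my ($submission, $source, $threshold) = @_;
    my @hits;
    for my $sub_sent (split_sentences($submission)) {
        my $ns = normalize($sub_sent);
        next unless length($ns);
        for my $src_sent (split_sentences($source)) {
            my $nr = normalize($src_sent);
            next unless length($nr);
            my $ratio = levenshtein($ns, $nr)
                      / max(length($ns), length($nr));
            push @hits, [$sub_sent, $src_sent, $ratio]
                if $ratio <= $threshold;
        }
    }
    return @hits;
}

my $paper  = "TCP provides reliable, ordered delivery of a byte stream. "
           . "I like turtles.";
my $source = "TCP provides reliable, ordered delivery of byte streams. "
           . "UDP does not.";
for my $hit (similar_sentences($paper, $source, 0.2)) {
    printf "%.2f  %s\n", $hit->[2], $hit->[0];
}
```

Comparing every sentence against every other sentence is quadratic, so
for a large library of documents you would probably want to narrow the
candidates first (for example by shared uncommon words) and only then
run the edit-distance check.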

Hope it helps,
Jay


On Tue, 12 Mar 2002, Dan Oelke wrote:

>
> I teach a network communications class at a local university by night,
> and by day I hack Perl code to automate lots of good stuff.
>
> After getting fed up with people copying whole paragraphs from one
> another, the book, or web sites, I decided to write a quick Perl
> script to compare phrases from their submissions with a library of
> documents that I have.  I have identified the top couple of sites that
> they like to copy from (heck, I use them for my own research), so I
> can use a robot to download that content for my search purposes.
>
> What I am thinking of is something very much like turnitin.com but
> without actually using their service.  Yes I am cheap - but more
> importantly I think it is a cool project.
>
> What I am looking for are any good ideas of existing modules that might
> help me here.  I have looked through CPAN and haven't found anything
> off hand, but maybe I'm not using the right search terms.
>
> I guess I need two things - one a parsing engine to parse out key
> phrases - 4 to 8 words in length I am guessing, and then a search
> mechanism that works on these phrases.
>
> I have some ideas on the phrase engine - such as ignoring common words
> like "A", "An", "the", "I", etc. - maybe it should just ignore all 1-3
> letter words.
>
> Any other ideas are appreciated.  Is there one of the search/matching
> modules that might work better than others?  If I can't find something
> I'll probably write it and put it out as my first real module of
> something I can actually release.
>
> Thanks,
> Dan
>
>
> --------------------------------------------------
> Minneapolis Perl Mongers mailing list
>
> To unsubscribe, send mail to majordomo at pm.org
> with "unsubscribe mpls" in the body of the message.
>