
pointers from yesterday's class



The work on detection of copyright violations on the web (which
involves fast checking for replication/duplication):

http://www-db.stanford.edu/~cho/papers/cho-mirror.pdf
 (an older and shorter version at
     http://www-db.stanford.edu/~shiva/Pubs/web.ps while a 
     full book-sized report is at http://www-db.stanford.edu/~shiva/thesis.html)
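
To give a feel for what "fast checking for replication" can look like, here is a minimal sketch of one generic approach: break each page into overlapping word windows ("shingles") and compare the two sets. This is not the method from the papers above; the function names and the window size k=3 are my own assumptions for illustration.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def resemblance(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)
```

A score near 1 suggests a copy or near-copy; a score near 0 suggests unrelated text. Comparing every pair of pages this way would be far too slow at web scale, so real systems compare compact signatures of the shingle sets instead.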

A commercial web-based plagiarism detection company:

http://www.plagiarism.org/

----------

SpamAssassin--the spam detection software that relies--among
other things--on fuzzy signatures, central spam repositories, etc.

http://www.spamassassin.org/index.html

(This is the one I use)


--------------
The quote regarding fudge parameters (of which Google has aplenty):

Freeman Dyson writes in Nature, 22 Jan 2004:

In desperation I asked Fermi whether he was not
impressed by the agreement between our calculated
numbers and his measured numbers. He replied,
"How many arbitrary parameters did you use for your
calculations?" I thought for a moment about our cut-off
procedures and said, "Four." He said, "I remember my
friend Johnny von Neumann used to say, with four
parameters I can fit an elephant, and with five
I can make him wiggle his trunk." With that, the
conversation was over.

http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v427/n6972/full/427297a_fs.html

rao
[Feb 25, 2004]