pointers from yesterday's class
The work on detection of copyright violations on the web (which
involves fast checking for replication/duplication):
http://www-db.stanford.edu/~cho/papers/cho-mirror.pdf
(an older, shorter version is at
http://www-db.stanford.edu/~shiva/Pubs/web.ps, and a
full book-length report is at http://www-db.stanford.edu/~shiva/thesis.html)
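A common way to do fast near-duplicate checking is to break each document into overlapping word "shingles", hash them, and compare the resulting sets with the Jaccard resemblance. The sketch below is a generic illustration of that idea, not the specific method used in the papers above; the shingle length k=4 and the SHA-1-based 64-bit hashing are arbitrary choices made for the example.

```python
import hashlib


def shingles(text: str, k: int = 4) -> set[int]:
    """Return the set of hashed k-word shingles of a document (k=4 is arbitrary)."""
    words = text.lower().split()
    # A document shorter than k words yields a single shingle of all its words.
    grams = [" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))]
    # Hash each shingle down to a 64-bit integer so the sets stay compact.
    return {int.from_bytes(hashlib.sha1(g.encode("utf-8")).digest()[:8], "big")
            for g in grams}


def resemblance(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    x = "the quick brown fox jumps over the lazy dog"
    y = "the quick brown fox jumps over the lazy cat"
    print(resemblance(x, y))  # 5 shared shingles out of 7 distinct ones
```

Two pages that differ in one trailing word still share most of their shingles, so they score high (5/7 here). At web scale one would keep only a small sample of each shingle set (e.g. via min-hashing) rather than the full sets.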
A commercial web-based plagiarism detection company:
http://www.plagiarism.org/
----------
SpamAssassin, the spam detection software that relies, among
other things, on fuzzy signatures, central spam repositories, etc.
http://www.spamassassin.org/index.html
(This is the one I use)
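The idea behind a "fuzzy signature" checked against a central repository can be sketched as: normalize a message so that trivial variations disappear, hash the result, and look the hash up in a shared set of known-spam signatures. This does not reproduce SpamAssassin's or any checksum network's actual algorithm; the normalization rules and the in-memory KNOWN_SPAM set standing in for a central repository are assumptions for illustration.

```python
import hashlib
import re


def fuzzy_signature(body: str) -> str:
    """SHA-1 digest of a normalized message body.

    Normalization here is an illustrative assumption: lowercase, drop digits
    and punctuation, collapse whitespace, so copies differing only in those
    details produce the same signature.
    """
    text = body.lower()
    text = re.sub(r"[^a-z\s]", "", text)      # drop digits, punctuation, symbols
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for a central repository of reported spam signatures.
KNOWN_SPAM: set[str] = set()


def report_spam(body: str) -> None:
    """Record a message's signature in the (hypothetical) shared repository."""
    KNOWN_SPAM.add(fuzzy_signature(body))


def is_known_spam(body: str) -> bool:
    """True if a message's signature matches one already reported."""
    return fuzzy_signature(body) in KNOWN_SPAM
```

For example, after report_spam("Buy CHEAP meds now!!! Offer #1234"), the variant "buy cheap   meds now. offer #9876" produces the same signature and is flagged. An exact hash of the raw text would miss it.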
--------------
The quote regarding fudge parameters (of which Google has plenty):
Freeman Dyson writes in Nature 11 Jan 2004:
In desperation I asked Fermi whether he was not
impressed by the agreement between our calculated
numbers and his measured numbers. He replied,
"How many arbitrary parameters did you use for your
calculations?" I thought for a moment about our cut-off
procedures and said, "Four." He said, "I remember my
friend Johnny von Neumann used to say, with four
parameters I can fit an elephant, and with five
I can make him wiggle his trunk." With that, the
conversation was over.
http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v427/n6972/full/427297a_fs.html
rao
[Feb 25, 2004]