pointers from yesterday's class

The work on detection of copyright violations on the web (which
involves fast checking for replication/duplication):

 (an older and shorter version is at
     http://www-db.stanford.edu/~shiva/Pubs/web.ps, while a
     full book-sized report is at http://www-db.stanford.edu/~shiva/thesis.html)

A commercial web-based plagiarism detection company



SpamAssassin--the spam detection software that relies--among
other things--on fuzzy signatures, central spam repositories, etc.


(This is the one I use)

The quote regarding fudge parameters (of which Google has aplenty):

Freeman Dyson writes in Nature, 11 Jan 2004:

In desperation I asked Fermi whether he was not
impressed by the agreement between our calculated
numbers and his measured numbers. He replied,
"How many arbitrary parameters did you use for your
calculations?" I thought for a moment about our cut-off
procedures and said, "Four." He said, "I remember my
friend Johnny von Neumann used to say, with four
parameters I can fit an elephant, and with five
I can make him wiggle his trunk." With that, the
conversation was over.


[Feb 25, 2004]