Errata + a reference for today's class
--There was an error in how I showed the computation of the Jaccard
similarity between documents d1 and d2: I took the union as the sum of
the keyword frequencies rather than their max. I have corrected this in
the slides.
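A minimal sketch of the corrected computation, assuming each document is a
map from keyword to frequency and that the intersection uses the min of
the frequencies (the usual counterpart to taking the union as the max).
The keyword counts below are hypothetical, not the d1 and d2 from the
slides. The sum-based variant is included only to show the mistake: it
gives identical documents a similarity of 0.5 instead of 1.0.

```python
def jaccard_max(d1, d2):
    """Frequency-weighted Jaccard: sum of per-keyword mins over sum of maxes."""
    keywords = set(d1) | set(d2)
    intersection = sum(min(d1.get(k, 0), d2.get(k, 0)) for k in keywords)
    union = sum(max(d1.get(k, 0), d2.get(k, 0)) for k in keywords)
    return intersection / union if union else 0.0


def jaccard_sum(d1, d2):
    """The erroneous variant: union taken as the sum of frequencies."""
    keywords = set(d1) | set(d2)
    intersection = sum(min(d1.get(k, 0), d2.get(k, 0)) for k in keywords)
    union = sum(d1.get(k, 0) + d2.get(k, 0) for k in keywords)
    return intersection / union if union else 0.0


# Hypothetical keyword frequencies for two documents.
d1 = {"database": 2, "query": 1}
d2 = {"database": 1, "index": 3}

print(jaccard_max(d1, d2))  # 1 / 6
print(jaccard_sum(d1, d2))  # 1 / 7
print(jaccard_max(d1, d1))  # identical documents: 1.0
print(jaccard_sum(d1, d1))  # identical documents: only 0.5
```

Taking the max keeps the measure in [0, 1] with identical documents
scoring exactly 1, which the sum-based union does not.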
--Also, for duplicate (plagiarism) detection, see
http://www-db.stanford.edu/~shiva/Pubs/DlMag/dlmag.html for a short
article on a pretty good system called SCAM
Rao