
Errata + a reference for today's class




--There was an error in the way I showed the computation of Jaccard similarity for documents d1 and d2: I computed the union as the sum, rather than the max, of the keyword frequencies. I have corrected it in the slides.
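
For anyone who wants to check the arithmetic, here is a minimal sketch of Jaccard similarity over keyword frequencies with the union taken as the per-keyword max. It assumes the intersection is the per-keyword min (the usual bag-of-words convention). The function name bag_jaccard and the two toy documents are made up for illustration; they are not the d1 and d2 from the slides.

```python
from collections import Counter


def bag_jaccard(doc_a, doc_b):
    """Jaccard similarity of two keyword lists, treated as bags (multisets).

    Assumed convention: intersection = per-keyword min of the frequencies,
    union = per-keyword max of the frequencies (not their sum).
    """
    freq_a = Counter(doc_a)
    freq_b = Counter(doc_b)
    keywords = set(freq_a) | set(freq_b)

    # A Counter returns 0 for a missing keyword, so min/max work directly.
    intersection = sum(min(freq_a[k], freq_b[k]) for k in keywords)
    union = sum(max(freq_a[k], freq_b[k]) for k in keywords)

    # Two empty documents have no keywords at all; return 0.0 by convention.
    return intersection / union if union else 0.0


# Hypothetical toy documents, for illustration only.
toy_1 = "apple apple banana".split()
toy_2 = "apple banana banana cherry".split()

# intersection = min(2,1) + min(1,2) + min(0,1) = 2
# union        = max(2,1) + max(1,2) + max(0,1) = 5
similarity = bag_jaccard(toy_1, toy_2)  # 2 / 5
```

With the union taken as the sum of the frequencies instead, the denominator for the same toy documents would be 3 + 4 = 7, which understates the similarity (2/7 instead of 2/5).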


--Also, for duplicate (plagiarism) detection, see http://www-db.stanford.edu/~shiva/Pubs/DlMag/dlmag.html for a short article on a pretty good system called SCAM.


Rao