Euclidean, Cosine and Jaccard similarity measures
Hi all,
In the course slides there is an example showing that Euclidean distance is
less effective at separating the documents (slide #58). Please note that this
happens partly because the regular Euclidean distance does not normalize the
vectors, while cosine distance takes the norms of the vectors into account and
thus removes the influence of vector length.
Actually, if we normalize the vectors before we calculate the Euclidean
distance, it performs almost as well as cosine distance.
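
A minimal sketch of why this works, assuming "normalize" means scaling each
vector to unit L2 length (I have not checked this against the slide data; the
three vectors below are made-up term counts, not the course documents). For
unit vectors, the squared Euclidean distance equals 2 - 2 * cosine similarity,
so both measures rank document pairs the same way:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    # Scale a vector to unit L2 length (assumed meaning of "normalize").
    n = norm(a)
    return [x / n for x in a]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

# Hypothetical term-count vectors: doc_b is doc_a with every count tripled,
# so it points in the same direction but is three times as long.
doc_a = [1.0, 2.0, 0.0, 1.0]
doc_b = [3.0, 6.0, 0.0, 3.0]
doc_c = [0.0, 1.0, 2.0, 1.0]

# Raw Euclidean distance puts doc_a closer to doc_c than to doc_b.
print(euclidean(doc_a, doc_b), euclidean(doc_a, doc_c))

# After normalization, doc_a and doc_b are (numerically) identical.
print(euclidean(normalize(doc_a), normalize(doc_b)))

# Squared normalized distance matches 2 - 2 * cosine similarity.
d = euclidean(normalize(doc_a), normalize(doc_c))
print(d ** 2, 2 - 2 * cosine_similarity(doc_a, doc_c))
```

With a different normalization (for example, dividing by the sum of the
counts) the two measures would be close but not equivalent.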
On the other hand, Jaccard similarity performs reasonably well, normalized or
not (by its definition, the lengths of the vectors are already taken into
account indirectly), although in my testing cosine similarity seems to do a
little better.
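
To illustrate the point about vector length, here is a small sketch using the
Tanimoto (extended Jaccard) coefficient, a . b / (|a|^2 + |b|^2 - a . b).
This is only one common way to extend Jaccard to real-valued vectors and is an
assumption on my part; the slides may define it differently. The vectors are
made-up term counts, not the course documents:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tanimoto(a, b):
    # Extended Jaccard: the squared norms of both vectors appear in the
    # denominator, so vector length influences the score directly.
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)

# Hypothetical term-count vectors: doc_b is doc_a with every count tripled.
doc_a = [1.0, 2.0, 0.0, 1.0]
doc_b = [3.0, 6.0, 0.0, 3.0]
doc_c = [0.0, 1.0, 2.0, 1.0]

print(tanimoto(doc_a, doc_a))  # identical vectors score 1
print(tanimoto(doc_a, doc_b))  # same direction, different length: below 1
print(tanimoto(doc_a, doc_c))  # different direction, same length
```

Unlike cosine similarity, this variant still penalizes a pure difference in
length, which may be part of why it behaves slightly differently.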
Please see the attached picture for a comparison of the measures discussed
above. The example uses the same 10 documents as the course slides.
Cheers,
Jianchun