
On using similarities to compute distances in Qn 5 [Re: Question About Question on Final]



At 05:12 PM 5/8/2001 -0700, you wrote:
>Heya,
>         I've a question about question five, the k-means
>question.  You say to show the cluster dissimilarity measure
>for each iteration, and define that as:
>"the sum of the similarities of docs from their respective
>cluster centers".
>This number increases if the documents in the cluster are
>all very similar to each other, and decreases if they are
>very dissimilar. This does not seem like a dissimilarity
>measure?

Since we are using similarities to represent distances, the general
idea is that distance is inversely proportional to similarity (the higher
the similarity, the lower the distance).

For this specific question, you can give either the "aggregate similarity
measure", which should keep increasing from one iteration to the next,
or an aggregate dissimilarity measure where you define dissimilarity to
be, say, 1/(similarity).

Either one will be enough for our purpose.
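
To make the two options concrete, here is a rough sketch of how both
numbers could be computed for one k-means iteration. It is only an
illustration, not the required method: it assumes cosine similarity
between non-zero document vectors, and it reads "aggregate dissimilarity"
as the sum of the per-document values 1/(similarity). The names cosine
and aggregate_measures are made up for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def aggregate_measures(docs, centers, assignment):
    """Return (aggregate similarity, aggregate dissimilarity) for one iteration.

    docs       -- list of document vectors
    centers    -- list of cluster-center vectors
    assignment -- assignment[i] is the index of the center that docs[i] belongs to
    """
    agg_sim = 0.0
    agg_dissim = 0.0
    for doc, c in zip(docs, assignment):
        s = cosine(doc, centers[c])
        agg_sim += s
        # Dissimilarity taken as 1/similarity; a zero similarity is
        # treated as infinite dissimilarity (an assumption of this sketch).
        agg_dissim += (1.0 / s) if s > 0 else float("inf")
    return agg_sim, agg_dissim
```

Calling aggregate_measures once per iteration, with the current centers
and assignments, gives the pair of numbers to report for that iteration.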

Rao