
One more comment on k-means (and mid-term marks)




The following contains the case 1 and case 1' results, this time with the
cluster dissimilarity measure (the sum of the absolute deviations of the
elements in each cluster from that cluster's center, i.e. its mean) noted
next to each clustering. This measure tells you, in a way, how good the
clustering is: the smaller the measure, the better the clustering.
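
For concreteness, here is one way to compute that measure -- a sketch in
Python rather than the Lisp used below, based on the observation that the
printed values match the sum of absolute deviations from each cluster's mean:

```python
def dissimilarity(clusters):
    """Sum, over all clusters, of the absolute deviations of each
    element from its own cluster's center (taken to be the mean)."""
    total = 0.0
    for cluster in clusters:
        center = sum(cluster) / len(cluster)
        total += sum(abs(x - center) for x in cluster)
    return total

# The final case-1 clustering from the transcript below:
final = [(61.5, 55), (48, 47.5, 47.5, 47.5),
         (38, 37, 35, 34.5, 32.5, 32.5, 32, 30),
         (29, 28, 27, 27, 26, 25.5, 22.5, 20.5, 19),
         (18, 17.5, 17, 13.5, 13, 11.5, 9.5, 8.5, 7, 4)]
print(round(dissimilarity(final), 5))  # 88.91667 -- the reported 88.91668, up to float rounding
```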

case 1
USER(218): (k-means mlist 5  :key #'mark-val)

>>>>((61.5) (38) (32) (26) (17.5))
>>>>((61.5 55) (48 47.5 47.5 47.5 38 37 35) (34.5 32.5 32.5 32 30 29) (28 27 27 26 25.5 22.5)
     (20.5 19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:113.07143
>>>>((61.5 55) (48 47.5 47.5 47.5 38) (37 35 34.5 32.5 32.5 32 30 29) (28 27 27 26 25.5 22.5 20.5)
     (19 18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:97.791214
>>>>((61.5 55) (48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29 28 27 27 26 25.5 22.5 20.5 19)
     (18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:88.91668
>>>>((61.5 55) (48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29 28 27 27 26 25.5 22.5 20.5 19)
     (18 17.5 17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:88.91668

case 1'
USER(219): (k-means mlist-r 5  :key #'mark-val)

>>>>((35) (32) (26) (18) (17))
>>>>((61.5 55 48 47.5 47.5 47.5 38 37 35 34.5) (32.5 32.5 32 30 29) (28 27 27 26 25.5 22.5) (20.5 19 18 17.5)
     (17 13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:117.0
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30 29) (28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
     (13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:82.19365
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32 30) (29 28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
     (13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:80.37619
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27 27 26 25.5 22.5) (20.5 19 18 17.5 17)
     (13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.55476
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
     (13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.571434
>>>>((61.5 55 48 47.5 47.5 47.5) (38 37 35 34.5 32.5 32.5 32) (30 29 28 27 27 26 25.5) (22.5 20.5 19 18 17.5 17)
     (13.5 13 11.5 9.5 8.5 7 4)) --Dissimilarity Measure:78.571434


You will note
 1. that the dissimilarity measure is reduced from iteration to
    iteration in each run (well, almost: in case 1' it inches up
    slightly, 78.55476 -> 78.571434, at the final reassignment, so
    the decrease is not strictly guaranteed for this measure)

 2. that the lowest dissimilarity attained depends on the original
    cluster centers. This is a consequence of the fact that k-means is
    a greedy algorithm: it converges to a local optimum and is not
    guaranteed to find the clustering with the globally lowest
    dissimilarity.

 3. that, nicely enough, the clusters found in case 1' are better
    (according to the dissimilarity measure: 78.571434 vs. 88.91668)
    than those found in case 1 (because this means that giving more
    As is in fact a better idea according to k-means ;-)
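
Points 1 and 2 can be seen in miniature with a toy re-implementation
(a rough Python sketch of standard 1-D Lloyd iteration with made-up data,
not the Lisp k-means used above): the same data with two different initial
centers converges to two different clusterings with different final measures.

```python
def dissimilarity(clusters):
    # Sum of absolute deviations from each cluster's mean.
    return sum(sum(abs(x - sum(c) / len(c)) for x in c) for c in clusters)

def kmeans_1d(data, centers, max_iters=100):
    """Lloyd iteration in one dimension: assign each point to the
    nearest center, then move each center to its cluster's mean."""
    for _ in range(max_iters):
        clusters = [[] for _ in centers]
        for x in data:
            best = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[best].append(x)
        clusters = [c for c in clusters if c]       # drop any emptied cluster
        new_centers = [sum(c) / len(c) for c in clusters]
        if new_centers == centers:                  # converged: centers stopped moving
            return clusters
        centers = new_centers
    return clusters

data = [0, 1, 2, 3, 19, 40]
a = kmeans_1d(data, [0, 40])   # one choice of initial centers
b = kmeans_1d(data, [0, 19])   # another choice, same data
print(a, dissimilarity(a))     # [[0, 1, 2, 3, 19], [40]] 28.0
print(b, dissimilarity(b))     # [[0, 1, 2, 3], [19, 40]] 25.0
```

Both runs converge, but only the second finds the lower-dissimilarity
clustering -- the greedy, initialization-dependent behavior of point 2.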

Here is a little puzzle to ponder over:
 How would you go about making k-means find the globally best
clustering according to the dissimilarity measure?

Rao
[Mar 23, 2001]