[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Clarification on K-medoid algorithm
- To: cse494-s07@parichaalak.eas.asu.edu
- Subject: Clarification on K-medoid algorithm
- From: "Subbarao Kambhampati" <rao@asu.edu>
- Date: Fri, 9 Mar 2007 13:30:51 -0700
- Dkim-signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:sender:to:subject:mime-version:content-type:x-google-sender-auth; b=SbZsOatbgFg8hI2WeI8Tn7ILWkFcaePyLKqsX6oTNnloHFTbzmqKpRCkM26W+iXCvjaFPxVKUWGesnKi+No/Gaf+C54wkKHlz0QLL+ykB8fQJoaFF3paEEtyvskSi2FsW2eOJoCe3izReHXMyjNxIR3P1CJ70Z+dORvvD5tLDQE=
- Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:sender:to:subject:mime-version:content-type:x-google-sender-auth; b=EfQEH0N5Lse4kgsluQtKyYkJfKmR2r3bCsf9+pkmRxZloxbl7tqf10tsyvOR2SrSUTkE5gDQoF990lV3DAuDACwl8ytp7V/Xnu6ZBivR2Z7t02dv0uX+TjBEKxzKAN7sHycB9h5/tkSGRpTQoW0mvnQChXzHWotpAxw94YueW5o=
- Sender: subbarao2z2@gmail.com
Regarding the discussion on K-medoid algorithm, the standard idea seems to be to
consider one of the data points in the cluster as the medoid (so it is the most representative
datapoint). The way the most representative data point is chosen may vary, but the most reasonable
idea (that is resistant to outliers) is to pick the point that has the lowest cumulative distance to all other
points. Since this is a costly operation, sometimes it is done only on a sample of the points in the cluster.
Here is the algorithm:
Basic K-medoid Algorithm:
1. Select K points as the initial medoids.
2. Assign all points to the closest medoid.
3. See if any other point is a "better" medoid (i.e, has the lowest average distance to all other points)
Finding a better medoid involves comparing all pairs of medoid and non-medoid points and is relatively inefficient
.–Sampling may be used.
4. Repeat steps 2 and 3 until the medoids don't change.
Rao
ps: Here is a paper that does use this type of clustering:
http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/cache/papers/cs/27046/http:zSzzSzwww-faculty.cs.uiuc.eduzSz~hanjzSzpdfzSzvldb94.pdf/ng94efficient.pdf