Self-organizing maps vs. K-Means (optional)
There was a question at the end of class yesterday about the connection
between self-organizing maps (SOMs) and K-means.
A self-organizing map can be seen as an online, constrained version of
the K-means algorithm. ("Online" means the datapoints are processed one
at a time, as they come in.) To see this, let us consider the following
"online" view of K-means.
Suppose we are doing K-means with 3 clusters (k=3). Consider starting with
just the three seed centroids c1, c2 and c3 in the space.
Now consider each datapoint in turn,
  for each point pi in p1 ... pn
  do
      find the centroid cj that is closest to pi
      move cj a small bit towards pi,
      i.e.  cj := cj + alpha * (pi - cj)    /* where 0 <= alpha <= 1 */
  od
Notice that in the procedure above, as the datapoints are being processed,
they are "attracting" the centers of their clusters towards themselves.
(Normal "batch" K-means resets the centroids once per iteration, while the
online version tunes a centroid once for each datapoint.)
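The loop above could be sketched in Python roughly as follows. (The
function name, the `alpha` and `passes` parameters, and the use of numpy
are my choices for illustration, not part of the lecture.)

```python
import numpy as np

def online_kmeans(points, centroids, alpha=0.1, passes=5):
    """Online K-means sketch: each datapoint pulls its nearest
    centroid a small step towards itself.

    points:    (n, d) array of datapoints
    centroids: (k, d) seed centroids (copied, not modified in place)
    alpha:     step size, 0 <= alpha <= 1
    passes:    number of passes over the data
    """
    c = np.array(centroids, dtype=float)
    for _ in range(passes):
        for p in points:
            # find the centroid cj closest to pi
            j = np.argmin(np.linalg.norm(c - p, axis=1))
            # move cj a small bit towards pi
            c[j] += alpha * (p - c[j])
    return c
```

With two well-separated blobs and one seed centroid nearer to each, the
centroids drift towards the respective blob means over the passes.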
(I will mention next class, in a slide titled "K-means variations", that we
can try hybrid methods lying between online and batch by resetting the
centroids after every j points. Such methods typically improve convergence
speed, since more datapoints get to see the true centroids of the clusters
earlier.)
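One way such a hybrid could look in code, under my own reading of
"resetting after every j points" (accumulate assignments, then reset each
centroid to the mean of the points assigned to it so far; the names and
parameters here are illustrative):

```python
import numpy as np

def hybrid_kmeans(points, centroids, j=10, passes=5):
    """Hybrid online/batch K-means sketch: centroids are reset to the
    mean of their assigned points after every j datapoints."""
    c = np.array(centroids, dtype=float)
    k = len(c)
    sums = np.zeros_like(c)
    counts = np.zeros(k)
    for _ in range(passes):
        for i, p in enumerate(points, 1):
            # assign the point to its closest current centroid
            m = np.argmin(np.linalg.norm(c - p, axis=1))
            sums[m] += p
            counts[m] += 1
            if i % j == 0:
                # batch-style reset of every nonempty centroid
                for q in range(k):
                    if counts[q] > 0:
                        c[q] = sums[q] / counts[q]
                sums[:] = 0
                counts[:] = 0
    return c
```

Setting j = 1 recovers something close to the online behaviour (with
alpha = 1), while j = n gives one batch update per pass.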
If you pass over the datapoints multiple times in this online fashion, the
resulting clustering is essentially the same as what batch K-means would
have produced starting from the same seed centroids.
A SOM can be seen as this online version of K-means, with one additional
restriction: the cluster centers cj are all constrained to lie on a
low-dimensional manifold (usually a 2-D surface) within the full data space.
This in a way gives a clustering of the data after it has been projected
onto a plane. (The plane of projection is normally the plane on which the
data shows maximum variability, i.e., a generalization of the principal
component, which is the single direction over which the data shows the
maximum variability.)
The operation of a SOM involves:
  select the plane, and dot it uniformly with K cluster centers
  for each datapoint, pick the center closest to the datapoint
      pull that center *as well as the centers in its neighborhood* a
      little towards the datapoint
(Notice that rather than pulling just the single closest cluster center,
we pull its whole neighborhood. This tends to distort the surface on which
the original cluster centers lie, so that it conforms to the way the
projection of the data is actually spread on the initial plane.)
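The SOM update could be sketched as follows for a 1-D grid of centers (a
2-D grid works the same way, with grid distance measured on the grid).
The Gaussian neighborhood function and all the parameter names here are
my illustrative choices:

```python
import numpy as np

def som_1d(points, k=10, alpha=0.3, sigma=2.0, passes=20, seed=0):
    """SOM sketch: k centers arranged on a 1-D grid. For each datapoint,
    the winning center AND its grid neighbors are pulled towards the
    point, with a pull that decays with distance on the grid."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    c = rng.normal(size=(k, d)) * 0.01   # centers start near the origin
    grid = np.arange(k)                  # positions of the centers on the grid
    for _ in range(passes):
        for p in points:
            # winning center: closest to the datapoint in data space
            w = np.argmin(np.linalg.norm(c - p, axis=1))
            # Gaussian neighborhood weight, based on grid distance to w
            h = np.exp(-((grid - w) ** 2) / (2.0 * sigma ** 2))
            # pull the winner and its neighbors towards the point
            c += alpha * h[:, None] * (p - c)
    return c
```

Because every update is a convex step towards a datapoint, the centers end
up spread inside the region occupied by the data, which is what produces
the "conforming surface" effect described above.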
SOM has been claimed to provide a "visual representation" of the spread of
the data. You can see for yourself whether this is really the case by
looking at
http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html
where a whole bunch of articles from comp.ai.neural-nets are organized
using SOM.
rao