
Self-organizing maps vs. K-Means (optional)



There was a question at the end of the class yesterday on the connection between Self-organizing maps and K-means.

Self-organizing maps can be seen as an online, constrained version of the K-means algorithm. ("Online" means the datapoints are processed one at a time, as they come in.) To see this, let us consider the following
"online" view of K-means.


Suppose we are doing K-means with 3 clusters (k=3). Consider starting with just the three seed centroids c1, c2 and c3 in the space.
Now consider each datapoint in turn:


for each point pi in p1...pn
  do
    find the centroid cj that is closest to pi
    move cj a small bit towards pi,
        i.e.  cj := cj + alpha * (pi - cj)   /* where 0 <= alpha <= 1 */
  od
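The loop above can be sketched in Python roughly as follows (the function and parameter names here are my own, and the seeding-by-random-sample is just one convenient choice):

```python
import random

def online_kmeans(points, k, alpha=0.1, passes=5, seed=0):
    """Online K-means sketch: nudge the nearest centroid toward each
    datapoint as it arrives, instead of batch-resetting centroids."""
    rng = random.Random(seed)
    # seed the k centroids at randomly chosen datapoints
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(passes):
        for p in points:
            # find the centroid cj closest to this point
            j = min(range(k),
                    key=lambda i: sum((c - x) ** 2
                                      for c, x in zip(centroids[i], p)))
            # move cj a small bit towards p:  cj := cj + alpha * (p - cj)
            centroids[j] = [c + alpha * (x - c)
                            for c, x in zip(centroids[j], p)]
    return centroids

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 0.1), (9.1, 0.0)]
centers = online_kmeans(pts, k=3)
```

Since each update is a convex combination of the old centroid and the datapoint, the centroids always stay within the bounding box of the data.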

Notice that in the procedure above, as the datapoints are being processed they are "attracting" the centers of their clusters towards themselves. (Normal "batch" K-means resets the centroids once per iteration, while the online version tunes a centroid once for each datapoint.)
(I will mention next class (in a slide titled "K-means variations") that we can try hybrid methods that lie between online and batch by
resetting the centroids after every j points. Such methods typically improve convergence speed, since more datapoints will see the true centroids of the clusters earlier.)



If you pass over the datapoints multiple times in an online fashion, the resulting clustering is essentially similar to what batch K-means would have produced starting from the same seed centroids.



SOM can be seen as this online version of K-means, with one additional restriction: the cluster centers cj are all constrained to lie
on a low-dimensional manifold (usually a 2-d plane) within the full data space.
This in effect gives a clustering of the data after it has been projected onto a plane. (The plane of projection is normally the plane over which the data shows maximum variability--a generalization of the principal component, which is the single direction of maximum
variability.)


The operation of SOM involves:

select the plane, and dot it uniformly with K cluster centers

for each datapoint, pick the center closest to the datapoint, and
pull that center *as well as the centers in its neighborhood* a little towards the datapoint


(Notice that rather than pulling just the single closest cluster center, we are pulling its whole neighborhood. This tends to
distort the surface on which the original cluster centers lie, so that it conforms to the way the projection of the data
is actually spread on the initial plane.)
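A single SOM update step can be sketched as follows (this is my own illustrative code: the dict-of-grid-coordinates representation, the square "Chebyshev" neighborhood, and all names are assumptions, not a standard API):

```python
def som_step(grid, point, alpha=0.1, radius=1):
    """One SOM update. `grid` maps 2-d grid coordinates (gx, gy) to
    centers living in data space. Pull the winning center and its
    grid neighbors a little towards the datapoint."""
    # find the best-matching unit: the center closest to the datapoint
    win = min(grid, key=lambda g: sum((c - x) ** 2
                                      for c, x in zip(grid[g], point)))
    for g, center in grid.items():
        # the neighborhood is measured on the 2-d grid, not in data space
        if max(abs(g[0] - win[0]), abs(g[1] - win[1])) <= radius:
            grid[g] = [c + alpha * (x - c) for c, x in zip(center, point)]
    return win

# a 3x3 array of centers dotted uniformly on the plane z=0 of 3-d data
grid = {(i, j): [float(i), float(j), 0.0]
        for i in range(3) for j in range(3)}
winner = som_step(grid, [2.0, 2.0, 1.0])
```

Note that the winner and its grid neighbors all get lifted slightly off the initial plane here, which is exactly the "distortion of the surface" described above.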


SOM has been claimed to provide a "visual representation" of the spread of the data. You can see for yourself whether this is really the case by looking at
http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html


where a whole bunch of articles from comp.ai.neural-nets are organized using SOM.

rao