[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Scale-free networks and small world phenomena (and ZIff's law)
Now that I wound up spending a significant time at the beginning of the
class talking about the connectivity of the webgraph, I should
mention to you two important things:
Scale-free networks and Small-world phenomena
Small-world phenomena are essentially the phenomena underlying kevin bacon
game and six-degrees of separation (the latter says that given any two
random people po and pn in a human population, we can find a sequence of
six people p1...pk for small number k (often around 6) such that the p0
knows p1, who know p2, who knows p3 ..etc. and pk knows pn).
Small-world phenomena happen to be an innate property of random networks
(i.e., networks where the connectivity of nodes is random) that should be
no-more surprising than the fact that given a room with about 20 people,
there is more than 50% chance that at least two of them will have the
common-birthday (and the probability becomes more than .99 if there are
more than 60 people in the room ---see
http://en.wikipedia.org/wiki/Birthday_paradox )
The fact that the probability of two randomly selected web pages are
connected is around .35 is _related to_ the small-world phenomenon.
----------
At first, when people started thinking of web-graphs, they assumed that
these will be random networks. However, they quickly found that in reality,
the web-graphs are not randomly connected at all--and that they tend to
have relatively few nodes of very high connectivity (e.g. your yahoos and
cnns). Such networks are termed "Scale-free networks" (see
http://en.wikipedia.org/wiki/Scale-free_network ; also
http://www.computerworld.com/networkingtopics/networking/story/0,10801,75539,00.html
for a short description ).
Turns out that small-world phenomena can occur in both scale-free and
purely random networks, but scale-free networks tend to be more brittle
than random ones (take out a couple of highly connected nodes and you can
drastically reduce the connectivity of the network!)
Now here is the interesting part--for a long time sociologists,
epedemiologists etc would consider human connectivity networks to be
essentially modelable as random networks. Now, more and more they realize
that scale-free networks are much better models for these (as an example,
the best way of stopping the spread of a communicable epidemic may be to
"neutralize" some of the highly connected individuals in the group (for a
more concrete image, replace "communicable epidemic" with a sexually
transmitted disease, and "highly connected with" "highly sexually connected"))
--------------
The phenomena of a few nodes dominating connectivity in scale-free networks
is related to the phenomena of the popularity of the keyword searches. If
you were to rank the specific keyword searches submitted to Google or any
other search engine, you will find that the cardinal popularity falls
pretty drastically (e.g. number 1 most popular keyword may be getting a
billion hits, while the number 10 most popular may be getting less than a
100 hits). This type of "popular boys/girls get all the dates" phenomenon
are modeled in terms of Zipfian distributions
(where the nth most popular keyword gets hits that are inversely
proportional to n).
http://en.wikipedia.org/wiki/Zipf's_law
All of these have important ramifications to how web and web search work...
nifty, eh?
Rao