Re: Google Linux Cluster talk by Urs Hoelzle--Media player archive...
So, not having much homework or project stuff hanging over my head, I just
finished watching the Hoelzle talk.
Shows how you can talk for 50 minutes without giving away any of the
company's real secrets ;-)
Here are some random notes:
--He tries a sort of funny "flow"-based explanation of PageRank (which
doesn't quite work, and he has to revert to the random-surfer model anyway).
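For what it's worth, the random-surfer model he falls back on is just power iteration on the link graph. A toy sketch, with a made-up 4-page graph and the usual d=0.85 damping factor (not anything from the talk):

```python
# Toy PageRank via the random-surfer model: with probability d the surfer
# follows a random outlink from the current page, otherwise jumps to a
# uniformly random page. Link graph below is made up for illustration.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, d = 4, 0.85
rank = [1.0 / n] * n            # start with a uniform distribution

for _ in range(50):             # power iteration until (roughly) converged
    new = [(1 - d) / n] * n     # the "random jump" share for every page
    for page, outs in links.items():
        for target in outs:     # spread this page's rank over its outlinks
            new[target] += d * rank[page] / len(outs)
    rank = new

print([round(r, 3) for r in rank])
```

Page 2 ends up on top here, since three of the four pages link to it.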
--The main talk is about how they use clusters of cheap PCs as their
hardware platform. The technical takeaway is that because web search is
essentially read-only and incoming queries are independent of each other,
it is quite easy to parallelize the heck out of the problem. An incoming
query is switched to one of N different machine clusters by a fast
switcher/load balancer, and is then handled entirely by that cluster. There
are nice photos of these machine clusters.
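A minimal sketch of that front-end dispatch, with made-up cluster names and a plain round-robin policy (the talk doesn't say what balancing policy Google actually uses):

```python
import itertools

# Hypothetical front end: the load balancer picks one of N fully
# replicated clusters, and the chosen cluster handles the whole query.
clusters = ["cluster-a", "cluster-b", "cluster-c"]
rr = itertools.cycle(range(len(clusters)))  # simple round-robin rotation

def dispatch(query):
    """Route an incoming query to one of the N replicated clusters."""
    return clusters[next(rr)]
```

Because the clusters are full replicas, any of them can serve any query, which is what makes a dumb policy like round-robin workable at all.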
--They go with cheap PCs as their workhorses, and deal with machine
failure by lots and lots of replication (of the index and document
servers). Much of the talk is a sort of justification of why this works out
well for Google $-wise.
--A cute bit of jargon: "sharding"--spanning a large file over multiple
systems.
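The point being that a lookup only has to touch the shard that owns the key. A hash-based sketch (hash-mod is my assumption here, not a scheme from the talk):

```python
import hashlib

NUM_SHARDS = 4  # illustrative; a real index would span many more machines

def shard_for(term, num_shards=NUM_SHARDS):
    """Map an index term to the shard that stores its piece of the file."""
    # md5 just for a stable, evenly spread hash; not a security choice.
    h = int(hashlib.md5(term.encode("utf-8")).hexdigest(), 16)
    return h % num_shards
```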
--Some interesting observations on scale: if you have disks rated for a
250,000-hour mean time between failures, and you have 50,000 of them, you
can expect a disk failure every 5 hours (this sort of scale argument also
comes up in the Haveliwala global clustering paper).
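The arithmetic behind that, assuming independent failures (the fleet-wide failure rate is just the per-disk rate times the number of disks):

```python
mtbf_hours = 250_000   # per-disk mean time between failures
disks = 50_000

# Expected time between failures across the whole fleet:
hours_between_failures = mtbf_hours / disks
print(hours_between_failures)  # 5.0
```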
--He gives some "Google" perspectives on what is good research and what is
bad research on search engines. User modeling and adaptive search engines
are considered good, while the semantic web, deep web, and P2P are
considered bad (the arguments are so-so).
G'night
Rao