Google Linux Cluster talk by Urs Hoelzle--Media player archive...
Folks:
Here is an interesting-looking talk that was given at UW Seattle on
November 5th about Google and how it manages its Linux clusters,
something I am sure you will enjoy hearing about as you complete the
project.
The mediaplayer link is:
http://www.cs.washington.edu/info/videos/asx/colloq/UHoelzle_2002_11_05.asx
(you need brother Gates' Windows Media Player and a good, fast internet
connection)
Here is the abstract:
Urs Hoelzle (Google)
The Google Linux Cluster
Abstract
Google's Linux cluster currently processes over 150 million
queries per day, searching a multi-terabyte web index for every query
with an average response time of less than a quarter of a second with
near-100% uptime. In this talk I'll describe the software and hardware
infrastructure that makes this performance possible.
I will start with an overview of the main problems facing a web search
engine, and discuss Google's PageRank algorithm, which helps it to
frequently return the right results on the first results page. PageRank
is computed with a large-scale off-line computation over the web's link
graph which models the behavior of a random web surfer.
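
To make the random-surfer idea concrete, here is a minimal Python sketch of PageRank computed by repeated iteration over a small link graph. It is only an illustration, not Google's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the dict-of-lists graph format are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a toy graph (hypothetical helper).

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # surfer starts on a uniformly random page
    for _ in range(iterations):
        # Rank sitting on pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Otherwise the surfer follows one of the current page's links at random.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Calling `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})` returns a dict of scores that sum to 1, with the heavily linked page "c" ranked highest and the unlinked page "d" ranked lowest.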
Google's software architecture harnesses the power of thousands of cheap
Linux PCs and organizes them into a scalable, reliable, high-performance
computing system. At the same time, we aim to keep the architecture as
simple as possible. Our solution structures the system as a collection of
TCP- and UDP-based servers and guarantees reliability via replication of
servers combined with software load balancing and failover.
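
The combination of replication, software load balancing, and failover can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of Google's actual software: the class `ReplicatedService`, the exception `ReplicaUnavailable`, and the round-robin policy are all hypothetical, and each replica is modeled as a plain callable rather than a networked server.

```python
import itertools


class ReplicaUnavailable(Exception):
    """Raised by a replica that cannot serve a request (hypothetical)."""


class ReplicatedService:
    """Spread requests over replicas and fail over when one is down."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        # Round-robin starting point: a simple form of software load balancing.
        self._start = itertools.cycle(range(len(self.replicas)))

    def call(self, request):
        first = next(self._start)
        n = len(self.replicas)
        # Try the chosen replica first, then fail over to the others in order.
        for offset in range(n):
            replica = self.replicas[(first + offset) % n]
            try:
                return replica(request)
            except ReplicaUnavailable:
                continue
        raise ReplicaUnavailable("all replicas failed")
```

The point of the sketch is that no single replica has to be reliable: a request only fails when every replica fails, so cheap machines can be used as long as enough copies of each server exist.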
On the hardware side, the main goals are performance and cost;
reliability explicitly isn't a goal, since that requirement is provided by
software. We use custom-built rackmount systems assembled from standard
PC components, ensuring volume availability and competitive pricing. A
compact rack design minimizes colocation space costs but pushes the
envelope of commercially supportable power density.