
Google Linux Cluster talk by Urs Hoelzle--Media player archive...



Folks:

 Here is an interesting-looking talk, given at UW Seattle on November 5th, on how Google manages its Linux clusters. I am sure you will enjoy hearing about it as you complete the project...

The mediaplayer link is:

http://www.cs.washington.edu/info/videos/asx/colloq/UHoelzle_2002_11_05.asx

(You need Brother Gates' Windows Media Player and a good, fast internet connection.)

Here is the abstract:

Urs Hoelzle (Google)
The Google Linux Cluster

Abstract
Google's Linux cluster currently processes over 150 million queries per day, searching a multi-terabyte web index for every query with an average response time of less than a quarter of a second with near-100% uptime. In this talk I'll describe the software and hardware infrastructure that makes this performance possible.
I will start with an overview of the main problems facing a web search engine, and discuss Google's PageRank algorithm, which helps it to frequently return the right results on the first results page. PageRank is computed with a large-scale off-line computation over the web's link graph which models the behavior of a random web surfer.
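For anyone who wants to see the random-surfer idea in code before watching, here is a minimal sketch of the textbook power-iteration form of PageRank. It is an illustration only, not Google's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the handling of pages without outgoing links are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a small link graph.

    links maps each page to the list of pages it links to.
    All parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start the surfer at a uniformly random page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Otherwise the surfer follows one outgoing link at random.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links: jump anywhere (assumption).
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

In this toy graph, page "c" is linked from both "a" and "b", so it ends up with the highest score, while "b", linked only from "a", ends up with the lowest. The real computation runs off-line over the whole web's link graph, as the abstract says.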
Google's software architecture harnesses the power of thousands of cheap Linux PCs and organizes them into a scalable, reliable, high-performance computing system. At the same time, we aim to keep the architecture as simple as possible. Our solution structures the system as a collection of TCP- and UDP-based servers and guarantees reliability via replication of servers combined with software load balancing and failover.
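The combination of replication, software load balancing, and failover can be pictured with a small sketch. This is a toy model under stated assumptions, not a description of Google's servers: `ReplicatedService`, `ReplicaUnavailable`, and `make_replica` are hypothetical names, replicas are plain in-process functions rather than TCP or UDP servers, and round-robin is just one possible balancing policy.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica (or every replica) cannot serve a request."""


class ReplicatedService:
    """Round-robin load balancer with failover over identical replicas."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        if not self.replicas:
            raise ValueError("at least one replica is required")
        self._next = 0  # index of the replica to try first on the next call

    def call(self, request):
        n = len(self.replicas)
        start = self._next
        # Load balancing: rotate the starting replica on every request.
        self._next = (self._next + 1) % n
        for offset in range(n):
            replica = self.replicas[(start + offset) % n]
            try:
                return replica(request)
            except ReplicaUnavailable:
                # Failover: move on to the next replica in the ring.
                continue
        raise ReplicaUnavailable("all replicas failed")


def make_replica(name, healthy=True):
    """Build a hypothetical replica; an unhealthy one always fails."""
    def handle(request):
        if not healthy:
            raise ReplicaUnavailable(name)
        return f"{name}:{request}"
    return handle


if __name__ == "__main__":
    service = ReplicatedService([
        make_replica("r0"),
        make_replica("r1", healthy=False),
        make_replica("r2"),
    ])
    for _ in range(4):
        print(service.call("query"))
```

Because every replica can answer every request, losing "r1" costs capacity but not availability: requests that would have gone to it are simply served by the next replica in the ring.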
On the hardware side, the main goals are performance and cost; reliability explicitly isn't a goal, since that requirement is provided by software. We use custom-built rackmount systems assembled from standard PC components, ensuring volume availability and competitive pricing. A compact rack design minimizes colocation space costs but pushes the envelope of commercially supportable power density.