
Some issues to think about re: the Google paper



If you read the Google paper, you should know the answers to these ;-)

- Fancy hits?
- Why two types of barrels?
- How is indexing parallelized?
- How does Google show that it doesn't quite care about recall?
- How does Google avoid crawling the same URL multiple times?
- What are some of the memory saving things they do?
- Do they use TF/IDF?
- Do they normalize? (why not?)
- Can they support proximity queries?
- How are "page synopses" made?

Rao