
*IMPORTANT* Wednesday will be a discussion-oriented class...




For Wednesday's class you should *read* the Google paper on its
specific indexing/retrieval strategies, and the Clustering paper (by
Haveliwala et al.). We can then see how interactive we can make the
class.

My idea is that I will still lecture, but only in answer to the
questions you raise.

In reading the Google paper, concentrate on what sorts of corners they
cut to make ranking more efficient. For example:

 --Are there ways in which they are making it easier to give the first
few answers?
 --Are there tricks they are using to make the building of the
inverted indices faster?
 --Are there web-specific things they are doing in constructing
inverted indices?
 --Does Google do weight normalization?
 --Does it do idf weighting?

==If you understand it all, then I will not have to discuss this too
much in class.
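
As a reference point for the last two questions, here is a minimal
sketch of textbook tf-idf weighting followed by length (cosine)
normalization. It is NOT taken from the Google paper; the function name
tfidf_vectors and the toy documents are made up for illustration, and
the idf formula log(N/df) is just one common variant.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one unit-length tf-idf dict per tokenized document.

    Hypothetical helper for illustration; uses idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # idf weighting: terms that occur in few documents get larger weights.
        weights = {t: tf[t] * math.log(n / df[t]) for t in tf}
        # Weight normalization: scale the vector to unit Euclidean length,
        # so long documents do not win just by being long.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        vectors.append(weights)
    return vectors

# Toy collection (made up): "web" occurs in every document.
docs = [
    ["web", "search", "engine"],
    ["web", "page", "rank", "rank"],
    ["web", "clustering"],
]
vecs = tfidf_vectors(docs)
```

Note that a term occurring in every document ("web" here) gets idf
log(1) = 0 and so contributes nothing. While reading, ask which of
these two steps a web-scale engine could afford to skip or approximate.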


In reading the clustering paper, understand:

 1. the notion of making clusters using nearest neighbor algorithms
    and how similarity metrics help in making clusters.

 2. Compare the specific similarity metric used in that paper to the
    cosine metric

 3. Truly understand why it is hard to do full pairwise comparison of
    document similarities

 4. Understand the differences between normal hashing and the LSH idea
     --specifically understand the role of multiple signatures for a
        document
     --Get a feel for what it means to keep the false-hit probability
        low enough when you are starting with a large database.
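
To make points 3 and 4 concrete, here is a minimal sketch of min-hash
signatures used in an LSH style. It assumes documents are sets of terms
compared by set overlap (Jaccard); check that against the metric the
paper actually uses. The names, the parameters k and l, and the hash
family are assumptions for illustration, not the paper's
implementation. The point is that only documents sharing a bucket get
compared, instead of all n(n-1)/2 pairs.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # 2**31 - 1, modulus for the illustrative hash family

def jaccard(a, b):
    """Set-overlap similarity: |A and B| / |A or B|."""
    return len(a & b) / len(a | b)

def make_hashes(count, seed=0):
    """Draw `count` random (a, b) pairs for hashes x -> (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(count)]

def signatures(tokens, hashes, k):
    """Return signatures of k min-hash values each (tokens must be non-empty)."""
    xs = [zlib.crc32(t.encode()) for t in tokens]
    mins = [min((a * x + b) % PRIME for x in xs) for a, b in hashes]
    return [tuple(mins[i:i + k]) for i in range(0, len(mins), k)]

def candidate_pairs(docs, k=2, l=20, seed=0):
    """Pairs of documents that agree on at least one of their l signatures."""
    hashes = make_hashes(k * l, seed)
    buckets = defaultdict(set)
    for name, tokens in docs.items():
        for i, sig in enumerate(signatures(tokens, hashes, k)):
            # Key on (signature index, signature) so signature i of one
            # document is only ever matched against signature i of another.
            buckets[(i, sig)].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs

# Toy documents (made up): "a" and "b" are identical, "c" is unrelated.
docs = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"}, "c": {"p", "q"}}
pairs = candidate_pairs(docs)
```

With ideal min-hashing, two documents of Jaccard similarity s agree on
one k-value signature with probability s^k, and on at least one of l
signatures with probability 1 - (1 - s^k)^l. Larger k cuts false hits;
larger l recovers the true neighbors that a single signature would
miss. Unlike normal hashing, similar (not just identical) documents are
meant to collide.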


***CAVEAT 
I reserve the right to randomly call on people from the class roster
(with perhaps a wee bit of bias towards the group of folks who I know
get a good afternoon's sleep in every class ;-) and ask them to answer
these questions. I also reserve the right to use the answers to figure
out partial class-participation credit. No kidding.


Rao
[Mar  5, 2001]