Qn 1.3. Consider the following simple IR situation. We have two keywords k1 and k2, and three documents d1, d2 and d3. Using some weighting mechanism, we have come up with the following term-document matrix M (notice that all weights are less than 1):

    M     d1     d2     d3
    k1    0.25   0.53   0.75
    k2    0.73   0.50   0.23

Answer the following questions:

1. Compute the singular value decomposition of M. Notice that this involves computing M*M' (where M' is the transpose of M) and M'*M, computing the eigenvalues and eigenvectors of these matrices, and getting U, S and V from this computation. [Oct 6, 2008: Note that M here is t-d, so the SVD of M will be of the form t-d = t-f * f-f * d-f'. So U = t-f, S = f-f and V = d-f.] Even if you know how to use Matlab etc., I would suggest that you do the problem by hand. You can do this easily enough for this problem if you follow the refresher handouts on linear algebra. Since the nonzero eigenvalues of M'*M and M*M' are the same, you may want to do the computation just for M*M' (because this will be a simple 2x2 matrix!). (Numerical sketches for checking parts 1-3 follow the question.)

2. Ignore the smaller of the two singular values and reconstruct M. Does the reconstruction look close enough to the original M?

3. Suppose I have a query with the single keyword k1. Explain how this query will be represented in the LSI space. Use the cosine similarity metric to rank the three documents.

4. Attempt a geometric interpretation of what you have done. Specifically, plot the documents as vectors in the k1-k2 space. Explain how the axes shifted after the singular value decomposition. Explain whether the shift of axes helped in reducing the dimensionality of the data.

5. Suppose in the very first part we had another keyword k3 such that the value of k3 for the documents d1, d2, d3 is 0.49, 0.515 and 0.49 respectively. Explain briefly how your answers to questions 1-4 would have changed (note: the answer is simpler than actually redoing everything!). What does this tell you about LSI?
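For checking the hand computation in parts 1 and 2, here is a minimal sketch in Python/numpy (an assumption on my part; the question mentions Matlab, but any environment with an SVD routine works). It computes the full SVD of the matrix above, verifies that the singular values are the square roots of the nonzero eigenvalues of M*M', and builds the rank-1 reconstruction asked for in part 2.

    import numpy as np

    # Term-document matrix from the question (rows = keywords, columns = documents).
    M = np.array([[0.25, 0.53, 0.75],
                  [0.73, 0.50, 0.23]])

    # Full SVD: M = U @ diag(s) @ Vt, i.e. U is t-f, diag(s) is f-f, Vt is d-f'.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    print("U =\n", U)
    print("singular values =", s)
    print("V =\n", Vt.T)

    # The singular values are the square roots of the nonzero eigenvalues of M*M'
    # (equivalently of M'*M), which is what the hand computation produces.
    eig = np.linalg.eigvalsh(M @ M.T)          # eigenvalues in ascending order
    print("sqrt of eigenvalues of M*M' =", np.sqrt(eig[::-1]))

    # Part 2: drop the smaller singular value and rebuild a rank-1 approximation.
    M1 = s[0] * np.outer(U[:, 0], Vt[0, :])
    print("rank-1 reconstruction =\n", M1)
    print("difference from M =\n", M - M1)

Note that the columns of U and the corresponding rows of Vt are only determined up to a simultaneous sign flip, so a correct hand computation may differ from this printout by a sign.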
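For part 3, one common LSI convention (an assumption here; the course handouts may use a slightly different scaling) represents the documents by the rows of V_k and folds a query q into the factor space as q_hat = q' * U_k * inv(S_k), then ranks the documents by cosine similarity with q_hat. A sketch under that assumption:

    import numpy as np

    M = np.array([[0.25, 0.53, 0.75],
                  [0.73, 0.50, 0.23]])
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    k = 2                                    # number of factors kept; try k = 1 as well
    Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

    q = np.array([1.0, 0.0])                 # query containing only keyword k1

    # Fold the query into the k-dimensional factor space: q_hat = q' * Uk * inv(Sk).
    q_hat = q @ Uk @ np.linalg.inv(Sk)

    # Rank the documents (rows of Vk) by cosine similarity to the folded query.
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    scores = {name: cosine(q_hat, Vk[i]) for i, name in enumerate(["d1", "d2", "d3"])}
    for name, score in sorted(scores.items(), key=lambda t: -t[1]):
        print(name, round(score, 3))

With k = 1 (the reduced space of part 2) every document lies on a single factor axis, so all three cosines come out as +/-1 and the ranking degenerates; this is worth relating to the geometric picture asked for in part 4.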