
Clarification on Qn 1, Hw 1



At 01:47 PM 2/11/2001 -0700, you wrote:
>Hello,
>
>I have a question about the homework #1. In second question, part b)
>we are supposed to calculate similarity between query and each of the 3
>documents. My question is about the formula. This formula uses a value of M,
>which is the number of unique terms in C. How can I plug in the values, if M
>is not given? What would be M for query in the weight formula, if we treat
>query as a document?

You are right. We will have to make the (unreasonable) assumption that the
only two terms in the corpus are "information" and "retrieval". Notice that
this assumption only changes the denominator (the normalization factor) of
the weights for "information" and "retrieval". More precisely, even if we
knew the statistics for all the other keywords in the corpus and took them
into account, the weights of "information" and "retrieval" in a given
document would both change by the same factor (see below).

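A minimal sketch of the same-factor point, assuming the homework formula is
the usual normalized tf-idf weight, w(k) = tf(k) * log(N/n(k)) divided by the
square root of the sum over all terms j of (tf(j) * log(N/n(j)))^2. The
function name, the value of N, the counts, and the extra term "database" are
all made up for illustration; none of them come from the homework.

```python
import math

def normalized_weights(tf, df, n_docs):
    """Normalized tf-idf weights for one document (assumed formula).

    tf: term -> frequency of the term in the document
    df: term -> number of documents containing the term, n(k)
    n_docs: total number of documents, N
    """
    raw = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}

# Hypothetical statistics, not taken from the homework.
N = 1000
df = {"information": 100, "retrieval": 50, "database": 20}

# Same document, first pretending only two terms exist, then with a third.
two_terms = normalized_weights({"information": 3, "retrieval": 2}, df, N)
all_terms = normalized_weights(
    {"information": 3, "retrieval": 2, "database": 4}, df, N
)

# Both weights shrink by the same factor, because only the shared
# denominator changed.
ratio_information = all_terms["information"] / two_terms["information"]
ratio_retrieval = all_terms["retrieval"] / two_terms["retrieval"]
print(ratio_information, ratio_retrieval)
```

The factor is the ratio of the two denominators, so it is the same for every
term within one document, but it will generally differ from one document to
the next.
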
>What confuses me even more, is that k is used as a variable in calculating the
>sum in weights formula. That is we would have to know the frequency of all
>terms in C that are in document D (or in query).

For the query, we have already said that it contains only the keywords
"information" and "retrieval". Since the similarity is computed between the
query and each of the documents, the only terms of the dot product with
non-zero coefficients are those corresponding to "information" and
"retrieval" (even if other keywords are present in the documents, they are
not present in the query!).
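
A small sketch of why the other keywords drop out, assuming the similarity
is the plain dot product of the two weight vectors. The weights and the
extra terms "database" and "systems" are hypothetical, chosen only for
illustration.

```python
def similarity(query_w, doc_w):
    """Dot product of two sparse weight vectors (term -> weight)."""
    # Iterate over query terms only: a term missing from the query has
    # weight 0 there and contributes nothing to the sum.
    return sum(w * doc_w.get(t, 0.0) for t, w in query_w.items())

# Hypothetical weights, for illustration only.
query = {"information": 0.6, "retrieval": 0.8}
doc = {"information": 0.3, "retrieval": 0.4, "database": 0.5, "systems": 0.7}

# Dot product taken over every term in either vector, for comparison.
full = sum(query.get(t, 0.0) * doc.get(t, 0.0) for t in set(query) | set(doc))
print(similarity(query, doc), full)
```

Both sums agree up to floating-point rounding. The document's other
keywords never enter the dot product itself, although they would still
appear in the denominator of that document's weights.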

>My last question is about the sum used in the weight formula. Does the sum
>extend also over (log(N/n(k)))^2?


Yes, the sum runs over the whole product, including the (log(N/n(k)))^2
factor.


>thank you,
>Monika


Rao