
Vector space model task

To: cse494-f02 <cse494-f02@parichaalak.eas.asu.edu>
Subject: Vector space model task
From: Sree <slakshmi@asu.edu>
Date: Thu, 17 Oct 2002 15:12:50 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.0)Gecko/20020530

The TASK 1 description asks you to use the vector space formula from the Homework or from the textbook (Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto).
Many students have asked me what the textbook formula looks like and whether it is easier to implement than the one in the Homework.
I am including the textbook formula below; it is up to you to decide which one is easier.

The textbook vector space formula goes like this:

t = number of terms in the index
N = number of documents in the collection
tfij = term frequency of term i in document j
ni = number of documents in which term i occurs

Weight of term i in document j (wij) = tfij*log(N/ni)
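
As a rough illustration of the weight formula (not part of the project code), here is a minimal Python sketch. The function name term_weight and its parameter names are made up for this example, and the natural logarithm is an assumption, since the base of the log is not fixed above.

```python
import math

def term_weight(tf_ij, N, n_i):
    """Return w_ij = tf_ij * log(N / n_i).

    tf_ij: term frequency of term i in document j
    N:     number of documents in the collection
    n_i:   number of documents in which term i occurs
    """
    # A term that is absent from the document (or from the whole
    # collection) contributes nothing; this also avoids dividing by zero.
    if tf_ij == 0 or n_i == 0:
        return 0.0
    # Natural log is an assumption; another base only rescales all weights.
    return tf_ij * math.log(N / n_i)
```

Note that a term occurring in every document (ni = N) gets weight 0, because log(1) = 0.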

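Similarity of document j to a query q: Sim(Dj,q) = (Dj . q) / (|Dj| * |q|)
where Dj . q = w1j*w1q + w2j*w2q + ... + wtj*wtq, and wiq is the weight of term i in the query.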

|Dj| = weight of document j = SQRT(w1j^2 + w2j^2 + ... + wtj^2)
In other words, the weight of a document is the square root of the sum of the squares of the weights of all terms contained in the document. |q| is computed the same way from the query term weights.
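
To make the similarity formula concrete, here is a small Python sketch under some assumptions: each document and query is represented as a dictionary mapping a term to its weight, and the names norm and cosine_similarity are invented for this example. How the query weights are obtained is left open here; the sketch simply takes them as given.

```python
import math

def norm(weights):
    """Return SQRT(w1^2 + w2^2 + ... + wt^2) for a {term: weight} dict."""
    return math.sqrt(sum(w * w for w in weights.values()))

def cosine_similarity(doc_weights, query_weights):
    """Return (Dj . q) / (|Dj| * |q|) for two {term: weight} dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * query_weights.get(term, 0.0)
              for term, w in doc_weights.items())
    denom = norm(doc_weights) * norm(query_weights)
    # An empty or all-zero vector has no direction; treat similarity as 0.
    return dot / denom if denom else 0.0
```

For example, a document with weights {"a": 3.0, "b": 4.0} has |Dj| = 5, so against a query {"a": 1.0} the similarity is 3 / (5 * 1) = 0.6.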

In the API documentation (doc/api/index-all.html), the following classes will be useful in implementing Vector Space ranking:

Termenum is the class representing all the terms in the index (an enumeration of terms)
Termval is the class that holds a term (keyword)
Termdocs is a representation for documents

Sree
