
Re: precision/recall



From: Wes Dyer <wesdyer@asu.edu>
Subject: precision/recall
Date: Tue, 27 Jan 2004 21:10:04 -0700
Message-ID: <DAVIDVDBcs7sypzg7Ka00000eb4@petroleumdata.com>

wesdyer> I have a few questions about calculating precision and recall from a sample
wesdyer> query.  Given the definitions of precision and recall, it seems that a
wesdyer> sample query might well have multiple precision levels for a certain recall
wesdyer> level.  For example, say if there were five relevant documents about a given
wesdyer> subject and a query returned five results but only the 1st and 3rd were
wesdyer> relevant then we have the following:
wesdyer>  
wesdyer>             D1*, D2, D3*, D4, D5
wesdyer>             * indicates a relevant document
wesdyer>  
wesdyer> From this sample query we can say that when recall is 1/5 then precision is

reasonable doubt..

Note that to find out what the recall is you need to know how many
relevant answers there are in total (unlike precision, recall is a
"global" property, in that you cannot tell what the recall is unless
you know how many answers are _supposed_ to be there for the
query; you need to be an oracle!).

So, in the above, there is no way you will know that recall is 1/5.
Suppose you happen to know that there are a total of 7 relevant
documents, of which two are shown in the top 5 results D1--D5; then
the recall for the top 5 is 2/7. [In the homework question, you will
notice that I told you how many relevant documents there are for the
query.]

wesdyer> 1 or 1/2 (looking at the first document or the first two documents
wesdyer> respectively).  Also when recall is 2/5 then precision is 2/3, 1/2, or 2/5.

You should be looking at the recall for the same set as you are
considering for precision. Assuming 7 is the total number of relevant
documents for the query, then

if you look at the first document alone (D1)
 precision is 1     and recall is 1/7

after D1,D2
 precision is 1/2   and recall is 1/7

after D1,D2,D3
 precision is 2/3   and recall is 2/7
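
The same bookkeeping can be sketched in a few lines of Python. This is
only an illustration, not part of the homework: the function name
precision_recall_at_k and the list encoding of the ranking are made up
here, and the total of 7 relevant documents is the assumed figure used
above.

```python
from fractions import Fraction

def precision_recall_at_k(relevance, total_relevant):
    """Return (k, precision, recall) after each of the top-k results.

    relevance      -- list of booleans, one per ranked result (hypothetical encoding)
    total_relevant -- number of relevant documents in the whole collection;
                      this must be supplied from outside (the "oracle")
    """
    rows = []
    hits = 0
    for k, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant  # True counts as 1, False as 0
        precision = Fraction(hits, k)             # needs only the top-k results
        recall = Fraction(hits, total_relevant)   # needs the global total
        rows.append((k, precision, recall))
    return rows

# D1*, D2, D3*, D4, D5 with an assumed total of 7 relevant documents
ranking = [True, False, True, False, False]
rows = precision_recall_at_k(ranking, total_relevant=7)
for k, p, r in rows:
    print(f"after top {k}: precision = {p}, recall = {r}")
```

Note that precision is computed from the ranked list alone, while
recall cannot be computed without the total_relevant argument.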

[What I mean by precision being a local property is that if I give
10 documents to the user and he/she says 4 are relevant, I readily
know what my precision is; I don't know what my recall is unless I
continue giving all the documents and making the user tell me which of 
the whole lot are relevant. So, recall is a global property.

Continuing our analogy between soundness/completeness and
precision/recall,  soundness is a local property for databases. If a
database returns a set of answers for a query, you can easily tell, by
looking at the results, whether the database manager is sound (i.e.,
if all of the results returned are actual results for the
query). Completeness--i.e., whether or not the DBMS sent all the
answers--is harder to verify.]

Rao



wesdyer>  
wesdyer> Now the questions.
wesdyer> 1.	Am I looking at this in the right way?
wesdyer> 2.	To find precision for a given recall should I average the precisions
wesdyer> from the sample query?
wesdyer>  
wesdyer> Thanks for the help.
wesdyer>  
wesdyer> Wes Dyer
wesdyer> wesdyer@asu.edu
wesdyer>