Re: precision/recall
From: Wes Dyer <wesdyer@asu.edu>
Subject: precision/recall
Date: Tue, 27 Jan 2004 21:10:04 -0700
Message-ID: <DAVIDVDBcs7sypzg7Ka00000eb4@petroleumdata.com>
wesdyer> I have a few questions about calculating precision and recall from a sample
wesdyer> query. Given the definitions of precision and recall, it seems that a
wesdyer> sample query might well have multiple precision levels for a certain recall
wesdyer> level. For example, say if there were five relevant documents about a given
wesdyer> subject and a query returned five results but only the 1st and 3rd were
wesdyer> relevant then we have the following:
wesdyer>
wesdyer> D1*, D2, D3*, D4, D5
wesdyer> * indicates a relevant document
wesdyer>
wesdyer> From this sample query we can say that when recall is 1/5 then precision is
reasonable doubt..
Note that to find the recall you need to know how many relevant
answers there are in total (unlike precision, recall is a
"global" property--in that you cannot tell what the recall is unless
you know how many answers are _supposed_ to be there for the
query--you need to be an oracle!).
So, in the above, there is no way you will know recall is 1/5. Suppose
you happen to know that there are a total of 7 relevant documents,
of which two are shown in the top 5 results D1--D5; then the recall
for the top 5 is 2/7. [In the homework question, you will notice that
I told you how many relevant documents there are for the query.]
wesdyer> 1 or 1/2 (looking at the first document or the first two documents
wesdyer> respectively). Also when recall is 2/5 then precision is 2/3, 1/2, or 2/5.
You should be looking at the recall for the same set as you are
considering for precision. Assuming 7 is the total number of relevant
documents for the query, then
if you look at the first document alone
precision is 1 and recall is 1/7
after D1,D2
precision is 1/2 and recall is 1/7
after D1,D2,D3
precision is 2/3 and recall is 2/7
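
The same bookkeeping can be sketched in a few lines of Python. This is only an illustration, not part of the homework: the function name precision_recall_at_each_rank and the True/False encoding of the ranking are made up here, and the total of 7 relevant documents is the assumed figure from above.

```python
from fractions import Fraction


def precision_recall_at_each_rank(relevance_flags, total_relevant):
    """Return (rank, precision, recall) for every prefix of a ranked result list.

    relevance_flags: one boolean per returned document, in rank order.
    total_relevant:  number of relevant documents in the whole collection;
                     this has to be supplied from outside (the "oracle").
    """
    rows = []
    hits = 0  # relevant documents seen so far in the prefix
    for rank, is_relevant in enumerate(relevance_flags, start=1):
        if is_relevant:
            hits += 1
        precision = Fraction(hits, rank)         # computed from the prefix alone
        recall = Fraction(hits, total_relevant)  # needs the global count
        rows.append((rank, precision, recall))
    return rows


# D1*, D2, D3*, D4, D5 with an assumed total of 7 relevant documents
ranking = [True, False, True, False, False]
rows = precision_recall_at_each_rank(ranking, total_relevant=7)
for rank, precision, recall in rows:
    print(f"after top {rank}: precision = {precision}, recall = {recall}")
```

Precision and recall are computed over the same prefix at every step, so each cutoff yields exactly one (precision, recall) pair. Note that precision needs only the prefix itself, while recall needs total_relevant to be supplied.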
[What I mean by precision being a local property is that if I give
10 documents to the user and he/she says 4 are relevant, I readily
know what my precision is; I don't know what my recall is unless I
continue giving all the documents and making the user tell me which of
the whole lot are relevant. So, recall is a global property.
Continuing our analogy between soundness/completeness and
precision/recall, soundness is a local property for databases. If a
database returns a set of answers for a query, you can easily tell, by
looking at the results, whether the database manager is sound (i.e.,
if all of the results returned are actual results for the
query). Completeness--i.e., whether or not the DBMS sent all the
answers--is harder to verify]
Rao
wesdyer>
wesdyer> Now the questions.
wesdyer> 1. Am I looking at this in the right way?
wesdyer> 2. To find precision for a given recall should I average the precisions
wesdyer> from the sample query?
wesdyer>
wesdyer> Thanks for the help.
wesdyer>
wesdyer> Wes Dyer
wesdyer> wesdyer@asu.edu
wesdyer>