
Re: scalar clustering and co-currence

To: elvis@inficad.com
Subject: Re: scalar clustering and co-currence
From: Subbarao Kambhampati <rao@asu.edu>
Date: Fri, 30 Mar 2001 07:17:06 -0700
Cc: rao@asu.edu, cse494-s01@asu.edu
In-reply-to: "Your message of Thu, 29 Mar 2001 23:24:43 -0700"<000c01c0b8e2$1c4b7200$b496dcd0@hq.inficad.com>
References: <000c01c0b8e2$1c4b7200$b496dcd0@hq.inficad.com>


elvis> I'm wondering about the equation
elvis> 
elvis> n*(t1/n*t2/n) = m  or close to m...
elvis> 
elvis> I see THAT it works but I don't see exactly WHY.
elvis> what is this relationship t1/n*t2/n ?
elvis> To clarify in english (sorry but it's my main lang. ; ) )
elvis> why is it that the percentage of docs that contain t1 * the percentage that
elvis> contain t2 is approx equal to the percentage that contain both??? The
elvis> product confuses me..... grrr
elvis> 

Think in terms of probabilities.
Let P(T1) be the probability that a random document contains t1.
Writing t1 also for the number of documents that contain the term,
and n for the total number of documents, P(T1) = t1/n (a good
estimate if n is sufficiently large).

Similarly P(T2) = t2/n

We want to compute the probability 

P(T1 & T2)

If T1 and T2 are independent--that is they appear independently of
each other and have no correlation, then

P(T1 & T2) = P(T1) * P(T2)
           = t1/n * t2/n

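Thus, when T1 and T2 are independent, the fraction of docs
containing both is just the product of the fractions of docs
containing each. Multiplying that fraction by n gives the expected
number of docs containing both, which is the m in your equation.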
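
As a rough illustration (not part of the original exchange), here is a
minimal Python sketch of the independent case. The function name
expected_cooccurrence and the 12-document toy collection are made up
for the example; the collection is built by hand so that the two terms
really are independent.

```python
def expected_cooccurrence(n, t1, t2):
    """Expected number of docs containing both terms under independence.

    n * (t1/n) * (t2/n) simplifies algebraically to t1 * t2 / n.
    """
    return t1 * t2 / n


# Hypothetical toy collection: documents are numbered 0..11.
n = 12
docs_with_t1 = set(range(0, 6))      # t1 appears in docs 0..5
docs_with_t2 = set(range(0, 12, 3))  # t2 appears in docs 0, 3, 6, 9

# Observed co-occurrence count m versus the count predicted by the product rule.
m_observed = len(docs_with_t1 & docs_with_t2)
m_expected = expected_cooccurrence(n, len(docs_with_t1), len(docs_with_t2))

print(m_observed, m_expected)  # prints: 2 2.0
```

Here half of all docs contain t1, and half of the docs containing t2
also contain t1, so knowing that t2 is present tells you nothing about
t1. That is exactly the situation in which the product gives m.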

If they are not independent, then we know that the more general
rule is needed:

P(T1 & T2) = P(T1|T2) * P(T2)   = P(T2|T1) * P(T1)

This general rule takes the correlations into account.

Specifically, P(T1|T2) = P(T1) if T1 is independent of T2

                       < P(T1) if appearance of T2 reduces the
                                probability of appearance of T1
                                (negatively correlated)

                       > P(T1) if appearance of T2 increases the
                                probability of appearance of T1
                                (positively correlated)
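
The three cases can be told apart from raw document counts. The sketch
below is only an illustration added to this reply: the function names
conditional_probability and correlation_sign are hypothetical, and it
assumes P(T1|T2) is estimated as m/t2, where m is the number of docs
containing both terms.

```python
import math


def conditional_probability(m, t2):
    """Estimate P(T1|T2) as (docs with both terms) / (docs with t2)."""
    return m / t2


def correlation_sign(n, t1, t2, m):
    """Compare the estimate of P(T1|T2) with the estimate of P(T1) = t1/n."""
    p_t1 = t1 / n
    p_t1_given_t2 = conditional_probability(m, t2)
    if math.isclose(p_t1_given_t2, p_t1):
        return "independent"
    return "positive" if p_t1_given_t2 > p_t1 else "negative"


# Same n, t1, t2 each time; only the co-occurrence count m changes.
print(correlation_sign(12, 6, 4, 2))  # m matches the product rule
print(correlation_sign(12, 6, 4, 4))  # t2 never appears without t1
print(correlation_sign(12, 6, 4, 0))  # t1 and t2 never appear together
```

Note that P(T1|T2) * P(T2) = (m/t2) * (t2/n) = m/n in every case, so
the general rule always recovers the true fraction of docs containing
both terms, whether or not the terms are independent.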

Rao
