
Re: A worked out LDA example



Another interesting thing about the marginalization is that there is no simple plate model for the marginalized distribution (where would you put the clumped-up Z's?).

So look at these semi-paradoxes. If you can (a) see them as paradoxical first and (b) rationalize them, then you know you understand this area ;-)

1. We start with a plate model, unroll it, and marginalize some variables away--and then we don't know how to get back to the plate model.
  [metaphor: you start with a nice block-structured program, convert it into assembly language, and do some nice compiler optimizations; the optimized assembly code may no longer have any nice block-structured analog ;-)]

2. We introduced the \theta (in Kevin Murphy's notation, \pi) variables to make the network sparse (i.e., fewer parameters to learn--each q variable has only one parent); and then, to Bayes-learn these parameters, we decide to marginalize the \pi variables, making the q variables all tangled up (now each q variable has all the other q variables as its "neighbors"--I say neighbors rather than parents since we now have undirected links).
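
To make the second one concrete: for a single document with topic assignments q_1..q_N and \pi ~ Dirichlet(\alpha), integrating \pi out is just the standard Dirichlet-multinomial integral (nothing specific to Kevin's pages; the notation n_k below is mine--it counts how many q's in the document equal topic k):

  p(q_{1:N} | \alpha) = \int [ \prod_n p(q_n | \pi) ] p(\pi | \alpha) d\pi = B(n + \alpha) / B(\alpha),   with   B(\alpha) = \prod_k \Gamma(\alpha_k) / \Gamma(\sum_k \alpha_k).

The conditional that falls out of this,

  p(q_i = k | q_{-i}, \alpha) = (n_k^{-i} + \alpha_k) / (N - 1 + \sum_{k'} \alpha_{k'}),

depends on the topic counts over *all* the other q's in the document--which is exactly the "every q is every other q's neighbor" tangle (and it is symmetric in the q's, which is why the induced links are undirected).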


Rao

On Thu, Dec 6, 2012 at 7:19 AM, Subbarao Kambhampati <rao@asu.edu> wrote:
Folks

 I am attaching the scan of four pages from Kevin Murphy's book on Machine Learning. The last page has a small worked-out example of LDA parameter setting.

Here is how to read the file:

First page: This is there just to let you see the LDA plate model (as Kevin uses yet another notation to describe it--with \pi used in place of \theta, and q in place of z).

Second page: shows the unrolled LDA model *before* and *after* marginalizing \pi. You will notice, as I mentioned in my mail yesterday, that when the \pi variables are integrated out, the q variables get correlated. (What is interesting here, if you look closely, is that the connections between the q variables are direction-less. That makes that part of the network a Markov network (which we haven't discussed in class) rather than a Bayes network. The whole network becomes a mixed graphical model after marginalization. This is part of the complexity under the hood. Note that the Hinrich paper derives the Gibbs sampling update rules directly from the joint over the q variables, without bothering to look at the graphical model.)

Third page: the collapsed Gibbs sampler is derived (with many more missing steps than in Hinrich)--a rough code sketch of such a sampler is included below, after the page descriptions.

End of third and beginning of fourth page: a short worked-out example.
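
In case a concrete reference point helps while reading those last two pages: once \pi and the topic-word parameters are both integrated out, the sampler simply resamples each q_i from

  p(q_i = k | q_{-i}, w) \propto (n_{d,k}^{-i} + \alpha) * (n_{k,w_i}^{-i} + \beta) / (n_k^{-i} + V \beta)

with symmetric \alpha and \beta, where the "-i" superscript means "counts with token i removed" and V is the vocabulary size. Below is a minimal, unoptimized Python sketch of one way to implement this sweep; the function name, variable names, and toy corpus are mine, not Kevin's or Hinrich's:

import numpy as np

def collapsed_gibbs_lda(docs, K, alpha, beta, V, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA: the pi (theta) and topic-word
    parameters are integrated out; only the topic assignment of each
    word token is resampled."""
    rng = np.random.default_rng(seed)
    D = len(docs)

    n_dk = np.zeros((D, K))   # tokens in doc d assigned to topic k
    n_kw = np.zeros((K, V))   # occurrences of word w assigned to topic k
    n_k = np.zeros(K)         # total tokens assigned to topic k

    # Random initialization of the topic assignments.
    z = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(K, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k_old = z[d][i]
                # Remove token i from the counts (the "-i" counts).
                n_dk[d, k_old] -= 1
                n_kw[k_old, w] -= 1
                n_k[k_old] -= 1

                # Collapsed conditional: p(q_i = k | q_-i, w) is proportional to
                #   (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k_new = rng.choice(K, p=p / p.sum())

                # Put the token back under its new topic.
                z[d][i] = k_new
                n_dk[d, k_new] += 1
                n_kw[k_new, w] += 1
                n_k[k_new] += 1

    return z, n_dk, n_kw

# A toy run: 3 documents over a 4-word vocabulary (word ids 0..3), 2 topics.
docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 2, 3]]
z, n_dk, n_kw = collapsed_gibbs_lda(docs, K=2, alpha=1.0, beta=0.1, V=4, n_iters=100)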


regards
Rao