
More paradoxes: Generative vs. Discriminative models--the fuller story



So we talked a bit about generative vs. discriminative models. To recap,

if you are modeling a bunch of variables X1..Xn, Y1..Yk, then the most you will ever need to know about them,
from a statistical perspective, is the joint distribution over all the variables

P(X1...Xn,Y1..Yk)

This is what generative models do.

However, if you happen to know that the only reason you are modeling the data is to guess/predict just the output values Yi, treating the Xi as inputs you never expect to have to guess, then you really only need to model the conditional

P(Y1..Yk |  X1...Xn)

This is what discriminative models do.

Note that generative models seem more general; after all

P(X1..Xn, Y1..Yk) = P(Y1..Yk | X1..Xn) * P(X1..Xn)

In other words, if you start with a discriminative model, and on top of it also model the joint among the input variables P(X1..Xn), then you get the full joint over everything.
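Here is a minimal numeric sketch of that factorization, with a single binary input X and binary output Y (all the probability values below are hypothetical, chosen just for illustration). It builds the full joint from a discriminative P(Y|X) plus the input marginal P(X), and then answers a query the discriminative model alone cannot: P(X|Y).

```python
# Discriminative piece: P(Y=y | X=x), indexed as p_y_given_x[x][y]
# (numbers are made up for illustration)
p_y_given_x = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

# Input marginal: P(X=x) -- the part discriminative models skip
p_x = {0: 0.6, 1: 0.4}

# The factorization above: P(X=x, Y=y) = P(Y=y | X=x) * P(X=x)
joint = {(x, y): p_y_given_x[x][y] * p_x[x]
         for x in p_x for y in (0, 1)}

# With the joint in hand we can go "backwards" and predict the
# input from the output, e.g. P(X=1 | Y=1) by Bayes rule:
p_y1 = sum(joint[(x, 1)] for x in p_x)
p_x1_given_y1 = joint[(1, 1)] / p_y1
print(round(p_x1_given_y1, 3))   # -> 0.842
```

Note that the conditional table alone could never answer the P(X=1 | Y=1) query; it is the extra P(X) piece that completes the story.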

Discriminative models make their lives easier by not bothering to model P(X1..Xn).

---
Discriminative models do fine as long as they are only asked to predict the Y variables based on the X variables. They can't be expected to tell a complete story about all the variables, however, and in particular can never predict X based on Y. Generative models can tell the complete story about all the variables and thus can predict any variables given any others..

--
This makes it sound as if discriminative models are the poorer cousins of generative ones. However, in practice discriminative models (such as logistic regression, conditional random fields etc.) are often much more useful than generative models. This is paradoxical on the face of it.

After all, how can a complete story be less useful than a partial one? 

Well, this is where we need to think about modeling assumptions. As we said, all models make assumptions to keep them tractable (in terms of inference and learning). Given a specific tractability horizon, you may be able to sneak in discriminative models with fewer assumptions than generative models (which, because they have a bigger job--telling the full story--have to make more assumptions to stay under the tractability horizon).

In statistical modeling, the *assumptions* are always about independencies. 

Naive Bayes is a generative model which, in its zeal to tell a full story, assumes that all attributes (words, in the case of text) are independent given the category (the topic, for text). So it does a bad job when the words are not in fact independent.

Logistic regression is the corresponding discriminative model, which doesn't bother modeling the attribute (word) dependencies--so it allows them to be correlated (it doesn't say anything about them, so they can have as many correlations as they want).
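To make the discriminative commitment concrete, here is a sketch of the only thing logistic regression models: P(Y=1 | x) as a sigmoid of a weighted sum of the inputs. The weights below are hypothetical placeholders; in practice they are fit by maximizing the conditional likelihood. Notice that nothing in this form says anything about P(x), so the inputs are free to be as correlated as they like.

```python
import math

def sigmoid(z):
    """Squash a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_y_given_x(w, b, x):
    """Logistic regression's entire story: P(Y=1 | x) = sigmoid(w.x + b).
    It models only this conditional -- never the input distribution P(x)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical "learned" weights for two binary word-features:
w, b = [1.5, -0.5], 0.2
print(round(p_y_given_x(w, b, [1, 1]), 3))   # -> 0.769
```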

Because of this, when all you want is prediction of the topic (the output variables), generative models do worse than discriminative models given a specific tractability horizon.
(This last part is critical, although not crisply defined: you can find generative models that do a better job of classification than logistic regression--but Naive Bayes, which is the approximately corresponding generative model, doesn't measure up.)

So, in ML courses, which tend to focus more on classification tasks--which involve predicting output variables--you spend a lot more time on discriminative models.

The interesting thing is that the tractability horizon is a moving target--and as we move it, we can come up with better and better generative models..

======
The paradox above is that we said two different things:

1. that generative models are more flexible as they tell fuller stories
2. that discriminative models are more effective than generative models

These two statements seem contradictory on the face of it. However, I hope you realize from my explanation that both are right under the correct interpretation. A CliffsNotes version is:

Generative models tell full stories--but they make the stories simplistic to keep them learnable; discriminative models tell half stories, but they allow for a lot of rich dependencies among the unmodeled inputs..

Rao


ps: At the risk of straying far from statistics, I am tempted to draw an analogy with storytelling in human cultures. Every culture has creation stories--and they are "satisfying" because they are complete (think "generative"). However, to keep the story comprehensible, they simplify a whole lot ("our god made the universe in 7 days" "oh really? Ours only needed six days, and then took rest on the seventh"). They do try to be explanatory ("Lord Rama then thanked the squirrel by sweeping his fingers lovingly over its fur, thus giving it the lines on its back.."), but nevertheless they are quite bad in their predictive power.

To allow for improved prediction, we have learned to live with partial stories--I am going to get by with models that predict when smoking causes cancer, even when I don't have a full story of the process of how smoking causes it.

In taking this utilitarian stance, we open ourselves to the criticism--from scholars like Joseph Campbell--that modern civilizations are not wholesome because they don't tell stories. There are many new-age movements that actively encourage storytelling (generative models).

The thing is, while we are willing to get by with discriminative models, we, as a civilization, are also continually improving our abilities to tell the fuller stories. We do know a lot more about how smoking actually does cause cancer. Only, the stories are not "simplistic enough" and cannot quite be written down in under 10 pages.

So you often have the highly tempting (but ultimately retrograde) movement that argues in favor of simplistic stories because of their perceived beauty..

======

..before you go, I'd like to ask you something.
Yes?
The Tsimtsum sank on July 2nd, 1977
Yes
And I arrived on the coast of Mexico, the sole human survivor of Tsimtsum, on February 14th, 1978
That is right
I told you two stories that account for the 227 days in between
Yes you did
Neither explains the sinking of the Tsimtsum
That is right
Neither makes a factual difference to you
That is true
You can't prove which story is true and which is not. You must take my word for it.
I guess so
In both stories, the ship sinks, my entire family dies, and I suffer
yes, that is true
So tell me, since it makes no factual difference to you and you can't prove the question either way, which story do you prefer?
Which is the better story, the story with animals or the story without animals?

--Life of Pi, Yann Martel