
question on Policy vs. plan



At 05:42 PM 1/23/2003 -0700, Dan Bryce wrote:

Hi,

I have a question from class about when you said that a policy is a line
plan.
I said a policy is a "generalization" of a line plan. (Specifically, a policy
is a function from states to actions; a line plan is a "partial" function,
i.e., we only have actions for some of the states, not all.)
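
A minimal sketch of the "plan as partial policy" idea, assuming a hypothetical three-state world. The state names, action names, and the helper plan_as_partial_policy are all invented for illustration. The sketch also assumes the plan never visits the same state twice, and it uses a deterministic transition table only to trace which states the plan passes through.

```python
# Hypothetical toy domain; every name here is illustrative.
# A full policy assigns an action to every state.
full_policy = {
    "at_home": "walk_to_bus_stop",
    "at_bus_stop": "ride_bus",
    "at_school": "stay",
}

# A line plan is just a sequence of actions from a start state.
plan = ["walk_to_bus_stop", "ride_bus"]

# Assumed transition table, used only to trace the states the plan visits.
transition = {
    ("at_home", "walk_to_bus_stop"): "at_bus_stop",
    ("at_bus_stop", "ride_bus"): "at_school",
}


def plan_as_partial_policy(start, plan, transition):
    """Map each state the plan passes through to the action taken there."""
    policy = {}
    state = start
    for action in plan:
        policy[state] = action
        state = transition[(state, action)]
    return policy


partial_policy = plan_as_partial_policy("at_home", plan, transition)
```

Here partial_policy has entries only for at_home and at_bus_stop, while full_policy also says what to do in at_school: the plan is defined on a subset of the states the policy covers.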

It's easy to see that a line plan is the deterministic manifestation of a
policy, but what about sensing and non-deterministic effects?
Actually, a line plan is just a partial policy (see above). The deterministic vs. non-deterministic distinction doesn't matter.

What does matter is whether you have full or partial observability. In the case of full observability, a policy is a mapping (function) from states to actions.

In the case of partial observability, however, we don't really know what state we are in--we can only observe some aspects of our state. Here, the policy is a mapping from observations to actions. (A full policy will tell you what action to take for every possible set of observations.)

Now clearly, the latter generalizes the policy of the fully observable case. Specifically, in the fully observable case, the observation you get is the identity of the state you are in.
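
A minimal sketch of that special-case relationship, assuming a hypothetical world whose state is a (room, door) pair. The function names act, observe_door, and observe_everything, and all state and action names, are invented for illustration. The point is only that the same lookup works in both settings; what changes is the observation function.

```python
def act(policy, observe, state):
    """Pick an action from what the agent observes, not from the state itself."""
    return policy[observe(state)]


# Partial observability: the agent sees only the door, not which room it is in.
def observe_door(state):
    room, door = state
    return door


observation_policy = {
    "open": "go_through",
    "closed": "open_door",
}


# Full observability: the observation is the identity of the state.
def observe_everything(state):
    return state


state_policy = {
    ("hall", "open"): "go_through",
    ("hall", "closed"): "open_door",
    ("lab", "open"): "stay",
    ("lab", "closed"): "stay",
}

state = ("lab", "closed")
partial_choice = act(observation_policy, observe_door, state)
full_choice = act(state_policy, observe_everything, state)
```

With observe_door the agent cannot tell the hall from the lab, so it must act the same way in both whenever the door looks the same; with observe_everything the policy is an ordinary mapping from states to actions.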



Such uncertainty reducing/inducing actions may require a policy that
manifests as a plan with branches and loops.
It requires a policy that is a mapping from observations to actions. That is all.


Is this a valid distinction?  In class, I don't think you qualified your
statement with "(non)deterministic".
We will discuss MDPs and policies at length later. You can check out the initial sections of the following paper for a detailed discussion of this:

http://www.cs.washington.edu/research/jair/abstracts/boutilier99a.html

Rao






Dan