question on Policy vs. plan
At 05:42 PM 1/23/2003 -0700, Dan Bryce wrote:
Hi,
I have a question from class about when you said that a policy is a line
plan.
I said a policy is a "generalization" of a line plan.
(Specifically, a policy is a function from states to actions; a line plan is
a "partial" function, i.e., we only have actions for some of the states,
not all.)
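
To make the total vs. partial function point concrete, here is a minimal Python sketch. The four-state world, the action names, and the helper plan_to_partial_policy are all made up for illustration (they are not from the class notes), and the sketch assumes the plan never visits the same state twice.

```python
# Hypothetical toy world: four states on a line, with deterministic moves.
STATES = ["s0", "s1", "s2", "s3"]
TRANSITIONS = {
    ("s0", "right"): "s1",
    ("s1", "right"): "s2",
    ("s2", "right"): "s3",
    ("s1", "left"): "s0",
    ("s2", "left"): "s1",
    ("s3", "left"): "s2",
}

# A full policy is a total function: it names an action for every state.
full_policy = {"s0": "right", "s1": "right", "s2": "right", "s3": "stay"}


def plan_to_partial_policy(start, plan):
    """Read a line plan (a sequence of actions) as a partial function
    from states to actions, by recording the state each action is
    executed in. Assumes the plan does not revisit a state."""
    partial = {}
    state = start
    for action in plan:
        partial[state] = action
        state = TRANSITIONS[(state, action)]
    return partial


# A line plan that goes from s1 to s3.
line_plan = ["right", "right"]
partial_policy = plan_to_partial_policy("s1", line_plan)

# The partial policy covers only the states the plan passes through.
undefined_states = [s for s in STATES if s not in partial_policy]
```

Here partial_policy has entries only for s1 and s2, while full_policy has an entry for every state; that is the whole difference.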
It's easy to see that a line plan is the deterministic manifestation of a
policy, but what about sensing and non-deterministic effects?
Actually a line plan is just a partial policy (see above). The deterministic
vs. non-deterministic distinction doesn't matter.
What does matter is whether you have full or partial observability. In the
case of full observability, a policy is a mapping (function) from states to
actions.
In the case of partial observability, however, we don't really know what
state we are in--we can only observe some aspects of our state. Here, the
policy is a mapping from observations to actions (and a full policy will
tell you what action to do for every possible set of observations).
Now clearly, the latter is the more general case: it subsumes the policy of
the fully observable case. Specifically, in the fully observable case, the
observation you get will be the identity of the state you are in.
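
As a rough sketch of this point in Python: a policy can always be written as a lookup on the observation, and full observability is just the special case where the observation function is the identity. The state names, the "at_goal" sensor, and the function names below are hypothetical, chosen only for illustration.

```python
def observe_full(state):
    # Full observability: the observation is the identity of the state.
    return state


def observe_partial(state):
    # Hypothetical sensor: we can only tell whether we are at the goal (s3).
    return "at_goal" if state == "s3" else "not_at_goal"


# Policy for the fully observable case: keyed by states, which here are
# exactly the observations returned by observe_full.
state_policy = {"s0": "right", "s1": "right", "s2": "right", "s3": "stay"}

# Policy for the partially observable case: keyed by observations.
observation_policy = {"at_goal": "stay", "not_at_goal": "right"}


def act(policy, observe, state):
    """Pick an action by looking up the observation, never the state itself."""
    return policy[observe(state)]


# Same machinery in both cases; only the observation function changes.
action_full = act(state_policy, observe_full, "s1")
action_partial = act(observation_policy, observe_partial, "s1")
```

Note that under observe_partial the states s0, s1 and s2 all look the same, so the policy is forced to pick the same action in all three.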
Such uncertainty reducing/inducing actions may require a policy that
manifests as a plan with branches and loops.
It requires a policy that is a mapping from observations to actions. That
is all.
Is this a valid distinction? In class, I don't think you qualified your
statement with "(non)deterministic".
We will discuss MDPs and policies at length later. You can check out the
initial sections of the following paper for a detailed discussion
of this:
http://www.cs.washington.edu/research/jair/abstracts/boutilier99a.html
Rao
Dan