
A new topic has been posted to the forum -- feel free to comment there.



Policies vs. plans & non-Markovian systems: Some discussion topics based on today's class


Feel free to comment on the following:

1. Suppose it turns out that Californian authorities are trying to keep unwanted folks out. They institute a "token" system: to go through Redding (the CA border town on I-10 on the way to LA), they will only let people carrying tokens pass through. The tokens are given out only in Tucson.
Does this problem still have optimal substructure? (That is, is the segment of the shortest path to LA that ends in Redding itself the shortest path to Redding?) Can you see that this loss of optimal substructure is related to the loss of the Markovian assumption?

1.a. Suppose that, rather than taking the "state" to be the geographical point you are in, you define the state to be <the geographical point, is-the-token-present?>. Can you see that in this new state space the Markovian assumption holds (and so does optimal substructure)?

1.b. Does the above suggest a general way of converting a non-Markovian system into a Markovian one? Are there any computational tradeoffs? (A small sketch of this state-augmentation idea follows below.)
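For concreteness, here is a minimal sketch of the state-augmentation idea from 1.a, in Python. The road graph, connectivity, and distances below are invented for illustration and are not taken from the class example. Running Dijkstra's algorithm over the augmented state <city, has-token?> restores the Markov property and optimal substructure, at the cost of a state space that is twice as large (and, in general, exponential in the number of history bits that must be remembered).

import heapq

# Hypothetical road graph: city -> list of (neighbor, miles).
ROADS = {
    "Tucson":  [("Phoenix", 115)],
    "Phoenix": [("Redding", 150), ("Tucson", 115)],
    "Redding": [("LA", 120)],
    "LA":      [],
}

def shortest_path(start, goal):
    """Dijkstra over the augmented state <city, has-token?>.

    Folding the one bit of relevant history into the state restores the
    Markov property, so optimal substructure holds and plain Dijkstra is
    sound again.  The price is a larger state space: 2x here, and in
    general exponential in the number of history bits remembered.
    """
    start_state = (start, start == "Tucson")        # tokens are issued in Tucson
    frontier = [(0, start_state, [start])]
    settled = {}
    while frontier:
        cost, (city, has_token), path = heapq.heappop(frontier)
        if city == goal:
            return cost, path
        if settled.get((city, has_token), float("inf")) <= cost:
            continue
        settled[(city, has_token)] = cost
        for nxt, miles in ROADS[city]:
            if nxt == "Redding" and not has_token:   # the non-Markovian rule,
                continue                             # now an ordinary state check
            heapq.heappush(frontier, (cost + miles,
                                      (nxt, has_token or nxt == "Tucson"),
                                      path + [nxt]))
    return None

print(shortest_path("Phoenix", "LA"))
# -> (500, ['Phoenix', 'Tucson', 'Phoenix', 'Redding', 'LA']): the detour
#    through Tucson to pick up the token is forced by the augmented state.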


2. A "policy" is a (total) function from states to actions.  Can we think of a sequential plan as a "partial policy" in that it is a partial function from states to actions? Suppose we define the coverage of a partial policy as the set of states for which it tells us actions to do. Does this view give us a way of starting from a sequential plan and slowly increasing its coverage?

2.1 In the above you thought of a sequential plan as a partial policy on a fully observable MDP. Can you also think of a sequential plan as a "blind" policy, in that it chooses actions based only on the time-to-go and ignores the state? Specifically, Policy(S,T) = Policy(T).
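And a matching sketch of the "blind policy" reading (again, names are illustrative only): the policy indexes the plan purely by time-to-go and ignores the state argument altogether.

def blind_policy(plan):
    """Return pi(state, time_to_go) that consults only time_to_go.

    With horizon H = len(plan), time_to_go is H at the first step and 1 at
    the last, so the k-th action of the plan is taken when time_to_go = H - k.
    """
    H = len(plan)
    def pi(state, time_to_go):
        return plan[H - time_to_go]          # the state argument is ignored
    return pi

pi = blind_policy(["north", "north", "east"])
print(pi(state=None, time_to_go=3))          # -> 'north', whatever the state is
print(pi(state="anything", time_to_go=1))    # -> 'east'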


Rao