
Homework 6--Comments



[[This is also posted on the homework page]]


TCN
--Most people got the Simple Temporal Network problem right.

PCP
--> Most people also got the PCP scheduling problem right (in this
--> case, Arc-B consistency by itself posts enough orderings between
--> the jobs to complete the schedule).
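
For concreteness, here is a rough Python sketch of the kind of pruning
that bounds ("Arc-B") consistency does in precedence constraint
posting. The jobs, time windows, and function names below are invented
for illustration; this is not the homework instance.

# Jobs, numbers, and names are made up; a sketch, not the homework problem.

def can_order(first, second):
    """Is it feasible to run `first` before `second` on the same resource,
    given earliest starts (est), durations (dur), and deadlines (lct)?"""
    first_end = first["est"] + first["dur"]
    if first_end > first["lct"]:
        return False
    second_end = max(second["est"], first_end) + second["dur"]
    return second_end <= second["lct"]

def post_orderings(jobs):
    """For each pair of jobs, post a precedence whenever only one ordering
    is feasible; if both are feasible, a search choice remains."""
    precedences = []
    names = list(jobs)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            ab, ba = can_order(jobs[a], jobs[b]), can_order(jobs[b], jobs[a])
            if ab and not ba:
                precedences.append((a, b))
            elif ba and not ab:
                precedences.append((b, a))
    return precedences

# Two jobs on one machine: only J1 -> J2 is feasible, so the consistency
# check alone fixes the schedule.
jobs = {"J1": {"est": 0, "dur": 3, "lct": 6},
        "J2": {"est": 0, "dur": 5, "lct": 10}}
print(post_orderings(jobs))   # [('J1', 'J2')]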

Conformant
--> In conformant planning, some people went with planning in formula
--> space. It would be easier in this simple problem to just enumerate
--> the states. One person assumed that since the status of R is not
--> known, it should be false--such a closed-world assumption does not
--> make sense when we have an incomplete init state.
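
To make the "enumerate the states" suggestion concrete, here is a
minimal Python sketch of conformant planning as brute-force search over
belief states (sets of possible world states). The literals, actions,
and depth bound are invented for illustration; only the idea that R is
initially unknown is taken from the problem.

from itertools import product

def apply(action, state):
    """Apply a deterministic action (preconds, adds, deletes) to a single
    world state; return None if the preconditions don't hold."""
    pre, add, dele = action
    if not pre <= state:
        return None
    return (state - dele) | add

def progress(action, belief):
    """Progress a whole belief state; the action must be applicable in
    every world state we consider possible."""
    new = set()
    for s in belief:
        s2 = apply(action, s)
        if s2 is None:
            return None          # inapplicable in some possible state
        new.add(frozenset(s2))
    return frozenset(new)

def conformant_plan(init_belief, goal, actions, depth=3):
    """Brute-force enumeration of action sequences up to `depth`."""
    for n in range(depth + 1):
        for seq in product(actions, repeat=n):
            b = init_belief
            for name, a in seq:
                b = progress(a, b)
                if b is None:
                    break
            if b is not None and all(goal <= s for s in b):
                return [name for name, _ in seq]
    return None

# Hypothetical domain: R unknown initially, so the init belief has two states.
init = frozenset({frozenset({"P"}), frozenset({"P", "R"})})
actions = [("a1", (frozenset({"P"}), frozenset({"Q"}), frozenset())),
           ("a2", (frozenset({"Q"}), frozenset({"R"}), frozenset()))]
print(conformant_plan(init, frozenset({"Q", "R"}), actions))  # ['a1', 'a2']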


Conditional
-->The belief space depends only on the number of literals and has no
-->connection to the number of actions (as most people figured
-->out). Several people gave solutions involving redundant sensing for
-->the conditional planning problem. All you need to do is sense whether
-->Q is true. When Q is not true, there is a conformant plan that will
-->make it true. If it is true, you are done anyway. (Of course,
-->any legal solution got full points.)
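
In Python-ish terms, the intended plan has just this shape (sense_Q,
conformant_plan_for_Q, and execute are hypothetical stand-ins for
whatever the domain provides):

# One sensing action, one branch; the names here are placeholders.
def conditional_plan(sense_Q, conformant_plan_for_Q, execute):
    if sense_Q():                         # observe whether Q currently holds
        return                            # Q already true: nothing left to do
    for action in conformant_plan_for_Q:  # Q false: this plan makes Q true
        execute(action)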


MDP
-->The MDP problem was tricky for most people. Here is the full
-->answer:

  A. If you have the optimal policy, then just start from the init
state and, in each state, (*simulate*) the action suggested by the
optimal policy until you reach the goal state; remembering the
sequence of actions done along the way gives the plan.
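A minimal sketch of this extraction, assuming a deterministic
transition model; policy, transition, init, and goal are placeholders
for whatever the problem supplies:

def extract_plan(policy, transition, init, goal, max_steps=1000):
    """Follow the policy from init, recording the actions as the plan."""
    plan, state = [], init
    for _ in range(max_steps):
        if state == goal:
            return plan
        action = policy[state]             # action the optimal policy suggests
        plan.append(action)
        state = transition(state, action)  # simulate the action
    raise RuntimeError("goal not reached; policy or model is off")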

  B. The full policy is a bit of an overkill, since we wind up finding
the shortest path from every state to the goal state, while all we
really wanted was the shortest path from the init state to the
goal state. Of course, if you have the full policy and you do wind up
encountering execution failures and find yourself in unexpected
states, you can just pick up from there (without replanning). [But
this "advantage" doesn't really outweigh the disadvantage of computing
the full policy.]

 C. When you start with the value function initialized to immediate
rewards and run value iteration, you are essentially getting
better and better bounds on the distance of arbitrary states from the
goal state. So, this can be used as an admissible heuristic for the
search.
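
Here is a minimal sketch of that idea, written in cost (shortest-path)
form rather than reward form; states, goal, and successors are
placeholders. The point it illustrates is that, with nonnegative costs
and a lower-bound initialization, the values after every sweep remain
lower bounds on the true distance to the goal, hence admissible.

def value_iteration_heuristic(states, goal, successors, sweeps=5):
    """Run a few Bellman sweeps; return h usable as an A* heuristic.
    Assumes nonnegative costs and that every non-goal state has at least
    one successor; successors(s) yields (cost, next_state) pairs."""
    h = {s: 0.0 for s in states}                   # admissible initialization
    for _ in range(sweeps):
        for s in states:
            if s == goal:
                h[s] = 0.0
            else:
                h[s] = min(c + h[s2] for c, s2 in successors(s))
        # each sweep tightens the lower bound toward the true distance
    return h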

 This is just the tip of a deeper connection between Dynamic
Programming (which attempts to compute the shortest path from all
states) and A* search, which only really wants to compute the shortest
path for those nodes that fall on the optimal solution path.

  It is interesting to note that although we only need the optimal
policy for the states that fall on the deterministic shortest path from
the init state to the goal state, the "heuristic" itself has to be
available for more states than just those on the solution path (since
it is the heuristic that tells you that those other states are not
worth exploring).

  Thus, oftentimes the extraction of heuristics for an A* search
involves dynamic programming. Indeed, the planning graph based
heuristics can be seen as essentially doing an approximate computation
of the value function.

  See http://citeseer.nj.nec.com/haslum00admissible.html
for a discussion of this particular view of planning graph-based
heuristics (and also see Section 6.3 of the AltAlt journal paper for an
argument that doing explicit dynamic programming is not as good as
using the planning graph to get the same effect).


Rao
---------- 
Subbarao Kambhampati 
Professor, Dept. of Comp Sci. and Engg. Arizona State University, 
Tempe, AZ 85287-5406  rao@asu.edu (email)
480-965-0113 (Phone)                         
480-965-2751 (FAX) 
WWW: http://rakaposhi.eas.asu.edu 
Check out http://rakaposhi.eas.asu.edu/bibfinder