CSE 591 Spring 2000
Final Exam
Answer all the questions.
The exam is Take-home, open-everything (no discussions with other
humans except me).
********The exam is due in my office on THURSDAY, 11TH MAY.
Short answer questions:
1. We discussed that the MDP framework subsumes deterministic classical
planning. It is also known that the problem of finding an optimal
policy for an MDP can be posed as a linear programming problem, and is
thus of polynomial complexity in the size of the MDP. Explain the
apparent paradox between this fact and the other fact that classical
planning is PSPACE-complete, and that even deciding whether a k-length
plan exists for a problem is NP-complete.
2. In the discussion on computing minimal networks for disjunctive
TCSPs, Dechter and Schwalb point out that enforcing path
consistency on disjunctive TCSPs may wind up producing networks
with an exponentially larger number of labels. Why is this possible,
given that the whole point of enforcing path consistency is to
_reduce_ (tighten) the sizes of the various constraints?
3. List five ideas/techniques that you learned in this course
that you thought were most interesting.
4. List five ideas/techniques that you thought were over-played
and were not worth the time we spent on them.
5. List five "non-obvious" (to you) details or cross-connections that _you_ were able to
appreciate over the semester.
6. Consider the following approach for solving classical planning
problems. Set them up as MDPs, with the reward function being M (M >> 1) in
states where all goals are satisfied and 0 everywhere else. The
cost of doing any action in any state is taken to be d (d <<
M). The goal states are modeled as sink states (i.e., any action
done from a goal state returns the agent to that state). Compute the
optimal infinite-horizon policy for the MDP. Use this as the basis
for deriving the plan for the problem.
6.1. Exactly how do you derive the plan for the problem given its
optimal policy?
6.2. Compared to something like the Graphplan approach, what are the
advantages/disadvantages of this way of solving a classical planning
problem?
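To make the encoding in question 6 concrete, here is a minimal sketch on a made-up four-state chain problem. The chain, the values of M and d, and the discount factor are all placeholders, and discounted value iteration stands in for whatever exact infinite-horizon solution method one prefers:

```python
# Sketch: a toy deterministic planning problem encoded as an MDP in the
# style above.  The 4-state chain, M, d, and the discount factor gamma
# are all made-up placeholders for illustration.
M, d, gamma = 100.0, 1.0, 0.9
states = [0, 1, 2, 3]           # state 3 is the (only) goal state
actions = ["left", "right"]

def step(s, a):
    if s == 3:                  # goal is a sink: every action self-loops
        return 3
    return max(s - 1, 0) if a == "left" else min(s + 1, 3)

def reward(s):
    return M if s == 3 else 0.0

# Discounted infinite-horizon value iteration (standing in for the LP).
V = {s: 0.0 for s in states}
for _ in range(300):
    V = {s: max(reward(s) - d + gamma * V[step(s, a)] for a in actions)
         for s in states}

policy = {s: max(actions, key=lambda a: reward(s) - d + gamma * V[step(s, a)])
          for s in states}
print(policy)   # non-goal states all choose "right", toward the goal
```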
7. Consider two intervals I and J, and their end points i-s, i-e; j-s,
j-e.
7.1. Represent the constraint i-s <= j-e in terms of Interval
Algebra constraint on I and J.
7.2. Represent the constraint J overlaps I in terms of the point
algebra constraints on i-s, i-e, j-s, j-e.
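As background for question 7: when the four endpoints have concrete values, the thirteen Interval Algebra relations can be recovered mechanically from point-algebra comparisons. The helper below is hypothetical (the function name and relation labels are made up for illustration):

```python
# Sketch: a hypothetical helper classifying the Allen (Interval Algebra)
# relation between I = [i_s, i_e] and J = [j_s, j_e] from point-algebra
# comparisons of the four endpoints.
def allen_relation(i_s, i_e, j_s, j_e):
    if i_e < j_s:                   return "before"
    if j_e < i_s:                   return "after"
    if i_e == j_s:                  return "meets"
    if j_e == i_s:                  return "met-by"
    if i_s == j_s and i_e == j_e:   return "equal"
    if i_s == j_s:                  return "starts" if i_e < j_e else "started-by"
    if i_e == j_e:                  return "finishes" if i_s > j_s else "finished-by"
    if j_s < i_s and i_e < j_e:     return "during"
    if i_s < j_s and j_e < i_e:     return "contains"
    # all remaining cases are proper overlaps
    return "overlaps" if i_s < j_s else "overlapped-by"

print(allen_relation(0, 4, 2, 6))   # i_s < j_s < i_e < j_e: "overlaps"
```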
8. Consider the idea of interleaving planning and execution in two
different scenarios:
1. Actions can have both causal and observational effects
2. Actions can only have observational effects
Comment on how these two scenarios affect the treatment of:
a. maintenance and use of LCW statements
b. handling of unexecuted action flaws
***************************
PROBLEM II: (Handling incomplete information)
Consider the following problem. We want to achieve the
goals g1 and g2. We have the following operators (with preconditions
shown on left and effects on right).
(p) O11 (g2)
(~p) O12 (g2)
(r) O21 (g1, ~w)
(w) O22 (g1, ~r)
() O3 ~g1 (observe p !v1)
() O4 (observe k !v2)
O3 is a sensing action which tells us whether p is true or false in
the world. O4 does the same for k.
We know partial information about the initial state. Specifically, we
know that the propositions r and w are true, and that g1 and g2 are not true. But
we don't know whether p and k are true in the initial state.
Part a: Show how the problem can be solved using the conditional
planning techniques discussed in the class. Make sure to point out
if and where the specialized linking and threat resolution techniques for
conditional planning were used in this example.
Part b: Consider solving this problem using the "interleaving
planning and execution" approach. Assume that we treat
unexecuted actions as a type of flaw, and that an unexecuted action
can be executed as soon as the _rules_ for its sane execution are
satisfied. Any unexecuted action flaws that are ready for resolution
are "resolved" before the open condition flaws are resolved. Let us
assume that the open condition flaws are handled in
alphabetical order.
Show how a UCPOP-like planner using this strategy will solve the
problem above. ***you can assume that in the world that the agent
is inhabiting, O3, _when executed_ will tell us that p is
false. Similarly, O4, when executed will tell us that k is true.
(Please note that in doing this trace you will
be showing how the partial plan changes after execution, etc. You
only need to show the single search path leading to the
solution.)
Part c: Comment on the whole idea of treating an unexecuted
action as a flaw. In what ways is this flaw different from the
normal open condition/unsafe link style flaws?
*************
PROBLEM III. (Problem on TCSP/Resources and Causal planning.)
Consider the following problem. We have three actions
Action A1 typically lasts for 5 time units, requires the use of 3
units of resource R and gives P at the end of the execution.
A1 can start between 3 to 5 units after the beginning of the plan.
Action A2 takes 7 time units, requires 4 units of resource R and gives
Q at the end of execution.
A2 can start either between 1 and 5 units after the beginning, or it
can start anytime after 15.
Action A3 takes 2 time units, requires 2 units of resource R and gives
R at the end of execution.
A3 can start at any time.
All the actions release their resources at the end of their execution.
We have a total of 6 units of the resource R.
We need to achieve P, Q and R.
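To see what the resource constraint means operationally, here is a sketch that checks a candidate schedule against the capacity of R. The start times below are made-up placeholders, not a claimed solution to the problem:

```python
# Sketch: checking a candidate schedule against the capacity of R.
# Durations and resource needs come from the problem statement; the
# start times are made-up placeholders.
actions = {            # name: (start, duration, units of R used)
    "A1": (5, 5, 3),
    "A2": (15, 7, 4),
    "A3": (0, 2, 2),
}
CAPACITY = 6

def peak_usage(actions):
    # Usage only increases at action start events, so it suffices to
    # sample the resource profile at every start time.
    starts = [s for (s, _, _) in actions.values()]
    return max(sum(u for (s, d, u) in actions.values() if s <= t < s + d)
               for t in starts)

print(peak_usage(actions) <= CAPACITY)
```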
Suppose we are doing causal link planning. We construct the plan in
the following steps
step 1. Introduce A1 to support P
step 2. Introduce A3 to support R
step 3. Introduce A2 to support Q
Suppose we are using TCSPs to represent the temporal constraints on
the plan, and the PIGs (as in the IxTET paper) to represent the
resource conflict status. Assume that the TCSPs use the point based
representation (sort of like the examples we studied in
Dechter/Schwalb paper).
qn 1. show the status of the TCSP and PIG at the end of each of the
steps. Does the TCSP at the end of step 3 correspond to a STP (simple
temporal problem)?
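As a reminder of the machinery behind qn 1: a point-based network with a single interval per constraint (an STP) can be checked for consistency by running Floyd-Warshall on its distance graph and looking for negative cycles. The three time points and bounds below are made up for illustration, not taken from the plan above:

```python
# Sketch: STP consistency check via Floyd-Warshall on the distance
# graph.  Points and bounds are made-up illustrations.
INF = float("inf")
n = 3                                    # time points x0, x1, x2
D = [[0 if i == j else INF for j in range(n)] for i in range(n)]

def add_edge(i, j, lo, hi):              # constraint: lo <= x_j - x_i <= hi
    D[i][j] = min(D[i][j], hi)
    D[j][i] = min(D[j][i], -lo)

add_edge(0, 1, 3, 5)
add_edge(1, 2, 2, 7)
add_edge(0, 2, 0, 10)

for k in range(n):
    for i in range(n):
        for j in range(n):
            D[i][j] = min(D[i][j], D[i][k] + D[k][j])

# A negative entry on the diagonal means a negative cycle: inconsistent.
consistent = all(D[i][i] == 0 for i in range(n))
print(consistent)
```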
qn 2. Using the PIG at the end of step 3, find out what (if any) are
the resource conflicts, and what are the _minimal_ resource conflicts.
qn 3. For each of the minimal conflicts, compute the disjunctive
resolver for the conflict (which uses temporal constraints). Split
the disjunction into the search space--i.e., in each branch pick one
disjunct per conflict, and add it on to the TCSP network you have at
the end of step 3.
qn 4. Pick _one_ of the TCSP networks resulting from qn 3. Do loose
path consistency on it. Show the resulting TCSP. (You can use my
lisp program to compute the LPC, as long as you show on paper how the
LPC is used to simplify at least _one_ of the disjunctive
constraints.)
qn 5. Based on the answer to 4, give a feasible schedule of execution
of the plan for getting P,Q,R (you need to give exact start times of
all three actions in the plan).
Qn 6. In Qn 3, instead of splitting the disjunction, we could have
tried to add the disjunctive resolver directly onto the set of
temporal constraints in the current TCSP. Can we do this given the
current TCSP representation? If so, how? If not, can you think of some
way of doing it?
++++++++++++++++++++++++++++++++++++++
PROBLEM IV (MDP stuff).
Consider the following problem, where states are defined by three
boolean state variables: P, Q and R (thus there are 2^3 = 8 states).
We have the following actions (described in English):
Ar
The only effect of Ar is that R will become true with probability .9
if it was not already true.
Ap
The only effect of Ap is that P will become true with probability .8
if R is false in the previous state. If P was true, it will remain true.
Aq
The only effect of Aq is that Q will become true with probability .8
if P was true in the previous state. If Q was true previously, it will
remain true.
(A state variable's value remains the same as its value in the previous
state, unless the action description above explicitly mentions how it
changes. Thus, for example, P and Q will continue to maintain their
values when Ar is done.)
REWARD STRUCTURE:
Every state which has R true has an immediate reward of 5. Every state
which has Q true has an immediate reward of 20. If both R and Q are
present, we get the cumulative reward of 25.
Suppose the agent gets to execute exactly three actions (and obviously
wants to get the maximum possible reward).
Qn 1. For the action Aq, give the 2-TBN representation, the PSO
(probabilistic STRIPS operator) representation and the
extensional state representation. (You can give the extensional
state representation in matrix form.)
Comment on which of the representations is more compact for
this particular action.
Qn 2. Suppose we are trying to find the optimal 3-horizon policy for
this agent by enumerating all the policies and finding the one with
the maximum value. What is the total number of possible policies we will
have to evaluate?
**Qn 3. Use the value iteration approach to compute the optimal
3-horizon policy for the problem. (Note that this involves just
computing a sequence of three value functions and the associated
policies). Compare the computational complexity of the value
iteration based optimal policy construction to the brute-force method
mentioned in Qn 2. If you wound up saving time, what was the
structure that was exploited in effecting these savings?
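One way the value-iteration computation of Qn 3 could be set up is sketched below. The dynamics and rewards follow the English descriptions above; the reward-accounting convention (reward collected in each state reached after an action, undiscounted) is an assumption, and other conventions shift the values somewhat:

```python
from itertools import product

# Sketch: finite-horizon value iteration for the 8-state MDP above.
# States are (P, Q, R) booleans.  Assumed convention: the reward is
# collected in each state reached after an action (undiscounted).
states = list(product([False, True], repeat=3))

def transitions(s, a):
    """[(prob, next_state), ...] for action a taken in state s."""
    P, Q, R = s
    if a == "Ar" and not R:
        return [(0.9, (P, Q, True)), (0.1, s)]
    if a == "Ap" and not P and not R:
        return [(0.8, (True, Q, R)), (0.2, s)]
    if a == "Aq" and not Q and P:
        return [(0.8, (P, True, R)), (0.2, s)]
    return [(1.0, s)]           # every other case leaves the state unchanged

def reward(s):
    P, Q, R = s
    return (5 if R else 0) + (20 if Q else 0)

V = {s: 0.0 for s in states}    # value with 0 actions left
policies = []                   # policies[k-1]: best action with k actions left
for _ in range(3):
    Qsa = {s: {a: sum(p * (reward(t) + V[t]) for p, t in transitions(s, a))
               for a in ("Ar", "Ap", "Aq")} for s in states}
    policies.append({s: max(Qsa[s], key=Qsa[s].get) for s in states})
    V = {s: max(Qsa[s].values()) for s in states}

s0 = (False, False, False)
print([pi[s0] for pi in policies])   # best action with 1, 2, 3 actions left
```

Note how the transition function exploits the structure of the action descriptions: since each action can change only one variable, every action has at most two outcomes, instead of a full 8x8 transition matrix.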
Qn 4. Suppose the agent starts out in the state where P, Q and R are
all false to begin with. What does the optimal policy from Qn 3 tell you
about what action the agent should do when it has 1 more action to go, 2
more actions to go, and 3 more actions to go, respectively? Interpret
the answer and give an intuitive justification for it.
Qn 5. Consider the action sequence--[Ap;Aq;Ar] executed blindly from
the initial state where P,Q and R are all false. What is the expected
value of this sequence of actions? (hint: if you are confused, look
at Figure 12 in the Boutilier paper). Is this value equal to the value of
the state ~P,~Q,~R that you computed in Qn 3? If not, can you explain
the discrepancy?
[[You don't have to read this:: One way of seeing the point behind question 5 is as follows:
If I find the deterministic approximation to the current problem,
solve it using classical techniques to find a sequential plan, and
execute that plan blindly, I will get the value I am getting in Qn 5. The way
to approximate the actions is to assume that the high-probability outcome is
the only outcome (for example, that Ar will _always_ give R, not just 90%
of the time). Approximate the reward function in terms of the conjunctive
goal R & Q. We can now solve this problem given the initial state ~P, ~Q
and ~R. It is clear that [Ap;Aq;Ar] will be a solution for this
problem.]]