CSE 591 Spring 2000
Final Exam
Answer all the questions.
The exam is Take-home, open-everything (no discussions with other
humans except me).
********The exam is due in my office on THURSDAY, 11TH MAY.
Short answer questions:
1. We discussed that the MDP framework subsumes deterministic classical
planning. It is also known that the problem of finding an optimal
policy for an MDP can be posed as a linear programming problem, and is
thus of polynomial complexity in the size of the MDP. Explain the
apparent paradox between this fact and the other fact that classical
planning is PSPACE-complete, and that even deciding whether a k-length
plan exists for a problem is NP-complete.
2. In the discussion on computing minimal networks for disjunctive
TCSPs, Dechter and Schwalb point out that enforcing path
consistency on disjunctive TCSPs may wind up producing networks
with an exponentially larger number of labels. Why is this possible,
given that the whole point of enforcing path consistency is to
_reduce_ (tighten) the sizes of the various constraints?
3. List five ideas/techniques that you learned in this course
that you thought were most interesting.
4. List five ideas/techniques that you thought were over-played
and were not worth the time we spent on them.
5. List five "non-obvious" (to you) details or cross-connections that _you_ were able to
appreciate over the semester.
6. Consider the following approach for solving classical planning
problems. Set them up as MDPs, with the reward function being M (M >> 1) in
states where all goals are satisfied and 0 everywhere else. The
cost of doing any action in any state is taken to be d (d <<
M). The goal states are modeled as sink states (i.e., any action
done from a goal state returns the agent to that state). Compute the
optimal infinite-horizon policy for the MDP. Use this as the basis
for deriving the plan for the problem.
6.1. Exactly how do you derive the plan for the problem given its
optimal policy?
6.2. Compared to something like the Graphplan approach, what are the
advantages/disadvantages of this way of solving a classical planning
problem?
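To make the encoding in question 6 concrete, here is a minimal sketch on a made-up four-state chain problem. The chain, the values of M and d, and the discount factor are all placeholders, and discounted value iteration stands in for whatever exact infinite-horizon solution method one prefers:

```python
# Sketch: a toy deterministic planning problem encoded as an MDP in the
# style above.  The 4-state chain, M, d, and the discount factor gamma
# are all made-up placeholders for illustration.
M, d, gamma = 100.0, 1.0, 0.9
states = [0, 1, 2, 3]           # state 3 is the (only) goal state
actions = ["left", "right"]

def step(s, a):
    if s == 3:                  # goal is a sink: every action self-loops
        return 3
    return max(s - 1, 0) if a == "left" else min(s + 1, 3)

def reward(s):
    return M if s == 3 else 0.0

# Discounted infinite-horizon value iteration (standing in for the LP).
V = {s: 0.0 for s in states}
for _ in range(300):
    V = {s: max(reward(s) - d + gamma * V[step(s, a)] for a in actions)
         for s in states}

policy = {s: max(actions, key=lambda a: reward(s) - d + gamma * V[step(s, a)])
          for s in states}
print(policy)   # non-goal states all choose "right", toward the goal
```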
7. Consider two intervals I and J, and their end points i-s, i-e; j-s,
j-e.
7.1. Represent the constraint i-s <= j-e in terms of Interval
Algebra constraint on I and J.
7.2. Represent the constraint J overlaps I in terms of the point
algebra constraints on i-s, i-e, j-s, j-e.
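As background for question 7: when the four endpoints have concrete values, the thirteen Interval Algebra relations can be recovered mechanically from point-algebra comparisons. The helper below is hypothetical (the function name and relation labels are made up for illustration):

```python
# Sketch: a hypothetical helper classifying the Allen (Interval Algebra)
# relation between I = [i_s, i_e] and J = [j_s, j_e] from point-algebra
# comparisons of the four endpoints.
def allen_relation(i_s, i_e, j_s, j_e):
    if i_e < j_s:                   return "before"
    if j_e < i_s:                   return "after"
    if i_e == j_s:                  return "meets"
    if j_e == i_s:                  return "met-by"
    if i_s == j_s and i_e == j_e:   return "equal"
    if i_s == j_s:                  return "starts" if i_e < j_e else "started-by"
    if i_e == j_e:                  return "finishes" if i_s > j_s else "finished-by"
    if j_s < i_s and i_e < j_e:     return "during"
    if i_s < j_s and j_e < i_e:     return "contains"
    # all remaining cases are proper overlaps
    return "overlaps" if i_s < j_s else "overlapped-by"

print(allen_relation(0, 4, 2, 6))   # i_s < j_s < i_e < j_e: "overlaps"
```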
8. Consider the idea of interleaving planning and execution in two
different scenarios:
1. Actions can have both causal and observational effects
2. Actions can only have observational effects
Comment on how these two scenarios affect the treatment of:
a. maintenance and use of LCW statements
b. handling of unexecuted action flaws
***************************
PROBLEM II: (Handling incomplete information)
Consider the following problem. We want to achieve the
goals g1 and g2. We have the following operators (with preconditions
shown on left and effects on right).
(p) O11 (g2)
(~p) O12 (g2)
(r) O21 (g1, ~w)
(w) O22 (g1, ~r)
() O3 ~g1 (observe p !v1)
() O4 (observe k !v2)
O3 is a sensing action which tells us whether p is true or false in
the world. O4 does the same for k.
We know partial information about the initial state. Specifically, we
know that the propositions r and w are true, and that g1 and g2 are not true. But
we don't know whether p and k are true in the initial state.
Part a: Show how the problem can be solved using the conditional
planning techniques discussed in the class. Make sure to point out
if and where the specialized linking and threat resolution techniques for
conditional planning were used in this example.
Part b: Consider solving this problem using the "interleaving
planning and execution" approach. Assume that we treat
unexecuted actions as a type of flaw, and that an unexecuted action
can be executed as soon as the _rules_ for its sane execution are
satisfied. Any unexecuted action flaws that are ready for resolution
are "resolved" before the open condition flaws are resolved. Let us
assume that the open condition flaws are handled in
alphabetical order.
Show how a UCPOP-like planner using this strategy will solve the
problem above. ***you can assume that in the world that the agent
is inhabiting, O3, _when executed_ will tell us that p is
false. Similarly, O4, when executed will tell us that k is true.
(Please note that in doing this trace you will
be showing how the partial plan changes after execution, etc. You
only need to show the single search path leading to the
solution.)
Part c: Comment on the whole idea of treating an unexecuted
action as a flaw. In what ways is this flaw different from the
normal open condition/unsafe link style flaws?
*************
PROBLEM III. (Problem on TCSP/Resources and Causal planning.)
Consider the following problem. We have three actions
Action A1 typically lasts for 5 time units, requires the use of 3
units of resource R and gives P at the end of the execution.
A1 can start between 3 to 5 units after the beginning of the plan.
Action A2 takes 7 time units, requires 4 units of resource R and gives
Q at the end of execution.
A2 can start either between 1 and 5 units after the beginning, or it
can start anytime after 15.
Action A3 takes 2 time units, requires 2 units of resource R and gives
R at the end of execution.
A3 can start at any time.
All the actions release their resources at the end of their execution.
We have a total of 6 units of the resource R.
We need to achieve P, Q and R.
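To see what the resource constraint means operationally, here is a sketch that checks a candidate schedule against the capacity of R. The start times below are made-up placeholders, not a claimed solution to the problem:

```python
# Sketch: checking a candidate schedule against the capacity of R.
# Durations and resource needs come from the problem statement; the
# start times are made-up placeholders.
actions = {            # name: (start, duration, units of R used)
    "A1": (5, 5, 3),
    "A2": (15, 7, 4),
    "A3": (0, 2, 2),
}
CAPACITY = 6

def peak_usage(actions):
    # Usage only increases at action start events, so it suffices to
    # sample the resource profile at every start time.
    starts = [s for (s, _, _) in actions.values()]
    return max(sum(u for (s, d, u) in actions.values() if s <= t < s + d)
               for t in starts)

print(peak_usage(actions) <= CAPACITY)
```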
Suppose we are doing causal link planning. We construct the plan in
the following steps
step 1. Introduce A1 to support P
step 2. Introduce A3 to support R
step 3. Introduce A2 to support Q
Suppose we are using TCSPs to represent the temporal constraints on
the plan, and the PIGs (as in the IxTET paper) to represent the
resource conflict status. Assume that the TCSPs use the point based
representation (sort of like the examples we studied in
Dechter/Schwalb paper).
qn 1. show the status of the TCSP and PIG at the end of each of the
steps. Does the TCSP at the end of step 3 correspond to a STP (simple
temporal problem)?
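As a reminder of the machinery behind qn 1: a point-based network with a single interval per constraint (an STP) can be checked for consistency by running Floyd-Warshall on its distance graph and looking for negative cycles. The three time points and bounds below are made up for illustration, not taken from the plan above:

```python
# Sketch: STP consistency check via Floyd-Warshall on the distance
# graph.  Points and bounds are made-up illustrations.
INF = float("inf")
n = 3                                    # time points x0, x1, x2
D = [[0 if i == j else INF for j in range(n)] for i in range(n)]

def add_edge(i, j, lo, hi):              # constraint: lo <= x_j - x_i <= hi
    D[i][j] = min(D[i][j], hi)
    D[j][i] = min(D[j][i], -lo)

add_edge(0, 1, 3, 5)
add_edge(1, 2, 2, 7)
add_edge(0, 2, 0, 10)

for k in range(n):
    for i in range(n):
        for j in range(n):
            D[i][j] = min(D[i][j], D[i][k] + D[k][j])

# A negative entry on the diagonal means a negative cycle: inconsistent.
consistent = all(D[i][i] == 0 for i in range(n))
print(consistent)
```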
qn 2. Using the PIG at the end of step 3, find out what (if any) are
the resource conflicts, and what are the _minimal_ resource conflicts.
qn 3. For each of the minimal conflicts, compute the disjunctive
resolver for the conflict (which uses temporal constraints). Split
the disjunction into the search space--i.e., in each branch pick one
disjunct per conflict, and add it on to the TCSP network you have at
the end of step 3.
qn 4. Pick _one_ of the TCSP networks resulting from qn 3. Do loose
path consistency on it. Show the resulting TCSP. (You can use my
lisp program to compute the LPC, as long as you show on paper how the
LPC is used to simplify at least _one_ of the disjunctive
constraints.)
qn 5. Based on the answer to 4, give a feasible schedule of execution
of the plan for getting P,Q,R (you need to give exact start times of
all three actions in the plan).
Qn 6. In Qn 3, instead of splitting the disjunction, we could have
tried to add the disjunctive resolver directly onto the set of
temporal constraints in the current TCSP. Can we do this given the
current TCSP representation? If so, how? If not, can you think of some
way of doing it?
++++++++++++++++++++++++++++++++++++++
PROBLEM IV (MDP stuff).
Consider the following problem, where states are defined by three
boolean state variables: P, Q and R (thus there are 2^3 = 8 states).
We have the following actions (described in English):
Ar
The only effect of Ar is that R will become true with probability .9
if it was not already true.
Ap
The only effect of Ap is that P will become true with probability .8
if R is false in the previous state. If P was true, it will remain true.
Aq
The only effect of Aq is that Q will become true with probability .8
if P was true in the previous state. If Q was true previously, it will
remain true.
(A state variable's value remains the same as its value in the previous
state, unless the action description above explicitly mentions how it
changes. Thus, for example, P and Q will continue to maintain their
values when Ar is done.)
REWARD STRUCTURE:
Every state which has R true has an immediate reward of 5. Every state
which has Q true has an immediate reward of 20. If both R and Q are
present, we get the cumulative reward of 25.
Suppose the agent gets to execute exactly three actions (and obviously
wants to get the maximum possible reward).
Qn 1. For the action Aq, give the 2-TBN representation, the PSO
(probabilistic STRIPS operator) representation and the
extensional state representation. (You can give the extensional
state representation in matrix form.)
Comment on which of the representations is more compact for
this particular action.
Qn 2. Suppose we are trying to find the optimal 3-horizon policy for
this agent by enumerating all the policies and finding the one with
the maximum value. What is the total number of possible policies we will
have to evaluate?
**Qn 3. Use the value iteration approach to compute the optimal
3-horizon policy for the problem. (Note that this involves just
computing a sequence of three value functions and the associated
policies). Compare the computational complexity of the value
iteration based optimal policy construction to the brute-force method
mentioned in Qn 2. If you wound up saving time, what was the
structure that was exploited in effecting these savings?
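One way the value-iteration computation of Qn 3 could be set up is sketched below. The dynamics and rewards follow the English descriptions above; the reward-accounting convention (reward collected in each state reached after an action, undiscounted) is an assumption, and other conventions shift the values somewhat:

```python
from itertools import product

# Sketch: finite-horizon value iteration for the 8-state MDP above.
# States are (P, Q, R) booleans.  Assumed convention: the reward is
# collected in each state reached after an action (undiscounted).
states = list(product([False, True], repeat=3))

def transitions(s, a):
    """[(prob, next_state), ...] for action a taken in state s."""
    P, Q, R = s
    if a == "Ar" and not R:
        return [(0.9, (P, Q, True)), (0.1, s)]
    if a == "Ap" and not P and not R:
        return [(0.8, (True, Q, R)), (0.2, s)]
    if a == "Aq" and not Q and P:
        return [(0.8, (P, True, R)), (0.2, s)]
    return [(1.0, s)]           # every other case leaves the state unchanged

def reward(s):
    P, Q, R = s
    return (5 if R else 0) + (20 if Q else 0)

V = {s: 0.0 for s in states}    # value with 0 actions left
policies = []                   # policies[k-1]: best action with k actions left
for _ in range(3):
    Qsa = {s: {a: sum(p * (reward(t) + V[t]) for p, t in transitions(s, a))
               for a in ("Ar", "Ap", "Aq")} for s in states}
    policies.append({s: max(Qsa[s], key=Qsa[s].get) for s in states})
    V = {s: max(Qsa[s].values()) for s in states}

s0 = (False, False, False)
print([pi[s0] for pi in policies])   # best action with 1, 2, 3 actions left
```

Note how the transition function exploits the structure of the action descriptions: since each action can change only one variable, every action has at most two outcomes, instead of a full 8x8 transition matrix.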
Qn 4. Suppose the agent starts out in the state where P, Q and R are
all false to begin with. What does the optimal policy from Qn 3 tell you
about what action the agent should do when it has 1 more action to go, 2
more actions to go, and 3 more actions to go, respectively? Interpret
the answer and give an intuitive justification for it.
Qn 5. Consider the action sequence--[Ap;Aq;Ar] executed blindly from
the initial state where P,Q and R are all false. What is the expected
value of this sequence of actions? (hint: if you are confused, look
at Figure 12 in the Boutilier paper). Is this value equal to the value of
the state ~P,~Q,~R that you computed in Qn 3? If not, can you explain
the discrepancy?
[[You don't have to read this:: One way of seeing the point behind question 5 is as follows:
If I find the deterministic approximation to the current problem,
solve it using classical techniques to find a sequential plan, and
execute that plan blindly, I will get the value I am getting in Qn 5. The way
to approximate the actions is to assume that the high-probability outcome is
the only outcome (for example, that Ar will _always_ give R, not just 90%
of the time). Approximate the reward function in terms of the conjunctive
goal R & Q. We can now solve this problem given the initial state ~P, ~Q
and ~R. It is clear that [Ap;Aq;Ar] will be a solution for this
problem.]]