Probabilistic Planning
(Updated with the class notes of 11/27 & 11/29)
Terry Zimmerman
CSE 591 Planning & Learning, Nov. 1995
Revised by Rao [Dec 5, 1995]

Classical planning assumes deterministic actions. In the "real world," planning must generally take place in a nondeterministic environment where key information needed to make decisions may be unavailable, incomplete, or inaccurate. In addition, actions may have multiple possible outcomes. If you know or can estimate the probabilities of the various outcomes, you have a basis to proceed with planning. (On the other hand, if you don't know anything about the possible outcomes of an action, there's not much point in doing planning...)

By way of example, if you must remember to pick up a turkey on the way home today in order to make a Thanksgiving dinner, you might make a mental note to yourself. However, considering your deteriorating memory lately and the importance of keeping family and guests content, you may judge that a mental note alone is inadequate insurance that you'll succeed at your goal. So you may write yourself a note and post it on your office door as a backup to your memory. If you were already held responsible for a turkey-less Thanksgiving last year, you may judge that even this action may not adequately ensure success, so you may tie a string around your finger as a reminder.

Another action in your Thanksgiving plan is finding a turkey at this late date at the grocer's. Consideration of the possibility that the first market you visit could be sold out may lead you to plan to leave a half hour early, to improve your chances of success.

This type of planning is markedly different from the process and environment associated with classical planning, where complete information about the world state is available and all effects of actions are known. The term "probabilistic planning" as used in this paper refers to any of a number of approaches to planning intelligently in a stochastic world with incomplete knowledge. The primary focus will be on the methodology presented in "An Algorithm for Probabilistic Planning" (Kushmerick et al., Ref. 1) and "Probabilistic Planning with Information Gathering and Contingent Execution" (Draper et al., Ref. 2). However, the relationship of this method to others explored in the literature (including Markov decision processes) will be discussed at the end of the paper.

Ref. 1 describes the assumptions, the world state and action representations, and an algorithm that models probabilistic planning in a fairly intuitive way. The algorithm is implemented as the BURIDAN planner, and it is shown to be both sound and complete.

The Approach to Modeling Uncertainty

The two key issues in probabilistic planning are representation and reasoning/plan synthesis.

REPRESENTATION

In the absence of deterministic world knowledge, a reasonable approach is to represent the possible world states as a probability distribution at any given point in time. The outcomes of actions, too, are not entirely predictable and hence can be described in terms of conditional probability distributions. This has the effect of complicating the definition of a successful plan. The approach chosen here is to terminate planning when a plan is generated that achieves the goal with a likelihood exceeding a user-supplied probability threshold.

Representing States:

Even the initial state is uncertain and, as such, it is represented as a distribution over states.
The sum of the probabilities of the discrete states in the distribution must be 1.0 (a special case is a distribution that assigns probability 1 to a single state). Each state succeeding the initial state is therefore also represented as a probability distribution.

Example in the blocks world (with blocks A, B, and C), a three-state initial distribution:

  s1:  B on A; A and C on the table      Prob. 0.3
  s2:  A, B, and C all on the table      Prob. 0.6
  s3:  A on B; B and C on the table      Prob. 0.1

Representing Actions:

Actions, then, can be represented as mappings from one distribution to another. Consider a Pickup(C) action that achieves holding(C) with probability 0.9 and leaves C on the table (no change) with probability 0.1. Applied to the distribution above, it produces a six-state distribution:

  Before Pickup(C)                 holding(C) (x 0.9)   ontable(C) (x 0.1)
  s1 (0.3): B on A; A,C on table         0.27                 0.03
  s2 (0.6): A,B,C on table               0.54                 0.06
  s3 (0.1): A on B; B,C on table         0.09                 0.01

Given that actions transform one distribution of states into another, asking whether a given state achieves the goal doesn't make sense. Goal satisfaction becomes a probabilistic issue: does the probability that the goal is achieved exceed a specified level? For example, given the final state distribution, determine which of the states have the goal true in them, sum the probabilities of all those states, and determine whether that sum is greater than the target goal probability.
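To make this termination test concrete, here is a minimal Python sketch (my own illustration, not code from Ref. 1; the proposition names are assumptions, and only the propositions relevant to the example are encoded). It represents the six-state distribution produced by Pickup(C) above and checks the goal probability against a threshold:

    # A state is a frozenset of true propositions; a distribution maps
    # states to probabilities (which must sum to 1.0).
    after_pickup = {
        frozenset({"on(B,A)", "holding(C)"}): 0.27,
        frozenset({"holding(C)"}):            0.54,
        frozenset({"on(A,B)", "holding(C)"}): 0.09,
        frozenset({"on(B,A)", "ontable(C)"}): 0.03,
        frozenset({"ontable(C)"}):            0.06,
        frozenset({"on(A,B)", "ontable(C)"}): 0.01,
    }

    def goal_probability(dist, goal):
        """Sum the probabilities of all states in which every goal
        proposition holds."""
        return sum(p for state, p in dist.items() if goal <= state)

    threshold = 0.8    # user-supplied success threshold
    prob = goal_probability(after_pickup, frozenset({"holding(C)"}))
    print(round(prob, 3), prob > threshold)   # 0.9 True -> can terminate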
There are two major ways of representing actions: as state mappings or in STRIPS style. State mappings are the usual encoding of Markov decision processes: given n state variables with m possible values each, there are m**n possible states, and the mapping is essentially a big DFA. A STRIPS-style representation gives only the factors that change and computes the complete state implicitly. This is more compact than state mappings whenever each action changes only a small portion of the state. The methodology presented here deals primarily with STRIPS-style representations.

BURIDAN

The standard STRIPS representation allows an action to be applied only if its preconditions are satisfied when the action is executed. BURIDAN uses a modified STRIPS model in which actions have multiple possible outcomes and can be executed in any world state. An action's effects, then, depend on the world state at the time of execution and, to some degree, on random chance. And unlike the case in classical planning, since the world at any time is characterized as a probability distribution over possible states (a state distribution), BURIDAN's actions cause a transition from one state distribution to another.

The actions may be visualized as binary trees whose leaves are the action's effects, given the world state at the time the action is executed and the impact of random chance. For example, a Bring-Home-Turkey action might be represented as shown in Fig. 1, if we consider only your weak memory (and, of course, chance) as relevant to the outcomes.

............................................................................
FIGURE 1

                      Bring-Home-Turkey
                      /               \
          Note Posted                  No Note Posted
             /   \                        /   \
    c1 (p=0.8)    ~c1 (p=0.2)    c2 (p=0.4)    ~c2 (p=0.6)
         |             |              |             |
    Have-Turkey   (no change)    Have-Turkey   (no change)
............................................................................

This can be written symbolically as:

  BT  Outcome 1:  Trigger: note posted     Effects: have-turkey  Prob: 0.8
      Outcome 2:  Trigger: note posted     Effects: nil          Prob: 0.2
      Outcome 3:  Trigger: no note posted  Effects: have-turkey  Prob: 0.4
      Outcome 4:  Trigger: no note posted  Effects: nil          Prob: 0.6

Here we reflect the fact that if a note has been posted in your office, there is an 80% probability that you'll remember to stop at the market for a turkey (and thus a 20% chance that you won't). If no note has been posted, there is only a 40% chance that you'll remember to stop. It must be emphasized that no probabilities are assigned directly to action outcome propositions without considering the state of the world at execution time.

Note also that the actions have no equivalent of the executability preconditions of SNLP/UCPOP. An action will always _execute_, irrespective of the state in which it is applied; its outcomes depend on whether certain trigger conditions hold.
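The trigger/effect structure maps naturally onto code. The following Python sketch (again my own illustration; the encoding with positive/negative trigger sets and add-only effects is a simplification, since real effects can also delete propositions) applies an action with conditional probabilistic outcomes to a state distribution:

    from collections import defaultdict

    # Each outcome: (positive trigger, negative trigger, effects, prob).
    # The triggers partition the states; within each partition the
    # outcome probabilities sum to 1.
    bring_home_turkey = [
        (frozenset({"note"}), frozenset(),         frozenset({"have-turkey"}), 0.8),
        (frozenset({"note"}), frozenset(),         frozenset(),                0.2),
        (frozenset(),         frozenset({"note"}), frozenset({"have-turkey"}), 0.4),
        (frozenset(),         frozenset({"note"}), frozenset(),                0.6),
    ]

    def apply_action(dist, outcomes):
        """Map one state distribution to another: each state flows through
        every outcome whose trigger it satisfies, weighted by that
        outcome's probability."""
        new_dist = defaultdict(float)
        for state, p in dist.items():
            for pos, neg, effects, q in outcomes:
                if pos <= state and not (neg & state):
                    new_dist[state | effects] += p * q
        return dict(new_dist)

    # Suppose the note was posted with probability 0.7:
    initial = {frozenset({"note"}): 0.7, frozenset(): 0.3}
    final = apply_action(initial, bring_home_turkey)
    # P(have-turkey) = 0.7*0.8 + 0.3*0.4 = 0.68
    print(round(sum(p for s, p in final.items() if "have-turkey" in s), 3))

Note that the action is applicable in every state; which effects it produces depends only on which triggers hold and on chance.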
The BURIDAN Planning Process

In general, a probabilistic planning algorithm must generate a sequence of actions such that starting from the initial state distribution and executing each action in turn results in a final state distribution in which the goal expression holds with probability greater than some user-supplied value. BURIDAN searches through plan space until it finds a plan that achieves the goal with sufficient probability (as defined by the user), although, with the action semantics and goal-satisfaction semantics discussed above, ANY planning approach (FSS, BSS, etc.) can be extended to solve these problems.

Plan representation and planning in the BURIDAN system differ from those of classical partial-order planners in five major respects. In reading these differences, note that UNLIKE SNLP and UCPOP, and like the general Refine-Plan algorithm, BURIDAN terminates using a global assessment computation that is independent of the refinement process. Thus refinement need not continue until all conditions are supported and all threats are resolved. (Furthermore, unlike even Refine-Plan, BURIDAN's assessment algorithm doesn't strictly care whether the causal links it has established are "holding" or not. In other words, the refinement process doesn't exactly split the candidate sets of the plan. I admit that it is not clear to me what the real semantics of causal links in probabilistic planning should be.)

1) A causal link A1(i) --w--> A2 represents a commitment that proposition w will be true when action A2 is executed, due to outcome i of action A1. Here i is one of possibly several plausible outcomes of A1. The planner must attempt to increase the probability that outcome i of A1 is realized, and it may do so by establishing additional causal support to satisfy the conditions that trigger i.

Note that we can use different outcomes of the same action to increase the support it gives to a condition. Suppose action A has two outcomes:

  O1:  Trigger: nil   Effects: P, Q   Prob: 0.5
  O2:  Trigger: nil   Effects: P, R   Prob: 0.5

We can use both outcomes of the action simultaneously to support P with probability 1 (as against only 0.5 given by either individual outcome; a small numeric check of this appears after this list).

2) Unlike in classical planners, more than one causal link may be required to support a goal proposition. BURIDAN will seek to provide support in any permissible manner in order to achieve the target probability for the goal action. Causal support in a probabilistic plan is a cumulative notion; in general, the more links supporting a proposition, the more likely it is to be true. (BURIDAN makes no distinction between open conditions and established or closed conditions.)

3) Relevant proposition-action pairs are used to implement the desired causal support. For example, if providing w to A2 is desired, BURIDAN adds the pair to an agenda in the manner of the goal agendas of classical planners. If action A1 is added to the plan to provide support for w at A2, a causal link is recorded and all the propositions making up i's trigger in A1 become relevant pairs. However, unlike in a classical planner such as SNLP, it is not necessary that every item on the agenda be made true before the plan is considered a solution. BURIDAN's agenda is termed a "relevant set" because its entries are propositions that might be relevant to plan success and are therefore worth considering; a plan that achieves the goal with probability greater than the threshold may be generated before all relevant pairs are visited.

4) Unlike SNLP and UCPOP, BURIDAN cannot terminate based on causal links alone. It _has_ to assess the plan to check whether the goal-satisfaction probability has been achieved. This is like Refine-Plan using the existence of a safe ground linearization, rather than causal links, to do the termination. Finding a safe ground linearization of a given plan is NP-hard even for a deterministic planner such as UCPOP; it is thus at least NP-hard for a probabilistic planner to determine whether there is a ground linearization giving the desired goal probability. (Of course, the analogy is slightly thin. Since a causal link may be threatened with probability less than one, the notion of a safe ground linearization is not yet well defined for probabilistic plans. In fact, the forward assessment criterion used by BURIDAN doesn't consider causal links at all, and essentially simulates _all_ ground linearizations. Thus, I conjecture that BURIDAN with forward assessment will not be systematic.)

5) Threat resolution for a probabilistic planner must extend the classical planning techniques. The method of ordering a threatening action (demotion or promotion) is similar, and BURIDAN can apply confrontation in the manner of UCPOP. However, one new confrontation technique is to plan to decrease the probability that the harmful outcomes of a threatening action occur, by ensuring that the trigger conditions of those outcomes are not satisfied. Note that we may simultaneously be planning to make one set of trigger conditions of an action _hold_ in order to support a causal link, while planning to make another set of its trigger conditions _not hold_ in order to avoid a threat. E.g., suppose an action A has two outcomes:

  O1:  Trigger: C1   Effects: P
  O2:  Trigger: C2   Effects: !R

We may make C1@A a relevant condition to support P, and also make !C2@A a relevant condition to avoid deleting R (assuming there is some causal link supporting R that A threatens). If the threatening outcome is sufficiently unlikely, a plan may be generated that exceeds the goal probability threshold despite the unresolved threat.
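Here is the small numeric check of the cumulative-support point in difference 1 promised above (a toy Python sketch of my own; the triggers are omitted since both are nil, so each outcome applies in every state):

    # Action A: two unconditional outcomes, each with probability 0.5.
    #   O1: Effects P, Q     O2: Effects P, R
    outcomes = [(frozenset({"P", "Q"}), 0.5),
                (frozenset({"P", "R"}), 0.5)]

    def support(prop):
        """Probability that prop holds after executing A, accumulated
        over all outcomes that produce it."""
        return sum(q for effects, q in outcomes if prop in effects)

    print(support("P"), support("Q"), support("R"))   # 1.0 0.5 0.5

P is produced by both outcomes, so links from O1 and O2 jointly support it with probability 1, while Q and R each get only 0.5.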
ASSESSING PLANS

The ability to efficiently compute the probability that a plan achieves its goal is integral to a probabilistic planner. Assessing a partially ordered plan with conditional effects is NP-hard even without considering probabilities, and the authors suggest that adding the complexity of probabilistic calculations can make plan assessment require time exponential in the length of the plan. A study of assessment approaches that exploit structure in the actions, goals, and state space was therefore conducted. The result is four algorithms, each of which computes a lower bound on the exact probability of success; the best assessment algorithm for a given planning problem depends on the domain.

Forward Assessment:

In this method, execution of the plan is simulated and the goal probability is computed from the resulting distribution (a code sketch of this simulation appears at the end of this section). Unfortunately, the number of states so generated may grow exponentially. For example, suppose we have the goal holding(C) and have generated the one-action plan Pickup(C), executed from the three-state blocks-world distribution given earlier. The goal-satisfaction probability can be computed from the six-state distribution resulting after the action: since C is being held in the three "success" states, the probability of goal satisfaction is 0.27 + 0.54 + 0.09 = 0.9. If this exceeds the probability of satisfaction we want, we can terminate.

Query Assessment:

This method is goal-directed: it articulates the state space only when doing so is necessary to decide the status of a query proposition. The idea is to divide an action's outcomes into equivalence classes based on how they affect the query proposition, and to reason about the classes instead of the individual outcomes. This significantly reduces the number of states that must be handled.

Network Assessment:

The two methods above both represent world states explicitly; they manipulate structures that represent elements of the state space. The network assessment algorithm dispenses with such explicit representations and represents a state's component propositions directly, using a Bayesian network to assess the plan: given the values of its parents in the network, a random variable is independent of the other variables.

Reverse Assessment:

This algorithm is based on the insight that a plan's causal-link structure captures exactly the information needed for assessment. Essentially, the goal-satisfaction probability is computed in terms of the probability that all links in the plan are supported. For example:

   L1 ------\
             \      (P, Q, R => G)
   L2 ------->  A1 -------L0-------> G
             /
   L3 ------/

Here G holds if P, Q, and R hold, that is, if links L1, L2, and L3 all hold. This condition is transformed into a conjunction of chance variables in the initial state and in the subsequent actions.

Reverse assessment finds a _lower bound_ on the probability of satisfaction as computed by forward assessment, since the latter may take into account implicit supports that are not recorded as causal links. The same thing happens in deterministic planning. Suppose we have to achieve goals g1 and g2, and we consider g1 first and, to support it, add an action A that gives both g1 and g2. As far as a causal-link-based termination test is concerned, the resulting plan is not yet done. However, a termination test that simulates all safe ground linearizations will terminate right away, since A implicitly satisfies both goals (even though a causal link has been established for only one of them).
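As promised above, here is a minimal, self-contained Python sketch of forward assessment (my own illustration, reusing the simplified outcome encoding from the earlier sketch), applied to the Pickup(C) example with its three-state initial distribution:

    from collections import defaultdict

    def apply_action(dist, outcomes):
        """One simulation step: push each state through every outcome
        whose trigger it satisfies. Outcomes are (pos, neg, effects, prob)."""
        new_dist = defaultdict(float)
        for state, p in dist.items():
            for pos, neg, effects, q in outcomes:
                if pos <= state and not (neg & state):
                    new_dist[state | effects] += p * q
        return dict(new_dist)

    def forward_assess(initial, plan, goal):
        """Simulate a (totally ordered) plan and return the goal probability.
        The number of states may grow exponentially with plan length."""
        dist = initial
        for action in plan:
            dist = apply_action(dist, action)
        return sum(p for state, p in dist.items() if goal <= state)

    # Pickup(C): unconditionally succeeds with probability 0.9.
    pickup_c = [
        (frozenset(), frozenset(), frozenset({"holding(C)"}), 0.9),
        (frozenset(), frozenset(), frozenset(),               0.1),
    ]
    initial = {frozenset({"on(B,A)"}): 0.3,   # s1
               frozenset():            0.6,   # s2
               frozenset({"on(A,B)"}): 0.1}   # s3
    prob = forward_assess(initial, [pickup_c], frozenset({"holding(C)"}))
    print(round(prob, 3), prob >= 0.8)   # 0.9 True -> plan is good enough

BURIDAN's actual forward assessment must account for all linearizations of a partially ordered plan; the sketch assumes a fixed total order.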
As it turns out, in the context of the entire planning process, the fastest planning isn't necessarily done with the fastest assessment algorithm. Some algorithms compute bounds closer to the exact probability than others, and hence may let planning terminate sooner than algorithms that produce quicker but cruder probability estimates. This should not be surprising, considering that a cheaper termination test does not necessarily improve the complexity of Refine-Plan, since it may simultaneously increase the depth of the search tree.

Extending Probabilistic Planning to Include Information Gathering and Usage

A natural extension to a planner that builds plans in an uncertain world is to incorporate actions that gather information about that world, in order to increase the likelihood of achieving a goal. C-BURIDAN implements such an extension to BURIDAN by allowing actions with both informational and causal effects. The major extensions are:

* C-BURIDAN distinguishes between actions that "observe" a property of the world and actions that set the value of that property in the world. The distinction is essential for effective contingency planning.

* The actions to be executed at any point in a C-BURIDAN plan may differ depending on what was observed by prior observations.

* Perfect, noisy, or biased sensors can be represented in C-BURIDAN's probabilistic model, and sensor accuracy may depend on the prevailing world state.

* C-BURIDAN can make use of correlated information. That is, it may plan to gather information about a property that is related to another property it is currently attempting to assess.

The C-BURIDAN Planning Process

Each C-BURIDAN plan consists of a set of actions, "contexts" for each action, a partial temporal ordering of the actions, a set of causal links, and a set of subgoals. The only structure here not present in BURIDAN is the action contexts. Each action in the plan is assigned a context indicating the circumstances under which it should be executed (a method essentially adopted from CNLP; Peot and Smith, Ref. 3). A context is made up of observations obtained from previous steps in the plan. Each action has its possible resultant observations partitioned into equivalence classes, so that when an action is executed it reports exactly one of its observations to the agent. When the plan is actually executed, a step can be applied only when its context is compatible with the actual observations produced by previously executed steps (a minimal sketch of this execution rule appears at the end of this section).

During refinement, C-BURIDAN can apply all of the threat-resolution techniques available to BURIDAN. In addition, it has a technique unique to contingent planning that the authors term "branching." Branching introduces branches into a plan as the result of observation actions; the branches ensure that a threatening step cannot be executed in the same plan trace as the producer or consumer of the threatened link. This form of threat resolution causes the plan to diverge onto alternate paths depending on the results of one or more observations at execution time. C-BURIDAN, unlike previous contingent planners, allows execution paths that diverge and later rejoin the base plan.
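To illustrate the execution-time role of contexts, here is a small Python sketch (my own; the step names, observation labels, and data layout are hypothetical, not C-BURIDAN's actual structures). A step runs only when its context is compatible with the observations reported so far:

    # A context maps an earlier observation step to the label it must have
    # reported for this step to be applicable.
    plan = [
        ("check-store",          {},                           lambda: "in-stock"),
        ("buy-turkey",           {"check-store": "in-stock"},  None),
        ("drive-to-next-market", {"check-store": "sold-out"},  None),
    ]

    def execute(plan):
        """Run steps in order, skipping any step whose context conflicts
        with the observations made so far: the essence of contingent
        execution."""
        observed = {}
        for name, context, sense in plan:
            if all(observed.get(step) == label for step, label in context.items()):
                print("executing", name)
                if sense is not None:
                    observed[name] = sense()   # record the reported label
            else:
                print("skipping ", name)

    execute(plan)
    # -> executing check-store; executing buy-turkey;
    #    skipping drive-to-next-market

Branching corresponds to the two mutually exclusive contexts above: buy-turkey and drive-to-next-market can never occur in the same execution trace.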
Summary of C-BURIDAN

C-BURIDAN is an implemented probabilistic contingent planner that combines probabilistic reasoning about actions and information with least-commitment planning methods. The effects of causal and information-gathering actions can be freely mixed, and the planner distinguishes between them. C-BURIDAN generates contingent plans in which different actions are executed depending on the results of prior observations.

Alternative Approaches to Probabilistic Planning

A significantly different approach to planning under uncertainty is described by Dean (Ref. 4). Markov decision processes are proposed as the basic representation; they provide the semantic foundations for probabilistic planning in much the same way that propositional logic and its associated semantics provide the foundations for more expressive logics. The author claims that the BURIDAN action representation can be used to model arbitrary Markov chains: whereas in its current implementation BURIDAN represents each action as a decision tree, actions could just as easily be represented using the type of factored state-transition model described by the author.

An earlier approach, which searches for policies that maximize the probability of achieving a goal, was developed by Drummond and Bresina (Ref. 5). Their method incrementally improves a partial policy: it first generates an initial policy corresponding to a sequence of actions that reaches the goal with some (possibly small) probability, and then improves the policy by identifying states not covered by the partial policy but likely to be encountered during execution. This was one of the first works to address the tradeoff between the quality of the constructed policy and the cost of delay.

Work on graphical probabilistic models (Ref. 6) also deals with decision-making and planning problems, but focuses more on solving a given probabilistic or decision model; the BURIDAN method, in contrast, interleaves the construction and evaluation of solutions.

More on this, and on the relation between MDPs and the BURIDAN approach, appears in the next class notes.

REFERENCES

1. Nicholas Kushmerick, Steve Hanks, and Daniel Weld. "An Algorithm for Probabilistic Planning." Univ. of Washington Tech. Report 93-06-03, June 14, 1993.

2. Denise Draper, Steve Hanks, and Daniel Weld. "Probabilistic Planning with Information Gathering and Contingent Execution." Proc. 2nd International Conference on Artificial Intelligence Planning Systems, pages 31-36, June 13-15, 1994.

3. M. Peot and D. Smith. "Conditional Nonlinear Planning." Proc. 1st International Conference on AI Planning Systems, pages 189-197, June 1992.

4. Thomas Dean. "Decision-Theoretic Planning and Markov Decision Processes." Brown University, 1994.

5. M. Drummond and J. Bresina. "Anytime Synthetic Projection: Maximizing the Probability of Goal Satisfaction." Proceedings AAAI-90, pages 138-144, 1990.

6. Ronald A. Howard and James E. Matheson. "Influence Diagrams." In The Principles and Applications of Decision Analysis. Strategic Decisions Group, 1984.