Probabilistic Planning
(Updated with the class notes of 11/27 & 11/29)
Terry Zimmerman
CSE 591 Planning & Learning, Nov. 1995
Revised by Rao [Dec 5, 1995]

Classical planning assumes deterministic actions. In the "real world," planning must generally take place in a nondeterministic environment where key information needed to make decisions may be unavailable, incomplete, or inaccurate. In addition, actions may have multiple possible outcomes. If you know or can estimate the probabilities of the various outcomes, you have a basis to proceed with planning. (On the other hand, if you don't know anything about the possible outcomes of an action, there's not much point in doing planning...)

By way of example, if you must remember to pick up a turkey on the way home today in order to make a Thanksgiving dinner, you might make a mental note to yourself. However, considering your deteriorating memory lately and the importance of keeping family and guests content, you may judge that a mental note alone is inadequate insurance that you'll succeed at your goal. So you may write yourself a note and post it on your office door as a backup to your memory. If you were already held responsible for a turkey-less Thanksgiving last year, you may judge that even this action may not adequately ensure success, so you may tie a string around your finger as a reminder.

Another action in your Thanksgiving plan is finding a turkey at this late date at the grocer's. Consideration of the possibility that the first market you visit could be sold out may lead you to plan to leave a half hour early, to improve your chances of success.

This type of planning is markedly different from the process and environment associated with classical planning, where complete information about the world state is available and all effects of actions are known. The term "probabilistic planning" as used in this paper refers to any of a number of approaches to planning intelligently in a stochastic world with incomplete knowledge. The primary focus will be on the methodology presented in "An Algorithm for Probabilistic Planning" (Kushmerick et al., Ref. 1) and "Probabilistic Planning with Information Gathering and Contingent Execution" (Draper et al., Ref. 2). However, the relationship of this method to others explored in the literature (including Markov decision processes) will be discussed at the end of the paper.

Ref. 1 describes the assumptions, the world state and action representations, and an algorithm that models probabilistic planning in a fairly intuitive way. The algorithm is implemented as the BURIDAN planner, and it is shown to be both sound and complete.

The Approach to Modeling Uncertainty

The two key issues in probabilistic planning are representation and reasoning/plan synthesis.

REPRESENTATION

In the absence of deterministic world knowledge, a reasonable approach is to represent the possible world states as a probability distribution at any given point in time. The outcomes of actions, too, are not entirely predictable and hence can be described in terms of conditional probability distributions. This has the effect of complicating the definition of a successful plan. The approach chosen here is to terminate planning when a plan is generated that achieves the goal with a likelihood exceeding a user-supplied probability threshold.

Representing States:

Even the initial state is uncertain and, as such, it is represented as a distribution over states.
The sum of the probabilities of the discrete states in the distribution must be 1.0 (a special case is a distribution that assigns probability 1 to a single state). Each state succeeding the initial state is therefore also represented as a probability distribution.

Example in the blocks world (with blocks A, B, and C), a three-state initial distribution:

  s1:  B on A; A and C on the table      Prob. 0.3
  s2:  A, B, and C all on the table      Prob. 0.6
  s3:  A on B; B and C on the table      Prob. 0.1

Representing Actions:

Actions, then, can be represented as mappings from one distribution to another. Consider a Pickup(C) action that achieves holding(C) with probability 0.9 and leaves C on the table (no change) with probability 0.1. Applied to the distribution above, it produces a six-state distribution:

  Before Pickup(C)                 holding(C) (x 0.9)   ontable(C) (x 0.1)
  s1 (0.3): B on A; A,C on table         0.27                 0.03
  s2 (0.6): A,B,C on table               0.54                 0.06
  s3 (0.1): A on B; B,C on table         0.09                 0.01

Given that actions transform one distribution of states into another, asking whether a given state achieves the goal doesn't make sense. Goal satisfaction becomes a probabilistic issue: does the probability that the goal is achieved exceed a specified level? For example, given the final state distribution, determine which of the states have the goal true in them, sum the probabilities of all those states, and determine whether that sum is greater than the target goal probability.
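To make this termination test concrete, here is a minimal Python sketch (my own illustration, not code from Ref. 1; the proposition names are assumptions, and only the propositions relevant to the example are encoded). It represents the six-state distribution produced by Pickup(C) above and checks the goal probability against a threshold:

    # A state is a frozenset of true propositions; a distribution maps
    # states to probabilities (which must sum to 1.0).
    after_pickup = {
        frozenset({"on(B,A)", "holding(C)"}): 0.27,
        frozenset({"holding(C)"}):            0.54,
        frozenset({"on(A,B)", "holding(C)"}): 0.09,
        frozenset({"on(B,A)", "ontable(C)"}): 0.03,
        frozenset({"ontable(C)"}):            0.06,
        frozenset({"on(A,B)", "ontable(C)"}): 0.01,
    }

    def goal_probability(dist, goal):
        """Sum the probabilities of all states in which every goal
        proposition holds."""
        return sum(p for state, p in dist.items() if goal <= state)

    threshold = 0.8    # user-supplied success threshold
    prob = goal_probability(after_pickup, frozenset({"holding(C)"}))
    print(round(prob, 3), prob > threshold)   # 0.9 True -> can terminate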
There are two major ways of representing actions: as state mappings or in STRIPS style. State mappings are the usual encoding of Markov decision processes: given n state variables with m possible values each, there are m**n possible states, and the mapping is essentially a big DFA. A STRIPS-style representation gives only the factors that change and computes the complete state implicitly. This is more compact than state mappings whenever each action changes only a small portion of the state. The methodology presented here deals primarily with STRIPS-style representations.

BURIDAN

The standard STRIPS representation allows an action to be applied only if its preconditions are satisfied when the action is executed. BURIDAN uses a modified STRIPS model in which actions have multiple possible outcomes and can be executed in any world state. An action's effects, then, depend on the world state at the time of execution and, to some degree, on random chance. And unlike the case in classical planning, since the world at any time is characterized as a probability distribution over possible states (a state distribution), BURIDAN's actions cause a transition from one state distribution to another.

The actions may be visualized as binary trees whose leaves are the action's effects, given the world state at the time the action is executed and the impact of random chance. For example, a Bring-Home-Turkey action might be represented as shown in Fig. 1, if we consider only your weak memory (and, of course, chance) as relevant to the outcomes.

............................................................................
FIGURE 1

                      Bring-Home-Turkey
                      /               \
          Note Posted                  No Note Posted
             /   \                        /   \
    c1 (p=0.8)    ~c1 (p=0.2)    c2 (p=0.4)    ~c2 (p=0.6)
         |             |              |             |
    Have-Turkey   (no change)    Have-Turkey   (no change)
............................................................................

This can be written symbolically as:

  BT  Outcome 1:  Trigger: note posted     Effects: have-turkey  Prob: 0.8
      Outcome 2:  Trigger: note posted     Effects: nil          Prob: 0.2
      Outcome 3:  Trigger: no note posted  Effects: have-turkey  Prob: 0.4
      Outcome 4:  Trigger: no note posted  Effects: nil          Prob: 0.6

Here we reflect the fact that if a note has been posted in your office, there is an 80% probability that you'll remember to stop at the market for a turkey (and thus a 20% chance that you won't). If no note has been posted, there is only a 40% chance that you'll remember to stop. It must be emphasized that no probabilities are assigned directly to action outcome propositions without considering the state of the world at execution time.

Note also that the actions have no equivalent of the executability preconditions of SNLP/UCPOP. An action will always _execute_, irrespective of the state in which it is applied; its outcomes depend on whether certain trigger conditions hold.
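The trigger/effect structure maps naturally onto code. The following Python sketch (again my own illustration; the encoding with positive/negative trigger sets and add-only effects is a simplification, since real effects can also delete propositions) applies an action with conditional probabilistic outcomes to a state distribution:

    from collections import defaultdict

    # Each outcome: (positive trigger, negative trigger, effects, prob).
    # The triggers partition the states; within each partition the
    # outcome probabilities sum to 1.
    bring_home_turkey = [
        (frozenset({"note"}), frozenset(),         frozenset({"have-turkey"}), 0.8),
        (frozenset({"note"}), frozenset(),         frozenset(),                0.2),
        (frozenset(),         frozenset({"note"}), frozenset({"have-turkey"}), 0.4),
        (frozenset(),         frozenset({"note"}), frozenset(),                0.6),
    ]

    def apply_action(dist, outcomes):
        """Map one state distribution to another: each state flows through
        every outcome whose trigger it satisfies, weighted by that
        outcome's probability."""
        new_dist = defaultdict(float)
        for state, p in dist.items():
            for pos, neg, effects, q in outcomes:
                if pos <= state and not (neg & state):
                    new_dist[state | effects] += p * q
        return dict(new_dist)

    # Suppose the note was posted with probability 0.7:
    initial = {frozenset({"note"}): 0.7, frozenset(): 0.3}
    final = apply_action(initial, bring_home_turkey)
    # P(have-turkey) = 0.7*0.8 + 0.3*0.4 = 0.68
    print(round(sum(p for s, p in final.items() if "have-turkey" in s), 3))

Note that the action is applicable in every state; which effects it produces depends only on which triggers hold and on chance.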
The BURIDAN Planning Process

In general, a probabilistic planning algorithm must generate a sequence of actions such that starting from the initial state distribution and executing each action in turn results in a final state distribution in which the goal expression holds with probability greater than some user-supplied value. BURIDAN searches through plan space until it finds a plan that achieves the goal with sufficient probability (as defined by the user), although, with the action semantics and goal-satisfaction semantics discussed above, ANY planning approach (FSS, BSS, etc.) can be extended to solve these problems.

Plan representation and planning in the BURIDAN system differ from those of classical partial-order planners in five major respects. In reading these differences, note that UNLIKE SNLP and UCPOP, and like the general Refine-Plan algorithm, BURIDAN terminates using a global assessment computation that is independent of the refinement process. Thus refinement need not continue until all conditions are supported and all threats are resolved. (Furthermore, unlike even Refine-Plan, BURIDAN's assessment algorithm doesn't strictly care whether the causal links it has established are "holding" or not. In other words, the refinement process doesn't exactly split the candidate sets of the plan. I admit that it is not clear to me what the real semantics of causal links in probabilistic planning should be.)

1) A causal link A1(i) --w--> A2 represents a commitment that proposition w will be true when action A2 is executed, due to outcome i of action A1. Here i is one of possibly several plausible outcomes of A1. The planner must attempt to increase the probability that outcome i of A1 is realized, and it may do so by establishing additional causal support to satisfy the conditions that trigger i.

Note that we can use different outcomes of the same action to increase the support it gives to a condition. Suppose action A has two outcomes:

  O1:  Trigger: nil   Effects: P, Q   Prob: 0.5
  O2:  Trigger: nil   Effects: P, R   Prob: 0.5

We can use both outcomes of the action simultaneously to support P with probability 1 (as against only 0.5 given by either individual outcome; a small numeric check of this appears after this list).

2) Unlike in classical planners, more than one causal link may be required to support a goal proposition. BURIDAN will seek to provide support in any permissible manner in order to achieve the target probability for the goal action. Causal support in a probabilistic plan is a cumulative notion; in general, the more links supporting a proposition, the more likely it is to be true. (BURIDAN makes no distinction between open conditions and established or closed conditions.)

3) Relevant proposition-action pairs are used to implement the desired causal support. For example, if providing w to A2 is desired, BURIDAN adds the pair to an agenda in the manner of the goal agendas of classical planners. If action A1 is added to the plan to provide support for w at A2, a causal link is recorded and all the propositions making up i's trigger in A1 become relevant pairs. However, unlike in a classical planner such as SNLP, it is not necessary that every item on the agenda be made true before the plan is considered a solution. BURIDAN's agenda is termed a "relevant set" because its entries are propositions that might be relevant to plan success and are therefore worth considering; a plan that achieves the goal with probability greater than the threshold may be generated before all relevant pairs are visited.

4) Unlike SNLP and UCPOP, BURIDAN cannot terminate based on causal links alone. It _has_ to assess the plan to check whether the goal-satisfaction probability has been achieved. This is like Refine-Plan using the existence of a safe ground linearization, rather than causal links, to do the termination. Finding a safe ground linearization of a given plan is NP-hard even for a deterministic planner such as UCPOP; it is thus at least NP-hard for a probabilistic planner to determine whether there is a ground linearization giving the desired goal probability. (Of course, the analogy is slightly thin. Since a causal link may be threatened with probability less than one, the notion of a safe ground linearization is not yet well defined for probabilistic plans. In fact, the forward assessment criterion used by BURIDAN doesn't consider causal links at all, and essentially simulates _all_ ground linearizations. Thus, I conjecture that BURIDAN with forward assessment will not be systematic.)

5) Threat resolution for a probabilistic planner must extend the classical planning techniques. The method of ordering a threatening action (demotion or promotion) is similar, and BURIDAN can apply confrontation in the manner of UCPOP. However, one new confrontation technique is to plan to decrease the probability that the harmful outcomes of a threatening action occur, by ensuring that the trigger conditions of those outcomes are not satisfied. Note that we may simultaneously be planning to make one set of trigger conditions of an action _hold_ in order to support a causal link, while planning to make another set of its trigger conditions _not hold_ in order to avoid a threat. E.g., suppose an action A has two outcomes:

  O1:  Trigger: C1   Effects: P
  O2:  Trigger: C2   Effects: !R

We may make C1@A a relevant condition to support P, and also make !C2@A a relevant condition to avoid deleting R (assuming there is some causal link supporting R that A threatens). If the threatening outcome is sufficiently unlikely, a plan may be generated that exceeds the goal probability threshold despite the unresolved threat.
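Here is the small numeric check of the cumulative-support point in difference 1 promised above (a toy Python sketch of my own; the triggers are omitted since both are nil, so each outcome applies in every state):

    # Action A: two unconditional outcomes, each with probability 0.5.
    #   O1: Effects P, Q     O2: Effects P, R
    outcomes = [(frozenset({"P", "Q"}), 0.5),
                (frozenset({"P", "R"}), 0.5)]

    def support(prop):
        """Probability that prop holds after executing A, accumulated
        over all outcomes that produce it."""
        return sum(q for effects, q in outcomes if prop in effects)

    print(support("P"), support("Q"), support("R"))   # 1.0 0.5 0.5

P is produced by both outcomes, so links from O1 and O2 jointly support it with probability 1, while Q and R each get only 0.5.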
ASSESSING PLANS

The ability to efficiently compute the probability that a plan achieves its goal is integral to a probabilistic planner. Assessing a partially ordered plan with conditional effects is NP-hard even without considering probabilities, and the authors suggest that adding the complexity of probabilistic calculations can make plan assessment require time exponential in the length of the plan. A study of assessment approaches that exploit structure in the actions, goals, and state space was therefore conducted. The result is four algorithms, each of which computes a lower bound on the exact probability of success; the best assessment algorithm for a given planning problem depends on the domain.

Forward Assessment:

In this method, execution of the plan is simulated and the goal probability is computed from the resulting distribution (a code sketch of this simulation appears at the end of this section). Unfortunately, the number of states so generated may grow exponentially. For example, suppose we have the goal holding(C) and have generated the one-action plan Pickup(C), executed from the three-state blocks-world distribution given earlier. The goal-satisfaction probability can be computed from the six-state distribution resulting after the action: since C is being held in the three "success" states, the probability of goal satisfaction is 0.27 + 0.54 + 0.09 = 0.9. If this exceeds the probability of satisfaction we want, we can terminate.

Query Assessment:

This method is goal-directed: it articulates the state space only when doing so is necessary to decide the status of a query proposition. The idea is to divide an action's outcomes into equivalence classes based on how they affect the query proposition, and to reason about the classes instead of the individual outcomes. This significantly reduces the number of states that must be handled.

Network Assessment:

The two methods above both represent world states explicitly; they manipulate structures that represent elements of the state space. The network assessment algorithm dispenses with such explicit representations and represents a state's component propositions directly, using a Bayesian network to assess the plan: given the values of its parents in the network, a random variable is independent of the other variables.

Reverse Assessment:

This algorithm is based on the insight that a plan's causal-link structure captures exactly the information needed for assessment. Essentially, the goal-satisfaction probability is computed in terms of the probability that all links in the plan are supported. For example:

   L1 ------\
             \      (P, Q, R => G)
   L2 ------->  A1 -------L0-------> G
             /
   L3 ------/

Here G holds if P, Q, and R hold, that is, if links L1, L2, and L3 all hold. This condition is transformed into a conjunction of chance variables in the initial state and in the subsequent actions.

Reverse assessment finds a _lower bound_ on the probability of satisfaction as computed by forward assessment, since the latter may take into account implicit supports that are not recorded as causal links. The same thing happens in deterministic planning. Suppose we have to achieve goals g1 and g2, and we consider g1 first and, to support it, add an action A that gives both g1 and g2. As far as a causal-link-based termination test is concerned, the resulting plan is not yet done. However, a termination test that simulates all safe ground linearizations will terminate right away, since A implicitly satisfies both goals (even though a causal link has been established for only one of them).
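As promised above, here is a minimal, self-contained Python sketch of forward assessment (my own illustration, reusing the simplified outcome encoding from the earlier sketch), applied to the Pickup(C) example with its three-state initial distribution:

    from collections import defaultdict

    def apply_action(dist, outcomes):
        """One simulation step: push each state through every outcome
        whose trigger it satisfies. Outcomes are (pos, neg, effects, prob)."""
        new_dist = defaultdict(float)
        for state, p in dist.items():
            for pos, neg, effects, q in outcomes:
                if pos <= state and not (neg & state):
                    new_dist[state | effects] += p * q
        return dict(new_dist)

    def forward_assess(initial, plan, goal):
        """Simulate a (totally ordered) plan and return the goal probability.
        The number of states may grow exponentially with plan length."""
        dist = initial
        for action in plan:
            dist = apply_action(dist, action)
        return sum(p for state, p in dist.items() if goal <= state)

    # Pickup(C): unconditionally succeeds with probability 0.9.
    pickup_c = [
        (frozenset(), frozenset(), frozenset({"holding(C)"}), 0.9),
        (frozenset(), frozenset(), frozenset(),               0.1),
    ]
    initial = {frozenset({"on(B,A)"}): 0.3,   # s1
               frozenset():            0.6,   # s2
               frozenset({"on(A,B)"}): 0.1}   # s3
    prob = forward_assess(initial, [pickup_c], frozenset({"holding(C)"}))
    print(round(prob, 3), prob >= 0.8)   # 0.9 True -> plan is good enough

BURIDAN's actual forward assessment must account for all linearizations of a partially ordered plan; the sketch assumes a fixed total order.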
As it turns out, in the context of the entire planning process, the fastest planning isn't necessarily done with the fastest assessment algorithm. Some algorithms compute bounds closer to the exact probability than others, and hence may let planning terminate sooner than algorithms that produce quicker but cruder probability estimates. This should not be surprising, considering that a cheaper termination test does not necessarily improve the complexity of Refine-Plan, since it may simultaneously increase the depth of the search tree.

Extending Probabilistic Planning to Include Information Gathering and Usage

A natural extension to a planner that builds plans in an uncertain world is to incorporate actions that gather information about that world, in order to increase the likelihood of achieving a goal. C-BURIDAN implements such an extension to BURIDAN by allowing actions with both informational and causal effects. The major extensions are:

* C-BURIDAN distinguishes between actions that "observe" a property of the world and actions that set the value of that property in the world. The distinction is essential for effective contingency planning.

* The actions to be executed at any point in a C-BURIDAN plan may differ depending on what was observed by prior observations.

* Perfect, noisy, or biased sensors can be represented in C-BURIDAN's probabilistic model, and sensor accuracy may depend on the prevailing world state.

* C-BURIDAN can make use of correlated information. That is, it may plan to gather information about a property that is related to another property it is currently attempting to assess.

The C-BURIDAN Planning Process

Each C-BURIDAN plan consists of a set of actions, "contexts" for each action, a partial temporal ordering of the actions, a set of causal links, and a set of subgoals. The only structure here not present in BURIDAN is the action contexts. Each action in the plan is assigned a context indicating the circumstances under which it should be executed (a method essentially adopted from CNLP; Peot and Smith, Ref. 3). A context is made up of observations obtained from previous steps in the plan. Each action has its possible resultant observations partitioned into equivalence classes, so that when an action is executed it reports exactly one of its observations to the agent. When the plan is actually executed, a step can be applied only when its context is compatible with the actual observations produced by previously executed steps (a minimal sketch of this execution rule appears at the end of this section).

During refinement, C-BURIDAN can apply all of the threat-resolution techniques available to BURIDAN. In addition, it has a technique unique to contingent planning that the authors term "branching." Branching introduces branches into a plan as the result of observation actions; the branches ensure that a threatening step cannot be executed in the same plan trace as the producer or consumer of the threatened link. This form of threat resolution causes the plan to diverge onto alternate paths depending on the results of one or more observations at execution time. C-BURIDAN, unlike previous contingent planners, allows execution paths that diverge and later rejoin the base plan.
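To illustrate the execution-time role of contexts, here is a small Python sketch (my own; the step names, observation labels, and data layout are hypothetical, not C-BURIDAN's actual structures). A step runs only when its context is compatible with the observations reported so far:

    # A context maps an earlier observation step to the label it must have
    # reported for this step to be applicable.
    plan = [
        ("check-store",          {},                           lambda: "in-stock"),
        ("buy-turkey",           {"check-store": "in-stock"},  None),
        ("drive-to-next-market", {"check-store": "sold-out"},  None),
    ]

    def execute(plan):
        """Run steps in order, skipping any step whose context conflicts
        with the observations made so far: the essence of contingent
        execution."""
        observed = {}
        for name, context, sense in plan:
            if all(observed.get(step) == label for step, label in context.items()):
                print("executing", name)
                if sense is not None:
                    observed[name] = sense()   # record the reported label
            else:
                print("skipping ", name)

    execute(plan)
    # -> executing check-store; executing buy-turkey;
    #    skipping drive-to-next-market

Branching corresponds to the two mutually exclusive contexts above: buy-turkey and drive-to-next-market can never occur in the same execution trace.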
Summary of C-BURIDAN

C-BURIDAN is an implemented probabilistic contingent planner that combines probabilistic reasoning about actions and information with least-commitment planning methods. The effects of causal and information-gathering actions can be freely mixed, and the planner distinguishes between them. C-BURIDAN generates contingent plans in which different actions are executed depending on the results of prior observations.

Alternative Approaches to Probabilistic Planning

A significantly different approach to planning under uncertainty is described by Dean (Ref. 4). Markov decision processes are proposed as the basic representation; they provide the semantic foundations for probabilistic planning in much the same way that propositional logic and its associated semantics provide the foundations for more expressive logics. The author claims that the BURIDAN action representation can be used to model arbitrary Markov chains: whereas in its current implementation BURIDAN represents each action as a decision tree, actions could just as easily be represented using the type of factored state-transition model described by the author.

An earlier approach, which searches for policies that maximize the probability of achieving a goal, was developed by Drummond and Bresina (Ref. 5). Their method incrementally improves a partial policy: it first generates an initial policy corresponding to a sequence of actions that reaches the goal with some (possibly small) probability, and then improves the policy by identifying states not covered by the partial policy but likely to be encountered during execution. This was one of the first works to address the tradeoff between the quality of the constructed policy and the cost of delay.

Work on graphical probabilistic models (Ref. 6) also deals with decision-making and planning problems, but focuses more on solving a given probabilistic or decision model; the BURIDAN method, in contrast, interleaves the construction and evaluation of solutions.

More on this, and on the relation between MDPs and the BURIDAN approach, appears in the next class notes.

REFERENCES

1. Nicholas Kushmerick, Steve Hanks, and Daniel Weld. "An Algorithm for Probabilistic Planning." Univ. of Washington Tech. Report 93-06-03, June 14, 1993.

2. Denise Draper, Steve Hanks, and Daniel Weld. "Probabilistic Planning with Information Gathering and Contingent Execution." Proc. 2nd International Conference on Artificial Intelligence Planning Systems, pages 31-36, June 13-15, 1994.

3. M. Peot and D. Smith. "Conditional Nonlinear Planning." Proc. 1st International Conference on AI Planning Systems, pages 189-197, June 1992.

4. Thomas Dean. "Decision-Theoretic Planning and Markov Decision Processes." Brown University, 1994.

5. M. Drummond and J. Bresina. "Anytime Synthetic Projection: Maximizing the Probability of Goal Satisfaction." Proceedings AAAI-90, pages 138-144, 1990.

6. Ronald A. Howard and James E. Matheson. "Influence Diagrams." In The Principles and Applications of Decision Analysis. Strategic Decisions Group, 1984.