Notes on Handling Incomplete Information
November 15th and 20th classes
(Submitted by Holly Coast; minor revisions by Rao [Dec 5, 1995])

Up until now, we've been looking at planning under the assumption that the planner has complete and correct information about the initial world state. The papers on sensing and incomplete information look at ways of planning when the planner's information about the world state may be incomplete; that is, some crucial piece(s) of information may not be available at planning time. In particular, the values of certain state variables may not be provided by the initial state. Of course, this discussion assumes that these values are recoverable at some point -- that we can get the missing values as the result of some actions; if not, then planning isn't possible. It's also important to note that not ALL state variables need to be specified or recovered; we only need the values of those state variables that are relevant to our goals or to the current plan.

There are essentially three ways to handle planning under incomplete information.

One way is to do all necessary sensing up front: determine which state variables are not given values in the initial state (if there are N state variables and the initial state provides values for only M of them, you need to find values for the remaining N-M), do all the sensing to obtain these values, put the values into the initial state, and THEN start planning. The problem with this approach is that you may well not know which of the missing state variables will be relevant to the plan, so you can spend a lot of time doing unnecessary sensing. If sensing isn't cheap (and we typically assume it isn't), this is a poor approach.

As mentioned in Dean and Kambhampati (pp. 22-23), another approach is to consider all the possible values of the missing information: if the value of a state variable is unknown, but we know it must be one of a set of K values, why not treat the planning problem as a collection of K problems, one for each possible value of the state variable, and use these to form a K-way conditional plan? This method is practical only if there are very few unspecified variables and the domain of each is small. If the domains are large and/or there are many unspecified variables, this method results in a combinatorial explosion in the number of possible initial states. Furthermore, the plans for the different contingencies may overlap significantly, and synthesizing the overlapping portions multiple times is wasted computational effort.

A better way of doing contingency planning is to be _demand-driven_: continue backward chaining until you need to establish a goal that can only be given by a contingent action. At that point, introduce the contingent action into the plan and build a branching plan that branches on all K possible outcomes of that action. An improvement on this technique is, rather than forming K separate branches, to form a two-way branch based on whether the state variable does or does not take on a specified value (for example, one branch for "the table has color red" and another for "the table does NOT have color red").

[[It could be that the plans that involve the table being blue, green, yellow, black, and white are all equivalent, and the plan that actually diverges is the one that needs the table to be red. It is more efficient to combine those five plans into a single plan for the table NOT being red. This is a way of being "demand-driven" in how branching is introduced into the search tree. The "chance of success" at any branch point is higher with two-way branching, and this type of branching represents a lower level of commitment at this level of the search. (Of course, which branch to look at first can involve consideration of the costs of actions and of which outcome you think is more likely to be true in the world -- more on that in probabilistic planning!) And even if it turns out that as much subsequent branching is needed in a series of two-way branches as in the naive approach, at least the heavier branching has been pushed lower down into the search process.]]
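To make the contrast concrete, here is a minimal Python sketch of the two branching schemes. It is not from any of the papers; all of the names (Plan, Branch, plan_for, etc.) are invented for illustration.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Plan:
        steps: List[str]

    @dataclass
    class Branch:
        test: str                  # condition checked at execution time
        subplans: Dict[str, Plan]  # outcome label -> subplan

    PlanFor = Callable[[str, str], Plan]

    def k_way_branch(var: str, domain: List[str], plan_for: PlanFor) -> Branch:
        """Naive approach: synthesize one subplan per possible value."""
        return Branch(test=var,
                      subplans={v: plan_for(var, v) for v in domain})

    def two_way_branch(var: str, value: str, plan_for: PlanFor) -> Branch:
        """Demand-driven approach: branch only on value vs. not-value. The
        "F" subplan is shared by all the other values, and any further
        branching is deferred until some goal actually demands it."""
        return Branch(test=f"({var} = {value})",
                      subplans={"T": plan_for(var, value),
                                "F": plan_for(var, f"not {value}")})

    # With six possible table colors, k_way_branch synthesizes six subplans
    # up front; two_way_branch synthesizes one plan for "red" plus a single
    # shared plan covering the other five colors.
    colors = ["red", "blue", "green", "yellow", "black", "white"]
    stub: PlanFor = lambda var, val: Plan(steps=[f"handle {var}={val}"])
    six_way = k_way_branch("table-color", colors, stub)
    two_way = two_way_branch("table-color", "red", stub)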
The third way of handling incomplete information is to interleave sensing and execution (i.e., do some sensing up front and some during execution). The idea is that, as planning progresses, you will be able to figure out which of the missing state variables are relevant; at that point you find the values of those variables, and having found them, you can continue planning.

Both [Etzioni et al.] and [Olawsky and Gini] focus on using sensing actions to obtain the missing information. [Etzioni et al.] bring these actions directly into the plan and construct two-way conditional plans based on the results of the sensing actions, whereas [Olawsky and Gini] interleave planning with execution, suspending planning until a needed piece of information is obtained through sensing and then resuming. More complete descriptions of the papers follow, along with additional comments and background material provided by Dr. Rao during the classes.

===============================================================================

"An Approach to Planning with Incomplete Information" by Etzioni et al.

When considering how to deal with incomplete information during planning, four major questions arise under two main categories:

I. How to model sensing:
   1. How to represent information-needs goals;
   2. How to represent actions whose primary function is to obtain information;
   3. How to choose between an action that SENSES whether some property is true and an action that CAUSES the property to become true; and
II. How to carry out sensing:
   4. Whether classical planning algorithms can be extended to allow for information goals and sensing actions.

Sensing itself doesn't change the state of the world; it only changes the internal state of the agent's knowledge about the world. Thus, we need to extend planning to allow for actions, effects, and goals that are stated in terms of information needs as well as causal relationships. Action effects such as "(color (table red))" are ambiguous in planning that includes sensing actions: is this effect saying that the action MAKES the table red, or that it determines that the color of the table IS ALREADY red? There must be some differentiation of effects to provide for these different meanings. In addition, goals need to come in different forms: we need to be able to represent goals of the form "Make block A red" as well as goals of the form "Make block A have the same color as block B." Painting both A and B black would violate the intent of the second goal, which is to determine the color of block B and then make block A that same color.
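As a purely illustrative picture of this differentiation, here is one way the two kinds of effects might be rendered as data. This is not actual UWL syntax (the paper's mechanism, annotations, is described next); the Effect and Operator structures and the two example operators are assumptions made for the sketch.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Effect:
        annotation: str     # "cause" or "observe"
        proposition: Tuple  # e.g. ("color", "table", "red")
        value: str          # "T", "F", or a run-time variable such as "!c"

    @dataclass
    class Operator:
        name: str
        preconds: List[Tuple] = field(default_factory=list)
        effects: List[Effect] = field(default_factory=list)

    # An action that MAKES the table red (changes the world state):
    paint = Operator("paint-table-red",
                     preconds=[("have", "brush")],
                     effects=[Effect("cause", ("color", "table", "red"), "T")])

    # An action that finds out what color the table ALREADY is; the run-time
    # variable !c is bound by the sensor during execution, not at plan time:
    check = Operator("check-table-color",
                     effects=[Effect("observe", ("color", "table", "!c"), "T")])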
Etzioni et al. have created a language, UWL, and a partial-order planning algorithm, SENSp, that handle these issues. They have chosen conditional planning as the technique for carrying out planning that involves sensing.

UWL (University of Washington Language) is an extension of STRIPS. The extensions include:
1. Annotations on preconditions and postconditions to differentiate between information-gathering and causation-based goals and actions;
2. Run-time variables of the form !x, whose values will be provided during execution by sensing actions;
3. Conditional plan steps that use the information obtained at run time; and
4. The concept of a truth value of U (unknown) for a proposition, in addition to T (true) and F (false).

*** ANNOTATIONS ***

The POSTCONDITION annotations that Etzioni et al. use are "observe" and "cause." (In general, "observe" effects are sometimes also represented as "knows(P)" in planning.)

Cause postconditions are of the form (cause (P . T)) or (cause (P . F)), where P stands for some proposition. These correspond to the types of postconditions we've been dealing with all semester -- an operator's effects that change the world state. They correspond to STRIPS adds (the first form above) and deletes (the second form). Causal effects contain no run-time variables. In the Etzioni paper, every cause effect carries an implied corresponding observation: if you cause something to be true, it is assumed that you also KNOW you did so (i.e., "your right hand knows what your left hand is doing"). Dr. Rao mentioned that this might not always hold in planners that deal with incomplete information.

Observe postconditions change the planner's state of information. Through observational effects, the planner can determine the truth value of a proposition -- (observe (P . !v)) -- or it can identify an object that has a particular property or determine a certain property of an object -- (observe ((P !x) . T)). Observe effects use run-time variables, which are essentially free variables. Since ordinary free variables don't have any semantics in logic, we give the run-time variables "existential semantics" -- that is, we make them Skolem variables, which will take on some value as a function of the objects in the domain. Skolem variables are treated differently from ordinary variables: Skolem variables can't unify with a constant, nor can they unify with each other. They are allowed to unify with ordinary variables.

Including existential-semantics variables corresponds to disjunctive effects, which we haven't seen up until now (UCPOP allows disjunctive preconditions but not disjunctive effects), because they correspond to nondeterminism. However, the nondeterminism is with respect to the agent's knowledge state, NOT the world state. The world is still deterministic.

Information obtained at run time through observational effects can be used to construct a two-way conditional plan. For example, from (observe (P . !v)), the planner can build a plan conditional on the value of the run-time variable !v with the construct (pcond P P1 P2), where P is the same proposition as in the observe effect and P1 and P2 are plans: if !v becomes T (i.e., proposition P is true) at execution time, plan P1 will be executed; otherwise, plan P2 will be executed.
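A minimal sketch of how an executor might interpret such a conditional plan. The pcond construct is from the paper, but this executor, and the step representation it assumes, are invented for illustration.

    def apply_action(step, world):
        """Apply a causal step of the assumed form (name, adds, deletes)."""
        _name, adds, deletes = step
        return (world - set(deletes)) | set(adds)

    def execute(plan, world, sense):
        """Run a conditional plan. world is a set of true propositions;
        sense(P) -> bool is available only at execution time."""
        for step in plan:
            if step[0] == "pcond":
                _tag, prop, p1, p2 = step
                # The truth of prop is unknown until the earlier observe
                # action has run; branch on it now, at execution time.
                world = execute(p1 if sense(prop) else p2, world, sense)
            else:
                world = apply_action(step, world)
        return world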
The second form of observe effect, (observe ((P !x) . T)), can be used to build a conditional plan with the construct (vcond (!x = K) P1 P2): plan P1 will be executed if, during execution, the run-time variable !x is bound to the constant K; otherwise, plan P2 will be executed. By the way, a single action can have both cause and observe effects.

How cause and observe effects can be used depends on rules relating to the types of the goals (i.e., information needs or causal goals). Etzioni et al. use annotations on preconditions to govern these rules. The PRECONDITION annotations are "satisfy," "find-out," and "hands-off."

Satisfy preconditions can be achieved through either causal or observational effects. For example, the goal "(satisfy (color (table red)) . T)" can be achieved either by painting the table red ("(cause (color (table red)) . T)") or by finding out that the table is already red ("(observe (color (table !c)) . T)", where !c becomes bound to red). The choice might depend on the cost of sensing versus the cost of causing the condition to be true; for a planning agent, painting the table red may well be easier than trying to find out what the table's color is. Both plans (one with a sensing action to achieve the "satisfy" precondition, one with a causing action) are correct in the classical sense. These decisions can be backtracked over: imagine that the causing action is chosen -- you decide to paint the table red -- but one of its preconditions is that you have a brush, you don't have one, and no action can get you one. Instead of giving up and declaring the plan dead, just backtrack over the decision to paint the table and instead find out through a sensing action whether the table is already red.

Find-out preconditions refer to information needs: we want to find out whether some proposition is true without changing its state in the process of the "finding out." Thus, a find-out goal can be achieved either through an observe effect, or through a causal effect if the action that gives the causal effect serves some other useful purpose in the plan. This "other useful purpose" can be achieving some satisfy goal, or achieving some other find-out goal via an OBSERVE effect. For example, suppose one of your goals is (informally) "Everything in this room has been painted red," and another goal is "Find out if the table in this room is red." If nothing constrains the second goal to be achieved before the first, then the action that supports the first goal can also support the second (I just painted everything in the room red to support the first goal, so from that I know the table is red). However, if the first goal weren't there (i.e., if we only had "Find out if the table is red"), then we could not directly achieve that goal by painting the table red and then saying, "Yes, the table is red."

From this discussion we can glean the rule that we can do SIMPLE establishment between a causal effect and a find-out goal, but we are not allowed to do STEP ADDITION establishment from a causal effect to a find-out goal. (Of course, we are allowed to do step addition establishment from an observe effect to a find-out goal!)
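The rule in the preceding paragraph can be stated as a small procedure. This is my paraphrase of the rule, not SENSp's code; the data shapes (steps and operators as dicts whose "effects" entries are (annotation, proposition) pairs) are assumptions of the sketch, and a real planner would unify rather than test equality.

    def find_out_establishers(goal_prop, plan_steps, operators):
        """Yield the legal ways to support a find-out goal on goal_prop."""
        # SIMPLE establishment: an effect of a step ALREADY in the plan can
        # support the goal, whether it is a cause or an observe effect.
        for step in plan_steps:
            for annotation, prop in step["effects"]:
                if prop == goal_prop:
                    yield ("link-to-existing-step", step["name"], annotation)
        # STEP ADDITION: a NEW step may be introduced only for an observe
        # effect; adding an action just to make a find-out goal true is
        # not allowed.
        for op in operators:
            for annotation, prop in op["effects"]:
                if annotation == "observe" and prop == goal_prop:
                    yield ("add-new-step", op["name"], annotation)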
This rule relates to the idea of "filter conditions," which has been around in planning for a long time. Filter conditions are conditions that must hold before you can take an action, but that you are not allowed to make true. The example Dr. Rao gave: suppose you have a goal of going from Phoenix to San Francisco. One action in such a plan might be to take an airplane, but that makes sense only if there is an airport. Having an airport isn't a real precondition -- you're not allowed to build one from scratch just so you can take a plane. Thus, having an airport is a "filter condition": use this operator (take a plane) WHEN there is an airport. And if there IS an airport, there are additional, ordinary preconditions (I must be at the airport, I must have a ticket, etc.) that can be supported by other actions (take a taxi, buy a ticket, etc.). Filter conditions relate to achieving find-out goals through causal effects: I'm not allowed to add a new action to support one, but if it's already there (perhaps as a side effect of some earlier action), I can use it.

When there are filter conditions, goal ordering matters. Using the two goals above (about the red room), suppose you pick the find-out goal first and your sensor is broken -- then the plan fails. However, if you first painted the room red to achieve the other goal, then you can achieve the find-out goal. (This relates to the idea of Point Truth Constraints, and the fact that these don't go onto the agenda. Just because a PTC isn't satisfied now doesn't mean it won't be satisfied later, so instead of pruning it, just blacklist it.)

Hands-off preconditions (e.g., (hands-off P)) prevent the planner from doing anything to change P's state. (The SENSp algorithm implements these by adding a causal link from the plan's initial state to its goal state.) Hands-off preconditions constitute maintenance goals.

Because of these extensions to STRIPS, the authors have devised their own definitions relating to the correctness of plans, which take into account the semantics of the annotations. See definitions 2 through 6 in the full paper for the specific meanings of how two propositional contents can "match," of supported and valid preconditions, and of correct plans.

THE SENSp ALGORITHM and Conditional Planning in general

SENSp is an IMPLEMENTED partial-order planning algorithm for UWL, based on SNLP with several extensions. The authors claim it is provably sound, and complete as long as operators with observe postconditions have no preconditions (this eliminates subgoaling to achieve observational steps). Some SENSp details on how it handles CAUSAL LINKS:

* Only cause postconditions are considered as possible threats to causal links; observe postconditions pose no threat.
* As mentioned above, hands-off goals are implemented as a causal link from the plan's initial state to its goal state.
* Find-out goals are implemented as two branches, one for achieving the goal via an observe effect and the other for using a cause effect. SENSp considers these two branches separately, backtracking between them to ensure completeness. In the cause-effect branch, special rules are followed: "... the algorithm first tries to add causal links to support all preconditions without find-out annotations (adding new steps to the plan if necessary). Then it tries to add links supporting all remaining goals without adding new steps to the plan."

Run-time variables are treated as constants with as-yet-unknown values; the matching restrictions described above for Skolem variables are imposed on them. In addition, if a run-time variable matches with an ordinary variable, an ordering constraint is added so that the value of the run-time variable is not used before it is observed.
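Here is a small sketch of those matching restrictions, writing !x for run-time variables and ?x for ordinary ones. The predicate is invented; the paper does not present SENSp's implementation at this level of detail.

    def can_unify(t1, t2):
        """Matching restrictions on run-time (Skolem-like) variables."""
        def is_runtime(t): return isinstance(t, str) and t.startswith("!")
        def is_ordinary(t): return isinstance(t, str) and t.startswith("?")
        if is_runtime(t1) and is_runtime(t2):
            return False           # run-time variables never unify together
        if is_runtime(t1) or is_runtime(t2):
            other = t2 if is_runtime(t1) else t1
            # A run-time variable may unify with an ordinary variable but
            # never with a constant (its value is unknown until execution).
            # A successful match would also add the ordering constraint that
            # the value be observed before it is used.
            return is_ordinary(other)
        if is_ordinary(t1) or is_ordinary(t2):
            return True            # ordinary variables unify with anything
        return t1 == t2            # two constants must be identical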
SENSp inserts a conditional into a plan only when it needs to constrain the value of a run-time variable, and it must do so in a way that obeys the requirement that the run-time variable be observed before it is used. The algorithm is a conditional planner: it places sensing actions into the plan and then plans for the different possible outcomes (i.e., for the possible values of the run-time variables) so that, when an information-gathering action is executed, the plan already "knows" what the next action should be based on the result. The plan is essentially a tree structure, with branches below the sensing actions; a complete execution corresponds to one branch of the tree.

In general, as mentioned earlier, instead of taking the naive approach to conditional planning (which forms a K-way conditional branch), an algorithm can form two-way branches based on the truth or falsity of some condition. This is a demand-driven technique that takes place as part of the backward reasoning from the goal step. To establish a conditional link in which some variable's value is constrained to be a specific constant, we must also consider that the variable's value might NOT be that constant, and plan for that possibility. A copy is made of the goal step, and each copy is assigned the corresponding CONTEXT: if in one branch the conditional value is (color (table red)), then the goal step in that branch has that context, and the goal step in the other branch has the context (not (color (table red))). The steps in a branch below a sensing action also carry the corresponding context (contexts are propagated forward in the plan). A step in the tree-structured plan will be executed as long as the world state contains the context of that step -- that is, the step's context is a subset of the world state. (Of course, any step whose context is null will always be executed; this corresponds to steps in the plan that occur before any sensing actions.)

Two contexts are said to be compatible if one is a subset of the other. If they are not compatible, there is no legal execution of the plan in which both actions are executed. This idea lets the planner recognize that a step in one branch of the plan can't violate a causal link in another branch. This is called "conditioning" of steps: we can ignore an interaction between a causal link and a step if their contexts are incompatible. In the example in Dr. Rao's overheads, there is a "remove(tire)" action to achieve the "clear(hub)" precondition of "puton(spare)". "Remove(tire)" threatens the causal link by which the start step gives "on(x)" to the goal step (which has the context "(status (x, intact))"); this threat can be resolved by putting a conditional link from the "check(tire)" action to the "remove(tire)" action with the context "(not (status (x, intact)))". Now the contexts are incompatible, and the link is no longer threatened.

Also in the flat-tire example, it's important to realize that the planner could have first considered putting the spare on (as long as there's no precondition that says you can do this only if the flat tire is NOT intact). If the planner had taken this approach, the resulting plan would not have been a conditional plan. And suppose there were no "check(tire)" action -- then the problem could still be solved by changing the tire, or by buying four new tires. These are still correct plans; they just may not be optimal.
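The subset test on contexts is easy to state concretely. This fragment is illustrative only (contexts as sets of propositions, with the two contexts from the tire example); it is not SENSp's code.

    def compatible(ctx1, ctx2):
        """Two contexts are compatible iff one is a subset of the other.
        The null (empty) context is compatible with every context."""
        return ctx1 <= ctx2 or ctx2 <= ctx1

    def may_threaten(step_ctx, link_ctx):
        # A step can threaten a causal link only if some legal execution
        # runs both; incompatible contexts let us ignore the interaction.
        return compatible(step_ctx, link_ctx)

    # From the flat-tire example: once remove(tire) is conditioned on the
    # tire NOT being intact, its context is incompatible with the link's.
    link_ctx = frozenset({("status", "x", "intact")})
    step_ctx = frozenset({("not", ("status", "x", "intact"))})
    assert not may_threaten(step_ctx, link_ctx)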
PRACTICAL APPLICATION

The UNIX domain often gives rise to information goals and sensory actions as well as actions that change the world state. In fact, the authors' development of UWL was originally motivated by a project to build a softbot for UNIX. The softbot uses UWL to represent its actions (UNIX commands) and its goals, both information-gathering and causal (there are several examples in the paper). The authors point out that UWL as it stands does not address certain issues that arise in representing UNIX commands, such as the need for universally quantified preconditions and postconditions, or the fact that other agents are continually changing the world's state. The authors are currently working to extend UWL to address these issues (or at least were at the time of this paper!).

===============================================================================

"Deferred Planning and Sensor Use" by Duane Olawsky and Maria Gini

Here is an example of the second technique for handling informational actions in planning: instead of forming conditional plans, sensing actions are executed as soon as they are brought into the plan. Once the needed information is obtained, planning continues. This is generally considered a reasonable thing to do, since information-gathering actions aren't expected to drastically change anything else. Of course, in resource-limited planning, some amount of time and/or cost is involved in executing the action, and this may violate other time constraints on the planning process. Still, information-gathering actions are certainly more reasonable to execute before the plan is complete than other types of actions are.

This paper approaches the incomplete-information problem from the perspective of robotics. The ability to incorporate sensory data into the planning process is necessary for an autonomous robot to operate under general goals for extended periods, recognizing what information it needs and finding it out through sensor use. The authors use conventional planning techniques as the basis for their study, and they examine two possible ways of using these techniques to solve problems with incompletely specified initial states: extending the techniques in some way, or defining new operators at a level of abstraction that allows a conventional planner to handle such problems. They give an example of the second method and show that it leads to nondeterministic effects, which are a problem for conventional planners. They therefore opt to extend conventional planning techniques to account for incomplete information. The three basic ways of doing this are:

1) Plan for all contingencies (i.e., for all possible sensor reading values);
2) Find a single plan based on an assumed value of the sensor reading; or
3) Defer planning decisions that depend on a sensor reading until the reading is available.

The first approach is similar to the "K-way" conditional plan mentioned in the introduction to this summary; it can easily result in exponential growth in the amount of planning required. Even so, this approach may be appropriate (if computationally feasible) in cases where:

a) The cost of the plan can be justified by its long-term usefulness (there is a high expectation of reuse);
b) Execution-time planning (deferred planning or replanning) is impossible or undesirable due to time constraints during execution; and/or
c) Plan errors are highly critical, so that the cost of the extra planning is outweighed by the cost of a mistake.

In the second approach, if the wrong assumption was made about the sensor reading's value, then the plan is invalid and replanning must take place. Thus, this approach is appropriate only when all of the following hold:

a) It is acceptable for the robot to stop during execution to allow for replanning;
b) Plan errors are NOT highly critical (actions are reversible, or the cost of failure is small); and
c) Some specific sensor reading value is more likely to occur than any other.

In the third approach, the planner completes only those portions of the plan for which it has enough information. This may lead to important dependencies or constraints being missed, which in turn may lead to an action being taken that must later be undone. This approach is appropriate when (a) and (b) for the second approach are true.
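Read as a decision rule, the conditions above amount to something like the following sketch. This is my distillation, not the authors'; the boolean inputs are simplifications of conditions (a)-(c).

    def choose_strategy(reusable, replanning_ok, errors_critical,
                        likely_value_known):
        """Pick among the three approaches, per the simplified criteria."""
        if reusable or not replanning_ok or errors_critical:
            return "plan for all contingencies"      # pay planning cost once
        if likely_value_known:
            return "assume the likely sensor value"  # replan if the guess fails
        return "defer decisions until the reading is available"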
The authors have developed an agenda-controlled planner called BUMP (Basic University of Minnesota Planner) that uses STRIPS-style operators and builds a plan consisting of goal nodes and process nodes. BUMP works under the direction of an Execution Controller (EC) that invokes BUMP when planning is called for and then simulates execution of the steps of the (possibly partial) plan that BUMP returns. If a piece of information is missing, the BUMP plan must contain a request for a sensor reading via a sensor-process node; the EC interprets this node as an instruction to solicit (via the robot) a sensor reading at that point in the execution. As soon as the sensor reading takes place, BUMP is restarted with the new information, makes additional plan decisions, and, when all goals in the plan have been either achieved or deferred, returns a new plan to the EC. At this point the EC begins executing the (possibly partial) plan, preferring sensor processes over other parallel processes, and, in turn, preferring actions that are "set-up" actions for sensor processes or that are constrained to be ordered before sensor processes. This cycle (planning, executing, sensing, planning, executing, sensing, ...) continues until all necessary sensing is done and BUMP returns a complete plan, which the EC then finishes executing.

Sensor processes are represented essentially the same way as other processes: they have preconditions and add and delete lists. The preconditions are used to generate the "set-up" actions for the sensor, and at least one of the add-list items will be the information (reading) obtained by the sensor. Sensor processes are added to the plan ONLY when the needed information is not already available in the plan (i.e., when it hasn't already been asserted in the initial state or by some process node that can occur before the goal node that needs the information).
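The planning/executing/sensing cycle described above can be pictured with a small driver loop. The interface below (plan_with returning a step list that ends with either a sensing request or a done marker) is an invention for the sketch, not the authors' actual EC/BUMP interface.

    def execution_controller(plan_with, read_sensor, execute):
        """plan_with(readings) -> list of steps, assumed to end with either
        ("sense", sensor_name) or ("done",). Hypothetical interface."""
        readings = {}
        while True:
            for step in plan_with(readings):
                if step == ("done",):
                    return readings            # complete plan fully executed
                if step[0] == "sense":
                    # Solicit the reading, then restart BUMP with it.
                    readings[step[1]] = read_sensor(step[1])
                    break
                execute(step)                  # ordinary (e.g. set-up) action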
In deciding what types of goals to defer, the authors came up with two basic strategies, plus a third strategy that is a domain-specific variation of one of the two. The basic strategies are:

1) CONTINUE ELSEWHERE: Defer only those goals that are defined in terms of data that must be obtained through a sensor reading. This approach preplans as much as possible before returning the current plan to the EC.

2) STOP AND EXECUTE: As soon as BUMP reaches a goal defined in terms of a sensor reading, it stops immediately, deferring ALL remaining goals until the sensor reading is obtained, and returns the current plan to the EC.

The authors' study involved a set of 32 problems from a "two-box" version of the toolbox domain: there are two toolboxes, two wrenches, and two bolts of different sizes, each of which matches the size of one of the wrenches. The goal is to bolt one or both of the toolboxes shut with particular bolts. In this study, the Stop and Execute strategy failed on only two of the 32 cases, performing better than the Continue Elsewhere strategy, which failed on four. The authors also tested their third, domain-specific strategy (based on Continue Elsewhere), which they call "Sense before Closing"; it attempts to order all sensor processes before all "Close-Toolbox" processes. Sense before Closing performed as well as Stop and Execute.

The authors also describe a new project in which they extend the test problem to a three-toolbox version. Preliminary results indicate that the Continue Elsewhere strategy will outperform Stop and Execute on more complex problems, due to Stop and Execute's inability to plan more than one sensor process ahead. Indications are also that neither of the general-purpose strategies is very good at avoiding failures in complex problems, and that more domain-specific strategies (such as Sense before Closing) may be necessary.

By the way, both of these papers present interesting discussions of, and pointers to, related work in these areas.

==========================================================================

We have just scratched the surface of planning with incomplete information. Some other issues involved in sensing are listed below.

Since sensing can be costly and take time, it is important to try to reduce the overall cost of sensing during planning. Don't sense redundantly: if an action with an observe effect was executed earlier, can you use the results of that action to establish a find-out goal (rather than re-sensing)? On the other hand, the cost of finding out what you already know (e.g., searching a database to determine what has been observed) may be higher than the cost of re-sensing; likewise, things may have changed since the last sensing, making re-sensing necessary anyway.

You should keep track of those aspects of the world about which you have locally closed-world information; for those aspects, you don't need sensing actions -- a universally quantified precondition can be supported by a universally quantified effect. Here again, though, you can lose the closed-world information through some other action, and may need to re-sense or do a "promotion or demotion" on the other action to regain it.
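One simple piece of bookkeeping along these lines (my own illustration; neither paper prescribes it): record what has been observed, note when later actions invalidate a recorded value, and weigh the cost of looking a value up against the cost of sensing it again.

    def should_resense(prop, observation_log, invalidated,
                       lookup_cost, sense_cost):
        """observation_log: propositions observed earlier; invalidated:
        propositions whose recorded values may no longer hold because of
        intervening world-changing actions."""
        if prop in observation_log and prop not in invalidated:
            # A recorded value is worth reusing only if retrieving it is
            # cheaper than sensing it again.
            return lookup_cost > sense_cost
        return True   # never sensed, or world may have changed: sense again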