Notes on Handling Incomplete Information
November 15th and 20th classes
(Submitted by Holly Coast; minor revisions by Rao [Dec 5, 1995])

Up until now, we've been looking at planning under the assumption that the planner has complete and correct information about the initial world state. The papers on sensing and incomplete information look at ways of planning when the planner's information about the world state may be incomplete; that is, some crucial piece(s) of information may not be available at planning time. In particular, the values of certain state variables may not be provided by the initial state. Of course, this discussion assumes that these values are recoverable at some point -- that we can get the missing values as the result of some actions; if not, then planning isn't possible. It's also important to note that not ALL state variables need to be specified or recovered; we only need the values of those state variables that are relevant to our goals or to the current plan.

There are essentially three ways to handle planning under incomplete information.

One way is to do all necessary sensing up front: determine which state variables are not given values in the initial state (if there are N state variables and the initial state provides values for only M of them, you need to find values for the remaining N-M), do all the sensing to obtain these values, put the values into the initial state, and THEN start planning. The problem with this approach is that you may well not know which of the missing state variables will be relevant to the plan, so you can spend a lot of time doing unnecessary sensing. If sensing isn't cheap (and we typically assume it isn't), this is a poor approach.

As mentioned in Dean and Kambhampati (pp. 22-23), another approach is to consider all the possible values of the missing information: if the value of a state variable is unknown, but we know it must be one of a set of K values, why not treat the planning problem as a collection of K problems, one for each possible value of the state variable, and use these to form a K-way conditional plan? This method is practical only if there are very few unspecified variables and the domain of each is small. If the domains are large and/or there are many unspecified variables, this method results in a combinatorial explosion in the number of possible initial states. Furthermore, the plans for the different contingencies may overlap significantly, and synthesizing the overlapping portions multiple times is wasted computational effort.

A better way of doing contingency planning is to be _demand-driven_: continue backward chaining until you need to establish a goal that can only be given by a contingent action. At that point, introduce the contingent action into the plan and build a branching plan that branches on all K possible outcomes of that action. An improvement on this technique is, rather than forming K separate branches, to form a two-way branch based on whether the state variable does or does not take on a specified value (for example, one branch for "the table has color red" and another for "the table does NOT have color red").

[[It could be that the plans that involve the table being blue, green, yellow, black, and white are all equivalent, and the plan that actually diverges is the one that needs the table to be red. It is more efficient to combine those five plans into a single plan for the table NOT being red. This is a way of being "demand-driven" in how branching is introduced into the search tree. The "chance of success" at any branch point is higher with two-way branching, and this type of branching represents a lower level of commitment at this level of the search. (Of course, which branch to look at first can involve consideration of the costs of actions and of which outcome you think is more likely to be true in the world -- more on that in probabilistic planning!) And even if it turns out that as much subsequent branching is needed in a series of two-way branches as in the naive approach, at least the heavier branching has been pushed lower down into the search process.]]
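To make the contrast concrete, here is a minimal Python sketch of the two branching schemes. It is not from any of the papers; all of the names (Plan, Branch, plan_for, etc.) are invented for illustration.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Plan:
        steps: List[str]

    @dataclass
    class Branch:
        test: str                  # condition checked at execution time
        subplans: Dict[str, Plan]  # outcome label -> subplan

    PlanFor = Callable[[str, str], Plan]

    def k_way_branch(var: str, domain: List[str], plan_for: PlanFor) -> Branch:
        """Naive approach: synthesize one subplan per possible value."""
        return Branch(test=var,
                      subplans={v: plan_for(var, v) for v in domain})

    def two_way_branch(var: str, value: str, plan_for: PlanFor) -> Branch:
        """Demand-driven approach: branch only on value vs. not-value. The
        "F" subplan is shared by all the other values, and any further
        branching is deferred until some goal actually demands it."""
        return Branch(test=f"({var} = {value})",
                      subplans={"T": plan_for(var, value),
                                "F": plan_for(var, f"not {value}")})

    # With six possible table colors, k_way_branch synthesizes six subplans
    # up front; two_way_branch synthesizes one plan for "red" plus a single
    # shared plan covering the other five colors.
    colors = ["red", "blue", "green", "yellow", "black", "white"]
    stub: PlanFor = lambda var, val: Plan(steps=[f"handle {var}={val}"])
    six_way = k_way_branch("table-color", colors, stub)
    two_way = two_way_branch("table-color", "red", stub)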
The third way of handling incomplete information is to interleave sensing and execution (i.e., do some sensing up front and some during execution). The idea is that, as planning progresses, you will be able to figure out which of the missing state variables are relevant; at that point you find the values of those variables, and having found them, you can continue planning.

Both [Etzioni et al.] and [Olawsky and Gini] focus on using sensing actions to obtain the missing information. [Etzioni et al.] bring these actions directly into the plan and construct two-way conditional plans based on the results of the sensing actions, whereas [Olawsky and Gini] interleave planning with execution, suspending planning until a needed piece of information is obtained through sensing and then resuming. More complete descriptions of the papers follow, along with additional comments and background material provided by Dr. Rao during the classes.

===============================================================================

"An Approach to Planning with Incomplete Information" by Etzioni et al.

When considering how to deal with incomplete information during planning, four major questions arise under two main categories:

I. How to model sensing:
   1. How to represent information-needs goals;
   2. How to represent actions whose primary function is to obtain information;
   3. How to choose between an action that SENSES whether some property is true and an action that CAUSES the property to become true; and
II. How to carry out sensing:
   4. Whether classical planning algorithms can be extended to allow for information goals and sensing actions.

Sensing itself doesn't change the state of the world; it only changes the internal state of the agent's knowledge about the world. Thus, we need to extend planning to allow for actions, effects, and goals that are stated in terms of information needs as well as causal relationships. Action effects such as "(color (table red))" are ambiguous in planning that includes sensing actions: is this effect saying that the action MAKES the table red, or that it determines that the color of the table IS ALREADY red? There must be some differentiation of effects to provide for these different meanings. In addition, goals need to come in different forms: we need to be able to represent goals of the form "Make block A red" as well as goals of the form "Make block A have the same color as block B." Painting both A and B black would violate the intent of the second goal, which is to determine the color of block B and then make block A that same color.
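As a purely illustrative picture of this differentiation, here is one way the two kinds of effects might be rendered as data. This is not actual UWL syntax (the paper's mechanism, annotations, is described next); the Effect and Operator structures and the two example operators are assumptions made for the sketch.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Effect:
        annotation: str     # "cause" or "observe"
        proposition: Tuple  # e.g. ("color", "table", "red")
        value: str          # "T", "F", or a run-time variable such as "!c"

    @dataclass
    class Operator:
        name: str
        preconds: List[Tuple] = field(default_factory=list)
        effects: List[Effect] = field(default_factory=list)

    # An action that MAKES the table red (changes the world state):
    paint = Operator("paint-table-red",
                     preconds=[("have", "brush")],
                     effects=[Effect("cause", ("color", "table", "red"), "T")])

    # An action that finds out what color the table ALREADY is; the run-time
    # variable !c is bound by the sensor during execution, not at plan time:
    check = Operator("check-table-color",
                     effects=[Effect("observe", ("color", "table", "!c"), "T")])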
Etzioni et al. have created a language, UWL, and a partial-order planning algorithm, SENSp, that handle these issues. They have chosen conditional planning as the technique for carrying out planning that involves sensing.

UWL (University of Washington Language) is an extension of STRIPS. The extensions include:
1. Annotations on preconditions and postconditions to differentiate between information-gathering and causation-based goals and actions;
2. Run-time variables of the form !x, whose values will be provided during execution by sensing actions;
3. Conditional plan steps that use the information obtained at run time; and
4. The concept of a truth value of U (unknown) for a proposition, in addition to T (true) and F (false).

*** ANNOTATIONS ***

The POSTCONDITION annotations that Etzioni et al. use are "observe" and "cause." (In general, "observe" effects are sometimes also represented as "knows(P)" in planning.)

Cause postconditions are of the form (cause (P . T)) or (cause (P . F)), where P stands for some proposition. These correspond to the types of postconditions we've been dealing with all semester -- an operator's effects that change the world state. They correspond to STRIPS adds (the first form above) and deletes (the second form). Causal effects contain no run-time variables. In the Etzioni paper, every cause effect carries an implied corresponding observation: if you cause something to be true, it is assumed that you also KNOW you did so (i.e., "your right hand knows what your left hand is doing"). Dr. Rao mentioned that this might not always hold in planners that deal with incomplete information.

Observe postconditions change the planner's state of information. Through observational effects, the planner can determine the truth value of a proposition -- (observe (P . !v)) -- or it can identify an object that has a particular property or determine a certain property of an object -- (observe ((P !x) . T)). Observe effects use run-time variables, which are essentially free variables. Since ordinary free variables don't have any semantics in logic, we give the run-time variables "existential semantics" -- that is, we make them Skolem variables, which will take on some value as a function of the objects in the domain. Skolem variables are treated differently from ordinary variables: Skolem variables can't unify with a constant, nor can they unify with each other. They are allowed to unify with ordinary variables.

Including existential-semantics variables corresponds to disjunctive effects, which we haven't seen up until now (UCPOP allows disjunctive preconditions but not disjunctive effects), because they correspond to nondeterminism. However, the nondeterminism is with respect to the agent's knowledge state, NOT the world state. The world is still deterministic.

Information obtained at run time through observational effects can be used to construct a two-way conditional plan. For example, from (observe (P . !v)), the planner can build a plan conditional on the value of the run-time variable !v with the construct (pcond P P1 P2), where P is the same proposition as in the observe effect and P1 and P2 are plans: if !v becomes T (i.e., proposition P is true) at execution time, plan P1 will be executed; otherwise, plan P2 will be executed.
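A minimal sketch of how an executor might interpret such a conditional plan. The pcond construct is from the paper, but this executor, and the step representation it assumes, are invented for illustration.

    def apply_action(step, world):
        """Apply a causal step of the assumed form (name, adds, deletes)."""
        _name, adds, deletes = step
        return (world - set(deletes)) | set(adds)

    def execute(plan, world, sense):
        """Run a conditional plan. world is a set of true propositions;
        sense(P) -> bool is available only at execution time."""
        for step in plan:
            if step[0] == "pcond":
                _tag, prop, p1, p2 = step
                # The truth of prop is unknown until the earlier observe
                # action has run; branch on it now, at execution time.
                world = execute(p1 if sense(prop) else p2, world, sense)
            else:
                world = apply_action(step, world)
        return world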
The second form of observe effect, (observe ((P !x) . T)), can be used to build a conditional plan with the construct (vcond (!x = K) P1 P2): plan P1 will be executed if, during execution, the run-time variable !x is bound to the constant K; otherwise, plan P2 will be executed. By the way, a single action can have both cause and observe effects.

How cause and observe effects can be used depends on rules relating to the types of the goals (i.e., information needs or causal goals). Etzioni et al. use annotations on preconditions to govern these rules. The PRECONDITION annotations are "satisfy," "find-out," and "hands-off."

Satisfy preconditions can be achieved through either causal or observational effects. For example, the goal "(satisfy (color (table red)) . T)" can be achieved either by painting the table red ("(cause (color (table red)) . T)") or by finding out that the table is already red ("(observe (color (table !c)) . T)", where !c becomes bound to red). The choice might depend on the cost of sensing versus the cost of causing the condition to be true; for a planning agent, painting the table red may well be easier than trying to find out what the table's color is. Both plans (one with a sensing action to achieve the "satisfy" precondition, one with a causing action) are correct in the classical sense. These decisions can be backtracked over: imagine that the causing action is chosen -- you decide to paint the table red -- but one of its preconditions is that you have a brush, you don't have one, and no action can get you one. Instead of giving up and declaring the plan dead, just backtrack over the decision to paint the table and instead find out through a sensing action whether the table is already red.

Find-out preconditions refer to information needs: we want to find out whether some proposition is true without changing its state in the process of the "finding out." Thus, a find-out goal can be achieved either through an observe effect, or through a causal effect if the action that gives the causal effect serves some other useful purpose in the plan. This "other useful purpose" can be achieving some satisfy goal, or achieving some other find-out goal via an OBSERVE effect. For example, suppose one of your goals is (informally) "Everything in this room has been painted red," and another goal is "Find out if the table in this room is red." If nothing constrains the second goal to be achieved before the first, then the action that supports the first goal can also support the second (I just painted everything in the room red to support the first goal, so from that I know the table is red). However, if the first goal weren't there (i.e., if we only had "Find out if the table is red"), then we could not directly achieve that goal by painting the table red and then saying, "Yes, the table is red."

From this discussion we can glean the rule that we can do SIMPLE establishment between a causal effect and a find-out goal, but we are not allowed to do STEP ADDITION establishment from a causal effect to a find-out goal. (Of course, we are allowed to do step addition establishment from an observe effect to a find-out goal!)
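The rule in the preceding paragraph can be stated as a small procedure. This is my paraphrase of the rule, not SENSp's code; the data shapes (steps and operators as dicts whose "effects" entries are (annotation, proposition) pairs) are assumptions of the sketch, and a real planner would unify rather than test equality.

    def find_out_establishers(goal_prop, plan_steps, operators):
        """Yield the legal ways to support a find-out goal on goal_prop."""
        # SIMPLE establishment: an effect of a step ALREADY in the plan can
        # support the goal, whether it is a cause or an observe effect.
        for step in plan_steps:
            for annotation, prop in step["effects"]:
                if prop == goal_prop:
                    yield ("link-to-existing-step", step["name"], annotation)
        # STEP ADDITION: a NEW step may be introduced only for an observe
        # effect; adding an action just to make a find-out goal true is
        # not allowed.
        for op in operators:
            for annotation, prop in op["effects"]:
                if annotation == "observe" and prop == goal_prop:
                    yield ("add-new-step", op["name"], annotation)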
This rule relates to the idea of "filter conditions," which has been around in planning for a long time. Filter conditions are conditions that must hold before you can take an action, but that you are not allowed to make true. The example Dr. Rao gave: suppose you have a goal of going from Phoenix to San Francisco. One action in such a plan might be to take an airplane, but that makes sense only if there is an airport. Having an airport isn't a real precondition -- you're not allowed to build one from scratch just so you can take a plane. Thus, having an airport is a "filter condition": use this operator (take a plane) WHEN there is an airport. And if there IS an airport, there are additional, ordinary preconditions (I must be at the airport, I must have a ticket, etc.) that can be supported by other actions (take a taxi, buy a ticket, etc.). Filter conditions relate to achieving find-out goals through causal effects: I'm not allowed to add a new action to support one, but if it's already there (perhaps as a side effect of some earlier action), I can use it.

When there are filter conditions, goal ordering matters. Using the two goals above (about the red room), suppose you pick the find-out goal first and your sensor is broken -- then the plan fails. However, if you first painted the room red to achieve the other goal, then you can achieve the find-out goal. (This relates to the idea of Point Truth Constraints, and the fact that these don't go onto the agenda. Just because a PTC isn't satisfied now doesn't mean it won't be satisfied later, so instead of pruning it, just blacklist it.)

Hands-off preconditions (e.g., (hands-off P)) prevent the planner from doing anything to change P's state. (The SENSp algorithm implements these by adding a causal link from the plan's initial state to its goal state.) Hands-off preconditions constitute maintenance goals.

Because of these extensions to STRIPS, the authors have devised their own definitions relating to the correctness of plans, which take into account the semantics of the annotations. See definitions 2 through 6 in the full paper for the specific meanings of how two propositional contents can "match," of supported and valid preconditions, and of correct plans.

THE SENSp ALGORITHM and Conditional Planning in general

SENSp is an IMPLEMENTED partial-order planning algorithm for UWL, based on SNLP with several extensions. The authors claim it is provably sound, and complete as long as operators with observe postconditions have no preconditions (this eliminates subgoaling to achieve observational steps). Some SENSp details on how it handles CAUSAL LINKS:

* Only cause postconditions are considered as possible threats to causal links; observe postconditions pose no threat.
* As mentioned above, hands-off goals are implemented as a causal link from the plan's initial state to its goal state.
* Find-out goals are implemented as two branches, one for achieving the goal via an observe effect and the other for using a cause effect. SENSp considers these two branches separately, backtracking between them to ensure completeness. In the cause-effect branch, special rules are followed: "... the algorithm first tries to add causal links to support all preconditions without find-out annotations (adding new steps to the plan if necessary). Then it tries to add links supporting all remaining goals without adding new steps to the plan."

Run-time variables are treated as constants with as-yet-unknown values; the matching restrictions described above for Skolem variables are imposed on them. In addition, if a run-time variable matches with an ordinary variable, an ordering constraint is added so that the value of the run-time variable is not used before it is observed.
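Here is a small sketch of those matching restrictions, writing !x for run-time variables and ?x for ordinary ones. The predicate is invented; the paper does not present SENSp's implementation at this level of detail.

    def can_unify(t1, t2):
        """Matching restrictions on run-time (Skolem-like) variables."""
        def is_runtime(t): return isinstance(t, str) and t.startswith("!")
        def is_ordinary(t): return isinstance(t, str) and t.startswith("?")
        if is_runtime(t1) and is_runtime(t2):
            return False           # run-time variables never unify together
        if is_runtime(t1) or is_runtime(t2):
            other = t2 if is_runtime(t1) else t1
            # A run-time variable may unify with an ordinary variable but
            # never with a constant (its value is unknown until execution).
            # A successful match would also add the ordering constraint that
            # the value be observed before it is used.
            return is_ordinary(other)
        if is_ordinary(t1) or is_ordinary(t2):
            return True            # ordinary variables unify with anything
        return t1 == t2            # two constants must be identical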
SENSp inserts a conditional into a plan only when it needs to constrain the value of a run-time variable, and it must do so in a way that obeys the requirement that the run-time variable be observed before it is used. The algorithm is a conditional planner: it places sensing actions into the plan and then plans for the different possible outcomes (i.e., for the possible values of the run-time variables) so that, when an information-gathering action is executed, the plan already "knows" what the next action should be based on the result. The plan is essentially a tree structure, with branches below the sensing actions; a complete execution corresponds to one branch of the tree.

In general, as mentioned earlier, instead of taking the naive approach to conditional planning (which forms a K-way conditional branch), an algorithm can form two-way branches based on the truth or falsity of some condition. This is a demand-driven technique that takes place as part of the backward reasoning from the goal step. To establish a conditional link in which some variable's value is constrained to be a specific constant, we must also consider that the variable's value might NOT be that constant, and plan for that possibility. A copy is made of the goal step, and each copy is assigned the corresponding CONTEXT: if in one branch the conditional value is (color (table red)), then the goal step in that branch has that context, and the goal step in the other branch has the context (not (color (table red))). The steps in a branch below a sensing action also carry the corresponding context (contexts are propagated forward in the plan). A step in the tree-structured plan will be executed as long as the world state contains the context of that step -- that is, the step's context is a subset of the world state. (Of course, any step whose context is null will always be executed; this corresponds to steps in the plan that occur before any sensing actions.)

Two contexts are said to be compatible if one is a subset of the other. If they are not compatible, there is no legal execution of the plan in which both actions are executed. This idea lets the planner recognize that a step in one branch of the plan can't violate a causal link in another branch. This is called "conditioning" of steps: we can ignore an interaction between a causal link and a step if their contexts are incompatible. In the example in Dr. Rao's overheads, there is a "remove(tire)" action to achieve the "clear(hub)" precondition of "puton(spare)". "Remove(tire)" threatens the causal link by which the start step gives "on(x)" to the goal step (which has the context "(status (x, intact))"); this threat can be resolved by putting a conditional link from the "check(tire)" action to the "remove(tire)" action with the context "(not (status (x, intact)))". Now the contexts are incompatible, and the link is no longer threatened.

Also in the flat-tire example, it's important to realize that the planner could have first considered putting the spare on (as long as there's no precondition that says you can do this only if the flat tire is NOT intact). If the planner had taken this approach, the resulting plan would not have been a conditional plan. And suppose there were no "check(tire)" action -- then the problem could still be solved by changing the tire, or by buying four new tires. These are still correct plans; they just may not be optimal.
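The subset test on contexts is easy to state concretely. This fragment is illustrative only (contexts as sets of propositions, with the two contexts from the tire example); it is not SENSp's code.

    def compatible(ctx1, ctx2):
        """Two contexts are compatible iff one is a subset of the other.
        The null (empty) context is compatible with every context."""
        return ctx1 <= ctx2 or ctx2 <= ctx1

    def may_threaten(step_ctx, link_ctx):
        # A step can threaten a causal link only if some legal execution
        # runs both; incompatible contexts let us ignore the interaction.
        return compatible(step_ctx, link_ctx)

    # From the flat-tire example: once remove(tire) is conditioned on the
    # tire NOT being intact, its context is incompatible with the link's.
    link_ctx = frozenset({("status", "x", "intact")})
    step_ctx = frozenset({("not", ("status", "x", "intact"))})
    assert not may_threaten(step_ctx, link_ctx)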
PRACTICAL APPLICATION

The UNIX domain often gives rise to information goals and sensory actions as well as actions that change the world state. In fact, the authors' development of UWL was originally motivated by a project to build a softbot for UNIX. The softbot uses UWL to represent its actions (UNIX commands) and its goals, both information-gathering and causal (there are several examples in the paper). The authors point out that UWL as it stands does not address certain issues that arise in representing UNIX commands, such as the need for universally quantified preconditions and postconditions, or the fact that other agents are continually changing the world's state. The authors are currently working to extend UWL to address these issues (or at least were at the time of this paper!).

===============================================================================

"Deferred Planning and Sensor Use" by Duane Olawsky and Maria Gini

Here is an example of the second technique for handling informational actions in planning: instead of forming conditional plans, sensing actions are executed as soon as they are brought into the plan. Once the needed information is obtained, planning continues. This is generally considered a reasonable thing to do, since information-gathering actions aren't expected to drastically change anything else. Of course, in resource-limited planning, some amount of time and/or cost is involved in executing the action, and this may violate other time constraints on the planning process. Still, information-gathering actions are certainly more reasonable to execute before the plan is complete than other types of actions are.

This paper approaches the incomplete-information problem from the perspective of robotics. The ability to incorporate sensory data into the planning process is necessary for an autonomous robot to operate under general goals for extended periods, recognizing what information it needs and finding it out through sensor use. The authors use conventional planning techniques as the basis for their study, and they examine two possible ways of using these techniques to solve problems with incompletely specified initial states: extending the techniques in some way, or defining new operators at a level of abstraction that allows a conventional planner to handle such problems. They give an example of the second method and show that it leads to nondeterministic effects, which are a problem for conventional planners. They therefore opt to extend conventional planning techniques to account for incomplete information. The three basic ways of doing this are:

1) Plan for all contingencies (i.e., for all possible sensor reading values);
2) Find a single plan based on an assumed value of the sensor reading; or
3) Defer planning decisions that depend on a sensor reading until the reading is available.

The first approach is similar to the "K-way" conditional plan mentioned in the introduction to this summary; it can easily result in exponential growth in the amount of planning required. Even so, this approach may be appropriate (if computationally feasible) in cases where:

a) The cost of the plan can be justified by its long-term usefulness (there is a high expectation of reuse);
b) Execution-time planning (deferred planning or replanning) is impossible or undesirable due to time constraints during execution; and/or
c) Plan errors are highly critical, so that the cost of the extra planning is outweighed by the cost of a mistake.

In the second approach, if the wrong assumption was made about the sensor reading's value, then the plan is invalid and replanning must take place. Thus, this approach is appropriate only when all of the following hold:

a) It is acceptable for the robot to stop during execution to allow for replanning;
b) Plan errors are NOT highly critical (actions are reversible, or the cost of failure is small); and
c) Some specific sensor reading value is more likely to occur than any other.

In the third approach, the planner completes only those portions of the plan for which it has enough information. This may lead to important dependencies or constraints being missed, which in turn may lead to an action being taken that must later be undone. This approach is appropriate when (a) and (b) for the second approach are true.
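Read as a decision rule, the conditions above amount to something like the following sketch. This is my distillation, not the authors'; the boolean inputs are simplifications of conditions (a)-(c).

    def choose_strategy(reusable, replanning_ok, errors_critical,
                        likely_value_known):
        """Pick among the three approaches, per the simplified criteria."""
        if reusable or not replanning_ok or errors_critical:
            return "plan for all contingencies"      # pay planning cost once
        if likely_value_known:
            return "assume the likely sensor value"  # replan if the guess fails
        return "defer decisions until the reading is available"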
The authors have developed an agenda-controlled planner called BUMP (Basic University of Minnesota Planner) that uses STRIPS-style operators and builds a plan consisting of goal nodes and process nodes. BUMP works under the direction of an Execution Controller (EC) that invokes BUMP when planning is called for and then simulates execution of the steps of the (possibly partial) plan that BUMP returns. If a piece of information is missing, the BUMP plan must contain a request for a sensor reading via a sensor-process node; the EC interprets this node as an instruction to solicit (via the robot) a sensor reading at that point in the execution. As soon as the sensor reading takes place, BUMP is restarted with the new information, makes additional plan decisions, and, when all goals in the plan have been either achieved or deferred, returns a new plan to the EC. At this point the EC begins executing the (possibly partial) plan, preferring sensor processes over other parallel processes, and, in turn, preferring actions that are "set-up" actions for sensor processes or that are constrained to be ordered before sensor processes. This cycle (planning, executing, sensing, planning, executing, sensing, ...) continues until all necessary sensing is done and BUMP returns a complete plan, which the EC then finishes executing.

Sensor processes are represented essentially the same way as other processes: they have preconditions and add and delete lists. The preconditions are used to generate the "set-up" actions for the sensor, and at least one of the add-list items will be the information (reading) obtained by the sensor. Sensor processes are added to the plan ONLY when the needed information is not already available in the plan (i.e., when it hasn't already been asserted in the initial state or by some process node that can occur before the goal node that needs the information).
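The planning/executing/sensing cycle described above can be pictured with a small driver loop. The interface below (plan_with returning a step list that ends with either a sensing request or a done marker) is an invention for the sketch, not the authors' actual EC/BUMP interface.

    def execution_controller(plan_with, read_sensor, execute):
        """plan_with(readings) -> list of steps, assumed to end with either
        ("sense", sensor_name) or ("done",). Hypothetical interface."""
        readings = {}
        while True:
            for step in plan_with(readings):
                if step == ("done",):
                    return readings            # complete plan fully executed
                if step[0] == "sense":
                    # Solicit the reading, then restart BUMP with it.
                    readings[step[1]] = read_sensor(step[1])
                    break
                execute(step)                  # ordinary (e.g. set-up) action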
In deciding what types of goals to defer, the authors came up with two basic strategies, plus a third strategy that is a domain-specific variation of one of the two. The basic strategies are:

1) CONTINUE ELSEWHERE: Defer only those goals that are defined in terms of data that must be obtained through a sensor reading. This approach preplans as much as possible before returning the current plan to the EC.

2) STOP AND EXECUTE: As soon as BUMP reaches a goal defined in terms of a sensor reading, it stops immediately, deferring ALL remaining goals until the sensor reading is obtained, and returns the current plan to the EC.

The authors' study involved a set of 32 problems from a "two-box" version of the toolbox domain: there are two toolboxes, two wrenches, and two bolts of different sizes, each of which matches the size of one of the wrenches. The goal is to bolt one or both of the toolboxes shut with particular bolts. In this study, the Stop and Execute strategy failed on only two of the 32 cases, performing better than the Continue Elsewhere strategy, which failed on four. The authors also tested their third, domain-specific strategy (based on Continue Elsewhere), which they call "Sense before Closing"; it attempts to order all sensor processes before all "Close-Toolbox" processes. Sense before Closing performed as well as Stop and Execute.

The authors also describe a new project in which they extend the test problem to a three-toolbox version. Preliminary results indicate that the Continue Elsewhere strategy will outperform Stop and Execute on more complex problems, due to Stop and Execute's inability to plan more than one sensor process ahead. Indications are also that neither of the general-purpose strategies is very good at avoiding failures in complex problems, and that more domain-specific strategies (such as Sense before Closing) may be necessary.

By the way, both of these papers present interesting discussions of, and pointers to, related work in these areas.

==========================================================================

We have just scratched the surface of planning with incomplete information. Some other issues involved in sensing are listed below.

Since sensing can be costly and take time, it is important to try to reduce the overall cost of sensing during planning. Don't sense redundantly: if an action with an observe effect was executed earlier, can you use the results of that action to establish a find-out goal (rather than re-sensing)? On the other hand, the cost of finding out what you already know (e.g., searching a database to determine what has been observed) may be higher than the cost of re-sensing; likewise, things may have changed since the last sensing, making re-sensing necessary anyway.

You should keep track of those aspects of the world about which you have locally closed-world information; for those aspects, you don't need sensing actions -- a universally quantified precondition can be supported by a universally quantified effect. Here again, though, you can lose the closed-world information through some other action, and may need to re-sense or do a "promotion or demotion" on the other action to regain it.
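One simple piece of bookkeeping along these lines (my own illustration; neither paper prescribes it): record what has been observed, note when later actions invalidate a recorded value, and weigh the cost of looking a value up against the cost of sensing it again.

    def should_resense(prop, observation_log, invalidated,
                       lookup_cost, sense_cost):
        """observation_log: propositions observed earlier; invalidated:
        propositions whose recorded values may no longer hold because of
        intervening world-changing actions."""
        if prop in observation_log and prop not in invalidated:
            # A recorded value is worth reusing only if retrieving it is
            # cheaper than sensing it again.
            return lookup_cost > sense_cost
        return True   # never sensed, or world may have changed: sense again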