NOTES ON IMPROVING PLANNING EFFICIENCY

    Prepared by: Robert H. Guttman
    Revised by: Rao Kambhampati [Oct 30, 1995]

Note: Read Rao's notes on Controlling non-backtrackable decision
      before reading these. 

PAPERS COVERED

    David Joslin and Martha E. Pollack.  "Passive and Active Decision
        Postponement in Plan Generation."  In European Workshop on
        Planning (EWSP), Assisi, Italy, 1995.
    

    M.A. Peot and D.E. Smith.  "Threat-Removal Strategies for Nonlinear
        Planning."  In Proc. Eleventh AAAI, 1993.


OVERVIEW

As we learned in earlier classes, there are two critical decision points
during partial order planning (POP) processing: (1) goal ordering and (2)
threat resolution strategy.  Unfortunately, the POP planners we have
studied up to this point have been very sensitive to the choice at both of
these decision points.  Specifically, the choice determines how many nodes
are visited during the POP process, many of which can be dead-end nodes.

Peot and Smith's paper address one of the two decision points presented
above, (2) threat resolution strategies (TRSs).  Joslin also discusses an
approach to (2) called LCFR that he initially introduced in an earlier
paper.  The following threat resolution strategies were discussed in my
presentation:

    DSEP    This TRS postpones conflicts that are not "real".  A conflict
            is real if it's variable bindings indicate that there is a
            conflict requiring demotion, promotion, or another TRS.  As
            you may recall from earlier classes, DSEP's lazy strategy was
            deemed better, in general, than a less lazy approach (i.e.,
            separation).
    

    DUNF/   This TRS postpones conflicts that are not "forced".  A
    ZLIFO   forced conflict is a conflict that can only be resolved
            in one way (because all of its other TRSs have already
            been eliminated).  In other words, it's branching factor
            (BF) is one.  Choosing a conflict that really has no choice
            makes intuitive sense.  If these forced conflicts existed
            in our plan-space and we didn't choose them immediately,
            then we would need to repeatedly choose them in each
            successive search branch - a rather inefficient strategy.
    

    LCFR    Joslin's LCFR strategy is a slight generalization of the
            DUNF/ZLIFO strategy discussed above.  Instead of postponing
            non-forced conflicts (this with a branching factor greater
            than one), LCFR postpones conflicts that don't have the
            least BF at that node.  For example, if we had three
            conflicts with BFs of 2, 3, and 4, then LCFR would postpone
            the conflicts with BFs of 3 and 4 and choose to work on
            the conflict with a BF of 2.
    

    Full Eager   Resolve each conflict as soon as it occurs

    
    Full Lazy    postpone all conflict resolution to the end.      

The above threat resolution strategies (TRSs) fall somewhere in the
spectrum of available TRSs in partial order planning (POP). 

The above TRSs can be considered as wrestling with a tension between
the branching factor and the per-node cost, both of which we would
generally like to reduce.  However, when we reduce the branching
factor, we also (ideally) want the plan represented at each node to be
consistent.  To *guarantee* this, we would need to commit to check its
consistency, an expensive per-node cost (see the "Extreme" TRS above).
Seeing that it is too costly to guarantee consistency, we often settle
for an approximation by postponing splitting the plan into multiple
branches and let successor nodes eventually determine if the plan is
consistent.  The DUNF/ZLIFO and LCFR TRSs above attempt to improve
upon this by more intelligently choosing which conflicts to postpone
and which to work on.  This tends to push the BF lower in the search
tree, which we have learned is a "good thing".  Why?  Because, in
general, the plan is more constrained in the lower levels of the
search tree and, therefore, dead-end nodes can be detected more
readily.

This tension between BF and per-node cost is one of the
characteristics that differentiate the TRSs above.  Notice that in
DSEP, the per-node cost is quite low and the BF relatively high.  In
LCFR, we can see that the BF is reduced (by only postponing conflicts
that don't have the least BF at that node), but that the per-node cost
is higher than DSEP.  This cost is attributed to determining the BF of
each node's conflicts.  (For example, it takes time to determine if a
given conflict can be resolved via demotion, promotion, or separation
which itself can be represented by several branches, one for each
possible bindings of its variables.)  Joslin's paper shows that LCFR
is more efficient than standard UCPOP.  However, I was told that LCFR
exhibits roughly the same efficiency as DUNF/ZLIFO.  Even though LCFR
tends to reduce the BF more so than DUNF/ZLIFO, it is made less
efficient overall due to its higher per-node cost of determining the
BF of conflicts.


PASSIVE POSTPONEMENT

Joslin's paper focuses on one of the efficiency problems of partial-order
causal link (POCL) planners, namely "passive postponement".  Passive
postponement is an attribute of a search strategy that chooses which
conflict to work on *without* considering how other constraints may
influence the consistency of the plan.  Specifically, passive postponement
can result in the propagation of "dead-end" nodes during threat resolution.

Joslin has shown in his paper that an exorbitant amount of time can be
wasted by continually traversing such dead-end branches.  This wasted
time can be so large that a solution to the problem can not be found
in a given amount of time, thus resulting in failure.  Figure 2 of
Joslin's paper shows that up to 90% of the nodes examined in a given
problem were dead-end nodes on successful problems and up to 98% of
the nodes examined were dead-end nodes on unsuccessful problems.


ACTIVE POSTPONEMENT

Active postponement is an attribute of a search strategy that chooses
which conflict to work on *while* considering how other constraints
may influence the consistency of the plan.  In POP, we know that as we
traverse down each branch in the plan-space, we have a very localized
view of the problem at each node.  We also know from our recent
constraint satisfaction problem (CSP) studies that techniques exist
that provide a more global view of the problem.  Joslin's paper
presents an approach to planning that allows for this more global view
by incorporating CSP techniques into the planning process.  It is this
global view of the problem that that allows for active postponement.

Before we explore Joslin's approach, we can also present active
postponement using the POP model (as was discussed by Rao during the
constraint propagation class).  For example, instead of branching upon
a conflict (such as creating one branch for demotion and another for
promotion), we could traverse down only one branch with a disjunctive
ordering on the conflict resolution possibilities (such as demote *or*
promote).  Then, we can use these disjunctive orderings to propagate
constraints at each node much like in the CSP problems we have
studied. This approach would allow for each node of the branch to have
a more global view of the constraints on the problem and can therefore
make much more intelligent decisions on how to split the plan-space
(such as being able to prune dead-end nodes much more readily).

Joslin takes a different tact to provide active postponement.
Instead, Joslin provides a framework where the planning can be viewed
as either a planning problem (using traditional POP techniques) or as
a CSP (using known CSP techniques).  These views are related by the
Temporal World Model (TWM)'s formalization being mapped to CSP
variables, domains, and constraints and then sharing these variables
with the planning component.  Joslin's system, called Descartes,
allows for the use of only planning techniques, only CSP techniques,
or any strategy in between.  Whenever the CSP problem is solved, its
variable bindings result in a plan that is a solution and whenever
plan is solved, its variable bindings result in a CSP that is a
solution.

As illustrated in his paper, the benefit of Joslin's approach is the
pruning of dead-end nodes via active postponement.  However, Joslin's
approach also benefits the planning process in other ways.  In
addition to pruning dead-end nodes, the general purpose CSP engine
also tightens the plan's current constraints (such as further limiting
the domains of variables). This has the effect of reducing future
branching by variables being allowed to take on fewer bindings.  In
addition, Joslin's approach allows the CSP engine to work on "flaws"
where a flaw is defined as either a goal achievement or a threat
resolution.  This allows Descartes to consider goal orderings within
the CSP engine besides just threat resolution conflicts alone.  _It was
not discussed_ in his paper how much of Descartes' efficiency
improvements resulted from the managing of goal achievement (as
opposed to threat resolution conflicts).

It is worth discussing Joslin's implementation further.  For example,
we know from our POP techniques that goals can be achieved by either
simple establishment of operators that already exist somewhere in the
plan or by step addition where we introduce a new operator into the
plan.  We also know from the discussion above, that Descartes maps
planning elements (Temporal Qualified Assertions (TQAs) in this case)
onto CSP variables, domains, and constraints.  However, in the CSP
techniques we have learned up until this point, we always deal with a
static number of variables.  A question then is, how does Descartes
deal with new operators introduced by step addition?

The answer is that Descartes CSP component works as a dynamic CSP
engine.  This allows for the introduction of new variables into the
problem which in turn will change the constraints of the problem.
Dynamic CSP problems can be thought of as having a huge number of
variables, one for each possible planning element as needed.
Initially, only a small subset of the variables will be considered
"active" by the CSP engine and only active variables will be
considered.  As CSP processing continues, some inactive variables will
become active and need to be considered by the CSP engine.

However, as we know from our POP studies, the number of steps
introduced into a plan, and thus the number of subgoals (preconditions
of the steps) can grow without bound.  Seeing that we would need to
represent these somehow as active and inactive variables in our CSP
problem, we find ourselves in trouble.  Obviously, this view is
inadequate for representing plan elements as CSP variables.

In terms of how Descartes' CSP engine works, I will provide a brief
explanation.  As stated earlier, the CSP engine treats threat
resolution and goal achievement as "flaws" that get worked on.
However, the CSP engine deals with these flaws slightly differently.
For threat resolution flaws, a variable Dt has the domain {a,b} for
example as follows:

    (Dt 3D a) -> after( goto(tile1), fill(hole1) )
    (Dt 3D b) -> before( goto(tile1), goto(hole1) )
    

If it is learned that after( goto(tile1), fill(hole1) ) cannot be
satisfied , then the CSP engine deduces Dt !3D a and Dt 3D b.  If the
threat cannot b e satisfied, then the CSP engine deduces that Dt can
not be satisfied given its domain and that the current node is a
dead-end.

For goal achievement flaws, the same approach applies, but new boolean
variables G and J for example are introduced representing *tentative*
step instantiations (since it is not yet known if either step will
actually be in the final plan).  If it is learned that a constraint
that has J as a conjunct cannot be satisfied, then the CSP engine may
deduce J 3D FALSE an d G 3D TRUE.

As discussed during my presentation, preliminary and experimental
results show that Joslin's approach to improving planning efficiency
is superior to most of the other approaches we have looked at.  One
example illustrates UCPOP not being able to solve a problem after 24
hours of processing whereas Descartes was able to solve it in under
three seconds.


DUAL-REPRESNTATION PLANNING

By representing a planning problem from both planning and CSP points of
view, techniques from both fields can be applied.  This can only be
beneficial as is evidenced by Descartes' preliminary results.  I explained
earlier that Descartes achieves this dual-representation by mapping
planning elements onto CSP variables, domains, and constraints.  I will
briefly explain this further now.

Descartes uses Allen and Koomen's Temporal World Model (TWM) approach
to formalize planning problems where each proposition has an
associated temporal interval where it is considered to be TRUE.  These
temporal intervals are known as Temporal Qualified Assertions (TQAs).
For example, let's say we have two operators P(x)[ t1, t2 ] and
ACP(y)[ t3, t4 ] in our planning problem meaning that P(x) is TRUE
between temporal interval t1 and t2 and ACP(y) is TRUE between
temporal interval t3 and t4.  In mapping the se to CSP variables C1
and C2, we have:

    C1 -> P(x)[ t1, t2 ]
    C2 -> ACP(y)[ t3, t4 ]

To resolve the threat introduced when x and y are bound to the same
value, we need to add the constraint that C1 cannot unify with C2.
This is saying that P(x) cannot be TRUE over the same temporal
interval when ACP(y) is TRUE.  This is similar to the demotion,
promotion, and separation that we've seen in POP.

By representing plans and CSPs in this way, Descartes is able to share
variables between the two views.  As stated earlier, this means that
whenever the CSP problem is solved, its variable bindings result in a plan
that is a solution and whenever plan is solved, its variable bindings
result in a CSP that is a solution.


Some points:

 -- Does one need TWM model to introduce constraint propagation into
    planning (clearly not-- see disjunctive ordering constrains
    discussion)

 -- Has there been any empirical evidence that modeling open
    conditions as CSP variables helps in search?

 -- The details of the control strategy used by Descartes are very
    hard to glean from this short paper. In particular, does Joslin
    really solve a dynamic csp, or does he solve static csp's entailed
    by the dynamic csp?

 -- It would be interesting to understand the relation between
    Petrie's Constrained decision problems, which start from
    constraint satisfaction and add subgoaling behavior, and Descartes
    which starts from planning and converts it into a CSP. 


				 ---