Controlling Non-backtrackable Decisions in UCPOP
-- The Role of Constraint Propagation in Planning

Notes by Rao Kambhampati, ASU CSE TR XXX [Oct 26, 1995]

Generally, UCPOP has too many decision points. These include:

1. Which open condition to work on
   1.1 Which establishment choice to use for the chosen open condition?
2. Which conflict (unsafe link) to work on
   2.1 Which resolution to use for the chosen conflict?

Of these, 1 and 2 are non-backtrackable decisions, while 1.1 and 2.1 are backtrackable decisions. We will talk about controlling backtrackable decisions when we consider the EBL material next week. Presently, we will consider 1 and 2, which are non-backtrackable (and thus cannot be controlled through EBL). Although 1 and 2 need not be backtracked, they do affect the size of the search tree. In particular, choosing the right goal order and the right conflict resolution order can have a significant impact on the size of the search tree generated by the refinement search.

Lessons from CSP
----------------

Before we look at controlling UCPOP's non-backtrackable decisions, it is worth noting the similarity between UCPOP and CSP, and digesting the lessons from CSP. We note that 1 and 2 above are similar to the decision of "which variable to work on next" in CSP, and 1.1 and 2.1 are similar to the decision of "which value of the selected variable to try next". Thus, anything we have learned about which variable to choose next can be gainfully transferred to UCPOP decisions 1 and 2. In particular, remember that in CSP we choose the variable with the smallest live domain first. This implies that in UCPOP we should be trying the goal or conflict with the smallest number of resolutions first. Both of these ideas are used in a strategy called Least Cost Flaw Refinement (LCFR).

Of course, the effectiveness of the variable selection strategy in CSP depends on our ability to accurately estimate the current LIVE domain of each variable. In particular, suppose we have a bunch of variables, all of which have domains of size 100 to begin with. Suppose that, after a bunch of refinements, we find that one of the remaining variables has an effective domain of size only 25. If we take the time to realize this, we can then choose that variable as the next variable. If we don't recognize this, we will consider the live domains of all variables to be 100, and thus we won't really reap the benefit of reducing the branching factor at higher levels of the tree.

In CSP, the way you get a better estimate of the live domain is by doing constraint propagation and enforcing k-consistency. For example, 1-consistency weeds out values from the domain of a variable that are no longer feasible for that variable, and 2-consistency weeds out from the domain of a variable any value for which we will not be able to find a legal value for some other remaining variable. In general, the higher the consistency we enforce, the better our estimate of the live domain of each variable, and the better our ability to exploit LCFR.
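To make the analogy concrete, here is a minimal sketch (in Python, with data structures made up just for the illustration) of the "smallest live domain first" selection rule; read "flaw" for variable and "resolution possibilities" for live domain to get the corresponding UCPOP rule. This is only the idea, not anyone's actual code.

    def live_domain(var, domains, assignment, constraints):
        # Values of var still consistent with the current partial assignment.
        # 'constraints' maps a variable to a list of predicates over a partial
        # assignment -- an invented representation, just for this sketch.
        live = []
        for value in domains[var]:
            trial = dict(assignment, **{var: value})
            if all(check(trial) for check in constraints.get(var, [])):
                live.append(value)
        return live

    def select_next(unassigned, domains, assignment, constraints):
        # "Smallest live domain first" -- the CSP analogue of LCFR flaw selection.
        return min(unassigned,
                   key=lambda v: len(live_domain(v, domains, assignment, constraints)))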
With better estimates, we get several advantages:

1. We can detect hidden inconsistencies. If some variable's domain becomes empty, then we know that there is no solution in that branch, and we can stop right away.

2. We can make the refinements more informed:
   2.1 We can make a more informed choice as to which variable should be worked on next (i.e., which refinement to use next).
   2.2 We can get a better idea of which branches of the chosen refinement are likely to contain solutions. For example, suppose the actual domain of the chosen variable contains the value a, but the live domain of that variable does not; then our refinement on that variable can avoid the branch corresponding to the variable taking the value a.

To get the most accurate estimate of the live domains of the variables, we would have to enforce N-consistency. When we do that, we can be sure that the remaining values of the variables are such that any combination of those values is a solution. Thus, search is completely obviated. However, higher consistency checks cost more: 1-consistency can be done in O(n) time (where n is the number of variables), while 2-consistency takes O(n^2) and 3-consistency takes O(n^5). In general, the experience in CSP teaches us that a bit of constraint propagation can help, but too much of it doesn't necessarily help.

Intuitively, constraint propagation does _global_ tidying up of the search space, while refinement looks at a specific corner of the search space more carefully. A bit of tidying up helps in making sure that you are looking in corners that actually contain solutions. But too much tidying up may be counter-productive, in that for getting one solution we don't need to completely tidy up the whole search space. Technically, the CSP community has found that for most problems the break-even point occurs around 2-consistency or 3-consistency. The higher-level consistencies are too costly to enforce, while the lower-level consistencies do not tidy up the search space enough to have any effect on search performance.

What is perhaps more important, there is in fact a bi-directional synergy between constraint propagation and refinement. In particular, suppose we are using 2-consistency as our preferred constraint propagation, and are trying to solve a queens problem. At the beginning of the problem, 2-consistency fails to bring about any tightening of the domains of the variables, since for every positioning of every queen there is at least one legal positioning of any other single queen. However, once we do some refinement, committing to a specific value for some queen in each branch, then under each branch 2-consistency can effectively propagate constraints and tighten domains. In particular, in the branch where queen 1 is kept in position 1, we know that queen 2 cannot go into position 1 or 2, queen 3 cannot go into position 1 or 3, and so on. These values will thus be weeded out by 2-consistency.
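Here is a small arc-consistency (AC-3) sketch on the queens problem that shows this synergy concretely; the code and names are my own, purely for illustration.

    def consistent(qi, ci, qj, cj):
        # Queens in rows qi and qj do not attack each other in columns ci and cj.
        return ci != cj and abs(ci - cj) != abs(qi - qj)

    def revise(domains, xi, xj):
        # Drop values of xi that have no support in xj (the 2-consistency step).
        removed = False
        for ci in list(domains[xi]):
            if not any(consistent(xi, ci, xj, cj) for cj in domains[xj]):
                domains[xi].remove(ci)
                removed = True
        return removed

    def ac3(domains):
        queue = [(xi, xj) for xi in domains for xj in domains if xi != xj]
        while queue:
            xi, xj = queue.pop()
            if revise(domains, xi, xj):
                queue.extend((xk, xi) for xk in domains if xk not in (xi, xj))
        return domains

    n = 8
    domains = {q: set(range(1, n + 1)) for q in range(1, n + 1)}
    ac3(domains)        # before any refinement: nothing gets pruned
    domains[1] = {1}    # refinement: commit queen 1 to position 1
    ac3(domains)        # now positions 1 and 2 vanish from queen 2's domain,
                        # positions 1 and 3 from queen 3's, and so on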
Let us keep these lessons in mind when we consider the various control strategies that have been tried for non-backtrackable decisions in UCPOP.

Back to Non-backtrackable Decisions in UCPOP
--------------------------------------------

In UCPOP, the role of CSP variables is played by the flaws, and the role of CSP variable values is played by the resolution possibilities of the flaws. From the discussion above, we are primed to agree with the idea that the right way to control non-backtrackable decisions in UCPOP will also involve keeping track of the resolution possibilities of the flaws, and doing constraint propagation on them to tighten the live domains and constraints.

This, however, was not realized at first in the literature (even now it is not clearly articulated anywhere other than these notes!). What happened instead was that researchers tried a bunch of ideas which indirectly pushed them closer and closer to the realization that they should really be doing constraint propagation on the live domains (resolution possibilities) of the flaws (variables).

Given that we are interested in controlling goal selection and conflict selection decisions, the first papers, e.g. those by Smith and Peot, concentrated on conflict selection decisions. The original work concentrated on "postponement" of conflicts. Given that conflict resolution is an optional step in Refine-Plan, we know that we don't have to do conflict resolution at all and will still be able to get a solution for the planning problem (Qn. What is the analogue of this in CSP?). Thus, if we postpone considering conflicts, or even ignore them altogether, we can reduce the branching factor due to conflict resolution. We thus have two extreme solutions -- ignore all conflicts and delay them to the end, or handle each conflict as soon as it is created. (As a historical side note, the SNLP algorithm given by McAllester had, for some obscure reason, the conflict resolution step before the open condition establishment step. Given the enormous impact that the clarity of that paper had on the planning community, it was not surprising that for a long time this order -- conflict resolution first and open condition resolution next -- was assumed to be somehow essential.)

Both extremes have their disadvantages. Eager conflict resolution increases the branching factor up front, but it can reduce the cost of the consistency check and thus help the planner avoid refining inconsistent plans. Perhaps more importantly, it also allows the other refinements to be more informed; for example, we can avoid certain simple establishment possibilities that won't work. The only time such eager conflict resolution is going to improve efficiency is when the search reductions from these advantages outweigh the search increase from the larger branching factor. In fact, in the paper on refinement search that I co-authored with Knoblock and Yang, we found that eager tractability refinements (of which conflict resolutions are a subclass) are useful only when they wind up indirectly reducing the establishment branching factor. Since the overall branching factor of UCPOP is the branching factor at the establishment stage (b_e) times the branching factor at the tractability refinement stage (b_t), the product can go down if the increase in b_t leads to a significant decrease in b_e. Our experiments confirm this as the main reason for the utility of tractability refinements. But, of course, this is a rare event; in general, eager resolution leads to performance degradation.

Similarly, the extreme lazy resolution of conflicts, while reducing b_t (making it 1 in the extreme), will increase b_e, since it (a) does not recognize inconsistent plans as easily (unless it uses an NP-hard consistency check) and (b) may allow spurious establishment possibilities (see the example I gave during the CSP class). By far, (b) turns out to be the bigger disadvantage, since (a) can be avoided by using an NP-hard consistency check, but that will still not avoid (b). Since both extremes have disadvantages (predictably so), we would like to consider other types of conflict resolution strategies.
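To see the b_e x b_t point numerically, here is a toy calculation; the numbers are invented for illustration and are not from our experiments.

    def nodes(b_e, b_t, depth):
        # Crude model: the search space grows roughly as (b_e * b_t)^depth.
        return (b_e * b_t) ** depth

    print(nodes(b_e=4.0, b_t=1.0, depth=6))   # lazy extreme: ignore conflicts   -> 4096
    print(nodes(b_e=2.0, b_t=1.5, depth=6))   # eager, and it does cut b_e       -> 729
    print(nodes(b_e=3.8, b_t=1.5, depth=6))   # eager, but b_e barely drops      -> roughly 34000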
There are two broad middle grounds. One is to delay conflicts that will be provably resolvable at some later point when they are picked up. This means that the delayed conflicts will not cause implicit inconsistencies in the plan (although they can still allow spurious establishment possibilities). This idea is used in D-Sep (delay separable threats), and in another strategy due to Smith and Peot which analyzes the operator graph to recognize conflicts that can provably be delayed indefinitely.

Although the above ideas _delay_ conflicts without causing inconsistency, they still allow spurious establishments to take place. In particular, suppose there is a conflict that has only one possible resolution -- say promotion. Since all the conflicts in the plan must eventually get resolved, this means that if the plan is to be a solution, it _must_ have that promotion ordering. Noticing this, and resolving the conflict right away, will then allow us to cut down certain spurious establishment possibilities that would otherwise be taken. Moreover, we are doing this resolution in a no-lose situation, since the search space size will not increase given that the conflict has only one possible resolution. This idea -- which essentially says that we should not be postponing a FORCED conflict (i.e., a conflict with only one resolution) -- is used in the D-UNF strategy of Peot and Smith and the Z-LIFO strategy of Schubert and Gerevini.

Note that in checking whether a conflict is forced, we should in general consider the current state of the plan. In particular, suppose a conflict c is introduced in a plan P and has two resolutions. Later, P is refined into P', and due to the additional orderings among steps, c may now have only one resolution. Thus, to figure out whether a conflict is forced, we need to dynamically compute its resolution possibilities. This is like computing the live domain of a variable in CSP. Since the computation is usually done considering each conflict in isolation, the more precise analogy is to enforcing 1-consistency: seeing conflicts as variables and their resolution possibilities as their domains, D-UNF, Z-LIFO, etc. remove values from the domain that are not even feasible with respect to the 1-ary constraints on that conflict (imposed by the current partial plan).

Until this point we have only been considering conflict flaws, and have not talked about open condition flaws. Unlike conflict flaws, open condition flaws cannot in general be postponed indefinitely without loss of completeness. If we want to transfer our current knowledge about conflict resolution to open condition flaws as well, we might say that we should simply pick the flaw that has the smallest number of remaining live resolution possibilities, and resolve it. This is the LCFR -- or Least Cost Flaw Refinement -- idea. Notice that computing the number of resolution possibilities for each flaw can be very costly, since in the worst case we have to actually simulate the resolution of that flaw on the plan. In particular, open conditions typically have many resolution possibilities, and enumerating them all can be expensive. The operator graph structures described by Smith (in the Rockwell TR) provide a way of statically computing estimates of the number of resolution possibilities for each open condition.
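To make the "live resolution possibilities" computation concrete, here is a small sketch of D-UNF-flavored threat selection. The plan representation (a transitively closed set of ordering pairs) and all the names are mine, and separation and confrontation are left out to keep the sketch short.

    def consistent_with(orderings, a, b):
        # Can we add the ordering a < b?  'orderings' is assumed to be the
        # transitively closed set of (before, after) pairs already in the plan.
        return a != b and (b, a) not in orderings

    def live_resolutions(threat, orderings):
        # threat = (clobberer, producer, consumer) for a causal link producer -> consumer.
        clobberer, producer, consumer = threat
        live = []
        if consistent_with(orderings, clobberer, producer):
            live.append(("demote", clobberer, producer))    # clobberer before producer
        if consistent_with(orderings, consumer, clobberer):
            live.append(("promote", consumer, clobberer))   # clobberer after consumer
        return live

    def select_threat(threats, orderings):
        # Prefer forced threats (exactly one live resolution); an empty set of
        # live resolutions means the partial plan is a dead end and can be pruned.
        if not threats:
            return None
        counted = [(len(live_resolutions(t, orderings)), t) for t in threats]
        if any(count == 0 for count, _ in counted):
            return None
        return min(counted, key=lambda ct: ct[0])[1]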
We have remarked above that D-UNF, Z-LIFO, and consequently LCFR, can be considered as enforcing 1-consistency on the domains of the conflicts. From the CSP analogy, we know that 1-consistency is typically not strong enough to tighten the problem sufficiently. In particular, even though the live domains of two conflicts are each of size 2 after 1-consistency enforcement, there may actually be no way of resolving both conflicts at the same time. Thus, the plan may be dead without our realizing it, and we continue exploring spurious establishment possibilities. Enforcing 2-consistency would allow us to take care of this. This makes us realize, slowly, what should have been obvious from the beginning -- that D-UNF, Z-LIFO, LCFR, etc. are essentially steps towards doing constraint propagation along with refinement.

Now that we see the utility of constraint propagation, it behooves us to put it in the foreground (rather than do constraint propagation in the background). In other words, it will be useful for us to model the flaws as variables and the resolution possibilities as their values, and to be able to post arbitrary constraints on those variables. If we are really interested only in conflict flaws, there is a very easy way of doing this -- convert each conflict into a disjunctive ordering constraint, and do constraint propagation on the disjunctive ordering constraints. This is the material I covered during the CSP lecture. Of course, the disjunctive-orderings model is not sufficient once we consider open condition flaws, or even conflict flaws that have confrontation and separation as resolution possibilities. In that case, an explicit modeling of the conflicts and their resolutions, as described in Joslin and Pollack's paper, is required.

Dynamic Constraint Satisfaction Problems: When goal flaws are modeled as CSP variables, or when conflict flaws allow confrontation, the resulting problem is no longer a standard CSP. In particular, choosing a specific establishment possibility for a goal may in turn introduce new goals (open conditions) into the plan. There is a special class of CSPs, called Dynamic CSPs, which can model this to some extent. In a Dynamic CSP, we have some N variables with their domains. Each variable has an active/inactive status. A subset of these N variables, say n of them, is active in the beginning, and the rest are inactive. The CSP is solved when all the currently active variables have values and the assignment does not violate any constraints among the active variables. The twist is that assigning a specific value to a currently active variable might, as a side effect, make a bunch of currently inactive variables active. This feature can essentially model the goal/subgoal variables in planning. The one difference is that, strictly speaking, in planning we cannot really put a limit on the total number of goal variables -- active and inactive combined -- because infinite subgoaling is possible.

Anyway, it is worth noting that dynamic constraint satisfaction is nowhere near as well studied as ordinary (static) CSP. The typical methods are iterative: try to solve the static CSP involving the currently active variables, and whenever new variables must become active to solve the problem, introduce a minimal number of new variables and solve the new version again as a static CSP. As far as I can understand (from Joslin's paper as well as his talk), he uses similar methods to solve the dynamic constraint satisfaction problem that corresponds to a planning problem.
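Here is a minimal backtracking skeleton for such a dynamic CSP -- a toy formulation of my own, simpler than the iterative scheme just described. The constraint predicates are assumed to tolerate partial assignments, and 'activations' maps a (variable, value) assignment to the variables it activates.

    def solve_dcsp(active, assignment, domains, constraints, activations):
        pending = [v for v in active if v not in assignment]
        if not pending:
            return assignment                 # every active variable has a value
        var = pending[0]                      # a real solver would pick by live-domain size
        for value in domains[var]:
            trial = dict(assignment, **{var: value})
            if not all(check(trial) for check in constraints):
                continue
            # The DCSP twist: this assignment may activate new variables.
            new_active = active | set(activations.get((var, value), ()))
            result = solve_dcsp(new_active, trial, domains, constraints, activations)
            if result is not None:
                return result
        return None                           # backtrack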
While the full dynamic constraint satisfaction machinery is necessary for modeling goal flaws, it is not clear to me whether goal flaws really provide much constraint propagation. At the very least, the results in Joslin's current paper don't convince me of this.

Looking from the CSP point of view, we can think of dynamic CSP as introducing subgoaling behavior into CSP. Charles Petrie has papers in AAAI-92 and AAAI-91 where he proposes a different formulation, called "CDPs" or constrained decision problems, to model situations that have aspects of both CSP and subgoaling. Worth taking a look at. Another connection is the whole area of constraint logic programming. Logic programming (e.g., Prolog) essentially involves repeated subgoaling. More recently, people have been looking at variants of logic programming that give constraints first-class status within logic programming. The progress in that area may also help us arrive at a formulation of planning that allows both refinement and constraint propagation.

Summary: In summary, constraint propagation, which makes the non-backtrackable variable selection choices in CSP more informed, can also be quite useful in controlling non-backtrackable choices in planning. All the tradeoffs between propagation and refinement that hold in CSP also hold in planning.

----------