Controlling Non-backtrackable Decisions in UCPOP
-- The Role of Constraint Propagation in Planning

Notes by Rao Kambhampati, ASU CSE TR XXX [Oct 26, 1995]

Generally, UCPOP has too many decision points. These include:

1. Which open condition to work on
   1.1 Which establishment choice to use for the chosen open condition?
2. Which conflict (unsafe link) to work on
   2.1 Which resolution to use for the chosen conflict?

Of these, 1 and 2 are non-backtrackable decisions, while 1.1 and 2.1 are backtrackable decisions. We will talk about controlling backtrackable decisions when we consider the EBL material next week. Presently, we will consider 1 and 2, which are non-backtrackable (and thus cannot be controlled through EBL). Although 1 and 2 need not be backtracked, they do affect the size of the search tree. In particular, choosing the right goal order and the right conflict resolution order can have a significant impact on the size of the search tree generated by the refinement search.

Lessons from CSP
----------------

Before we look at controlling UCPOP's non-backtrackable decisions, it is worth noting the similarity between UCPOP and CSP, and digesting the lessons from CSP. We note that 1 and 2 above are similar to the decision of "which variable to work on next" in CSP, and 1.1 and 2.1 are similar to the decision of "which value of the selected variable to try next". Thus, anything we have learned about which variable to choose next can be gainfully transferred to UCPOP decisions 1 and 2. In particular, remember that in CSP we choose the variable with the smallest live domain first. This implies that in UCPOP we should be trying the goal or conflict with the smallest number of resolutions first. Both of these ideas are used in a strategy called Least Cost Flaw Refinement (LCFR).

Of course, the effectiveness of the variable selection strategy in CSP depends on our ability to accurately estimate the current LIVE domain of each variable. In particular, suppose we have a bunch of variables, all of which have domains of size 100 to begin with. Suppose that, after a bunch of refinements, we find that one of the remaining variables has an effective domain of size only 25. If we take the time to realize this, we can then choose that variable as the next variable. If we don't recognize this, we will consider the live domains of all variables to be 100, and thus we won't really reap the benefit of reducing the branching factor at higher levels of the tree.

In CSP, the way you get a better estimate of the live domain is by doing constraint propagation and enforcing k-consistency. For example, 1-consistency weeds out values from the domain of a variable that are no longer feasible for that variable, and 2-consistency weeds out from the domain of a variable any value for which we will not be able to find a legal value for some other remaining variable. In general, the higher the consistency we enforce, the better our estimate of the live domain of each variable, and the better our ability to exploit LCFR.
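To make the analogy concrete, here is a minimal sketch (in Python, with data structures made up just for the illustration) of the "smallest live domain first" selection rule; read "flaw" for variable and "resolution possibilities" for live domain to get the corresponding UCPOP rule. This is only the idea, not anyone's actual code.

    def live_domain(var, domains, assignment, constraints):
        # Values of var still consistent with the current partial assignment.
        # 'constraints' maps a variable to a list of predicates over a partial
        # assignment -- an invented representation, just for this sketch.
        live = []
        for value in domains[var]:
            trial = dict(assignment, **{var: value})
            if all(check(trial) for check in constraints.get(var, [])):
                live.append(value)
        return live

    def select_next(unassigned, domains, assignment, constraints):
        # "Smallest live domain first" -- the CSP analogue of LCFR flaw selection.
        return min(unassigned,
                   key=lambda v: len(live_domain(v, domains, assignment, constraints)))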
With better estimates, we get several advantages:

1. We can detect hidden inconsistencies. If some variable's domain becomes empty, then we know that there is no solution in that branch, and we can stop right away.

2. We can make the refinements more informed:
   2.1 We can make a more informed choice as to which variable should be worked on next (i.e., which refinement to use next).
   2.2 We can get a better idea of which branches of the chosen refinement are likely to contain solutions. For example, suppose the actual domain of the chosen variable contains the value a, but the live domain of that variable does not; then our refinement on that variable can avoid the branch corresponding to the variable taking the value a.

To get the most accurate estimate of the live domains of the variables, we would have to enforce N-consistency. When we do that, we can be sure that the remaining values of the variables are such that any combination of those values is a solution. Thus, search is completely obviated. However, higher consistency checks cost more: 1-consistency can be done in O(n) time (where n is the number of variables), while 2-consistency takes O(n^2) and 3-consistency takes O(n^5). In general, the experience in CSP teaches us that a bit of constraint propagation can help, but too much of it doesn't necessarily help.

Intuitively, constraint propagation does _global_ tidying up of the search space, while refinement looks at a specific corner of the search space more carefully. A bit of tidying up helps in making sure that you are looking in corners that actually contain solutions. But too much tidying up may be counter-productive, in that for getting one solution we don't need to completely tidy up the whole search space. Technically, the CSP community has found that for most problems the break-even point occurs around 2-consistency or 3-consistency. The higher-level consistencies are too costly to enforce, while the lower-level consistencies do not tidy up the search space enough to have any effect on search performance.

What is perhaps more important, there is in fact a bi-directional synergy between constraint propagation and refinement. In particular, suppose we are using 2-consistency as our preferred constraint propagation, and are trying to solve a queens problem. At the beginning of the problem, 2-consistency fails to bring about any tightening of the domains of the variables, since for every positioning of every queen there is at least one legal positioning of any other single queen. However, once we do some refinement, committing to a specific value for some queen in each branch, then under each branch 2-consistency can effectively propagate constraints and tighten domains. In particular, in the branch where queen 1 is kept in position 1, we know that queen 2 cannot go into position 1 or 2, queen 3 cannot go into position 1 or 3, and so on. These values will thus be weeded out by 2-consistency.
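Here is a small arc-consistency (AC-3) sketch on the queens problem that shows this synergy concretely; the code and names are my own, purely for illustration.

    def consistent(qi, ci, qj, cj):
        # Queens in rows qi and qj do not attack each other in columns ci and cj.
        return ci != cj and abs(ci - cj) != abs(qi - qj)

    def revise(domains, xi, xj):
        # Drop values of xi that have no support in xj (the 2-consistency step).
        removed = False
        for ci in list(domains[xi]):
            if not any(consistent(xi, ci, xj, cj) for cj in domains[xj]):
                domains[xi].remove(ci)
                removed = True
        return removed

    def ac3(domains):
        queue = [(xi, xj) for xi in domains for xj in domains if xi != xj]
        while queue:
            xi, xj = queue.pop()
            if revise(domains, xi, xj):
                queue.extend((xk, xi) for xk in domains if xk not in (xi, xj))
        return domains

    n = 8
    domains = {q: set(range(1, n + 1)) for q in range(1, n + 1)}
    ac3(domains)        # before any refinement: nothing gets pruned
    domains[1] = {1}    # refinement: commit queen 1 to position 1
    ac3(domains)        # now positions 1 and 2 vanish from queen 2's domain,
                        # positions 1 and 3 from queen 3's, and so on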
Let us keep these lessons in mind when we consider the various control strategies that have been tried for non-backtrackable decisions in UCPOP.

Back to Non-backtrackable Decisions in UCPOP
--------------------------------------------

In UCPOP, the role of CSP variables is played by the flaws, and the role of CSP variable values is played by the resolution possibilities of the flaws. From the discussion above, we are primed to agree with the idea that the right way to control non-backtrackable decisions in UCPOP will also involve keeping track of the resolution possibilities of the flaws, and doing constraint propagation on them to tighten the live domains and constraints.

This, however, was not realized at first in the literature (even now it is not clearly articulated anywhere other than these notes!). What happened instead was that researchers tried a bunch of ideas which indirectly pushed them closer and closer to the realization that they should really be doing constraint propagation on the live domains (resolution possibilities) of the flaws (variables).

Given that we are interested in controlling goal selection and conflict selection decisions, the first papers, e.g. those by Smith and Peot, concentrated on conflict selection decisions. The original work concentrated on "postponement" of conflicts. Given that conflict resolution is an optional step in Refine-Plan, we know that we don't have to do conflict resolution at all and will still be able to get a solution for the planning problem (Qn. What is the analogue of this in CSP?). Thus, if we postpone considering conflicts, or even ignore them altogether, we can reduce the branching factor due to conflict resolution. We thus have two extreme solutions -- ignore all conflicts and delay them to the end, or handle each conflict as soon as it is created. (As a historical side note, the SNLP algorithm given by McAllester had, for some obscure reason, the conflict resolution step before the open condition establishment step. Given the enormous impact that the clarity of that paper had on the planning community, it was not surprising that for a long time this order -- conflict resolution first and open condition resolution next -- was assumed to be somehow essential.)

Both extremes have their disadvantages. Eager conflict resolution increases the branching factor up front, but it can reduce the cost of the consistency check and thus help the planner avoid refining inconsistent plans. Perhaps more importantly, it also allows the other refinements to be more informed; for example, we can avoid certain simple establishment possibilities that won't work. The only time such eager conflict resolution is going to improve efficiency is when the search reductions from these advantages outweigh the search increase from the larger branching factor. In fact, in the paper on refinement search that I co-authored with Knoblock and Yang, we found that eager tractability refinements (of which conflict resolutions are a subclass) are useful only when they wind up indirectly reducing the establishment branching factor. Since the overall branching factor of UCPOP is the branching factor at the establishment stage (b_e) times the branching factor at the tractability refinement stage (b_t), the product can go down if the increase in b_t leads to a significant decrease in b_e. Our experiments confirm this as the main reason for the utility of tractability refinements. But, of course, this is a rare event; in general, eager resolution leads to performance degradation.

Similarly, the extreme lazy resolution of conflicts, while reducing b_t (making it 1 in the extreme), will increase b_e, since it (a) does not recognize inconsistent plans as easily (unless it uses an NP-hard consistency check) and (b) may allow spurious establishment possibilities (see the example I gave during the CSP class). By far, (b) turns out to be the bigger disadvantage, since (a) can be avoided by using an NP-hard consistency check, but that will still not avoid (b). Since both extremes have disadvantages (predictably so), we would like to consider other types of conflict resolution strategies.
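To see the b_e x b_t point numerically, here is a toy calculation; the numbers are invented for illustration and are not from our experiments.

    def nodes(b_e, b_t, depth):
        # Crude model: the search space grows roughly as (b_e * b_t)^depth.
        return (b_e * b_t) ** depth

    print(nodes(b_e=4.0, b_t=1.0, depth=6))   # lazy extreme: ignore conflicts   -> 4096
    print(nodes(b_e=2.0, b_t=1.5, depth=6))   # eager, and it does cut b_e       -> 729
    print(nodes(b_e=3.8, b_t=1.5, depth=6))   # eager, but b_e barely drops      -> roughly 34000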
There are two broad middle grounds. One is to delay conflicts that will be provably resolvable at some later point when they are picked up. This means that the delayed conflicts will not cause implicit inconsistencies in the plan (although they can still allow spurious establishment possibilities). This idea is used in D-Sep (delay separable threats), and in another strategy due to Smith and Peot which analyzes the operator graph to recognize conflicts that can provably be delayed indefinitely.

Although the above ideas _delay_ conflicts without causing inconsistency, they still allow spurious establishments to take place. In particular, suppose there is a conflict that has only one possible resolution -- say promotion. Since all the conflicts in the plan must eventually get resolved, this means that if the plan is to be a solution, it _must_ have that promotion ordering. Noticing this, and resolving the conflict right away, will then allow us to cut down certain spurious establishment possibilities that would otherwise be taken. Moreover, we are doing this resolution in a no-lose situation, since the search space size will not increase given that the conflict has only one possible resolution. This idea -- which essentially says that we should not be postponing a FORCED conflict (i.e., a conflict with only one resolution) -- is used in the D-UNF strategy of Peot and Smith and the Z-LIFO strategy of Schubert and Gerevini.

Note that in checking whether a conflict is forced, we should in general consider the current state of the plan. In particular, suppose a conflict c is introduced in a plan P and has two resolutions. Later, P is refined into P', and due to the additional orderings among steps, c may now have only one resolution. Thus, to figure out whether a conflict is forced, we need to dynamically compute its resolution possibilities. This is like computing the live domain of a variable in CSP. Since the computation is usually done considering each conflict in isolation, the more precise analogy is to enforcing 1-consistency: seeing conflicts as variables and their resolution possibilities as their domains, D-UNF, Z-LIFO, etc. remove values from the domain that are not even feasible with respect to the 1-ary constraints on that conflict (imposed by the current partial plan).

Until this point we have only been considering conflict flaws, and have not talked about open condition flaws. Unlike conflict flaws, open condition flaws cannot in general be postponed indefinitely without loss of completeness. If we want to transfer our current knowledge about conflict resolution to open condition flaws as well, we might say that we should simply pick the flaw that has the smallest number of remaining live resolution possibilities, and resolve it. This is the LCFR -- or Least Cost Flaw Refinement -- idea. Notice that computing the number of resolution possibilities for each flaw can be very costly, since in the worst case we have to actually simulate the resolution of that flaw on the plan. In particular, open conditions typically have many resolution possibilities, and enumerating them all can be expensive. The operator graph structures described by Smith (in the Rockwell TR) provide a way of statically computing estimates of the number of resolution possibilities for each open condition.
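To make the "live resolution possibilities" computation concrete, here is a small sketch of D-UNF-flavored threat selection. The plan representation (a transitively closed set of ordering pairs) and all the names are mine, and separation and confrontation are left out to keep the sketch short.

    def consistent_with(orderings, a, b):
        # Can we add the ordering a < b?  'orderings' is assumed to be the
        # transitively closed set of (before, after) pairs already in the plan.
        return a != b and (b, a) not in orderings

    def live_resolutions(threat, orderings):
        # threat = (clobberer, producer, consumer) for a causal link producer -> consumer.
        clobberer, producer, consumer = threat
        live = []
        if consistent_with(orderings, clobberer, producer):
            live.append(("demote", clobberer, producer))    # clobberer before producer
        if consistent_with(orderings, consumer, clobberer):
            live.append(("promote", consumer, clobberer))   # clobberer after consumer
        return live

    def select_threat(threats, orderings):
        # Prefer forced threats (exactly one live resolution); an empty set of
        # live resolutions means the partial plan is a dead end and can be pruned.
        if not threats:
            return None
        counted = [(len(live_resolutions(t, orderings)), t) for t in threats]
        if any(count == 0 for count, _ in counted):
            return None
        return min(counted, key=lambda ct: ct[0])[1]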
We have remarked above that D-UNF, Z-LIFO, and consequently LCFR, can be considered as enforcing 1-consistency on the domains of the conflicts. From the CSP analogy, we know that 1-consistency is typically not strong enough to tighten the problem sufficiently. In particular, even though the live domains of two conflicts are each of size 2 after 1-consistency enforcement, there may actually be no way of resolving both conflicts at the same time. Thus, the plan may be dead without our realizing it, and we continue exploring spurious establishment possibilities. Enforcing 2-consistency would allow us to take care of this. This makes us realize, slowly, what should have been obvious from the beginning -- that D-UNF, Z-LIFO, LCFR, etc. are essentially steps towards doing constraint propagation along with refinement.

Now that we see the utility of constraint propagation, it behooves us to put it in the foreground (rather than do constraint propagation in the background). In other words, it will be useful for us to model the flaws as variables and the resolution possibilities as their values, and to be able to post arbitrary constraints on those variables. If we are really interested only in conflict flaws, there is a very easy way of doing this -- convert each conflict into a disjunctive ordering constraint, and do constraint propagation on the disjunctive ordering constraints. This is the material I covered during the CSP lecture. Of course, the disjunctive-orderings model is not sufficient once we consider open condition flaws, or even conflict flaws that have confrontation and separation as resolution possibilities. In that case, an explicit modeling of the conflicts and their resolutions, as described in Joslin and Pollack's paper, is required.

Dynamic Constraint Satisfaction Problems: When goal flaws are modeled as CSP variables, or when conflict flaws allow confrontation, the resulting problem is no longer a standard CSP. In particular, choosing a specific establishment possibility for a goal may in turn introduce new goals (open conditions) into the plan. There is a special class of CSPs, called Dynamic CSPs, which can model this to some extent. In a Dynamic CSP, we have some N variables with their domains. Each variable has an active/inactive status. A subset of these N variables, say n of them, is active in the beginning, and the rest are inactive. The CSP is solved when all the currently active variables have values and the assignment does not violate any constraints among the active variables. The twist is that assigning a specific value to a currently active variable might, as a side effect, make a bunch of currently inactive variables active. This feature can essentially model the goal/subgoal variables in planning. The one difference is that, strictly speaking, in planning we cannot really put a limit on the total number of goal variables -- active and inactive combined -- because infinite subgoaling is possible.

Anyway, it is worth noting that dynamic constraint satisfaction is nowhere near as well studied as ordinary (static) CSP. The typical methods are iterative: try to solve the static CSP involving the currently active variables, and whenever new variables must become active to solve the problem, introduce a minimal number of new variables and solve the new version again as a static CSP. As far as I can understand (from Joslin's paper as well as his talk), he uses similar methods to solve the dynamic constraint satisfaction problem that corresponds to a planning problem.
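Here is a minimal backtracking skeleton for such a dynamic CSP -- a toy formulation of my own, simpler than the iterative scheme just described. The constraint predicates are assumed to tolerate partial assignments, and 'activations' maps a (variable, value) assignment to the variables it activates.

    def solve_dcsp(active, assignment, domains, constraints, activations):
        pending = [v for v in active if v not in assignment]
        if not pending:
            return assignment                 # every active variable has a value
        var = pending[0]                      # a real solver would pick by live-domain size
        for value in domains[var]:
            trial = dict(assignment, **{var: value})
            if not all(check(trial) for check in constraints):
                continue
            # The DCSP twist: this assignment may activate new variables.
            new_active = active | set(activations.get((var, value), ()))
            result = solve_dcsp(new_active, trial, domains, constraints, activations)
            if result is not None:
                return result
        return None                           # backtrack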
While the full dynamic constraint satisfaction machinery is necessary for modeling goal flaws, it is not clear to me whether goal flaws really provide much constraint propagation. At the very least, the results in Joslin's current paper don't convince me of this.

Looking from the CSP point of view, we can think of dynamic CSP as introducing subgoaling behavior into CSP. Charles Petrie has papers in AAAI-92 and AAAI-91 where he proposes a different formulation, called "CDPs" or constrained decision problems, to model situations that have aspects of both CSP and subgoaling. Worth taking a look at. Another connection is the whole area of constraint logic programming. Logic programming (e.g., Prolog) essentially involves repeated subgoaling. More recently, people have been looking at variants of logic programming that give constraints first-class status within logic programming. The progress in that area may also help us arrive at a formulation of planning that allows both refinement and constraint propagation.

Summary: In summary, constraint propagation, which makes the non-backtrackable variable selection choices in CSP more informed, can also be quite useful in controlling non-backtrackable choices in planning. All the tradeoffs between propagation and refinement that hold in CSP also hold in planning.

----------