NOTES ON IMPROVING PLANNING EFFICIENCY Prepared by: Robert H. Guttman Revised by: Rao Kambhampati [Oct 30, 1995] Note: Read Rao's notes on Controlling non-backtrackable decision before reading these. PAPERS COVERED David Joslin and Martha E. Pollack. "Passive and Active Decision Postponement in Plan Generation." In European Workshop on Planning (EWSP), Assisi, Italy, 1995. M.A. Peot and D.E. Smith. "Threat-Removal Strategies for Nonlinear Planning." In Proc. Eleventh AAAI, 1993. OVERVIEW As we learned in earlier classes, there are two critical decision points during partial order planning (POP) processing: (1) goal ordering and (2) threat resolution strategy. Unfortunately, the POP planners we have studied up to this point have been very sensitive to the choice at both of these decision points. Specifically, the choice determines how many nodes are visited during the POP process, many of which can be dead-end nodes. Peot and Smith's paper address one of the two decision points presented above, (2) threat resolution strategies (TRSs). Joslin also discusses an approach to (2) called LCFR that he initially introduced in an earlier paper. The following threat resolution strategies were discussed in my presentation: DSEP This TRS postpones conflicts that are not "real". A conflict is real if it's variable bindings indicate that there is a conflict requiring demotion, promotion, or another TRS. As you may recall from earlier classes, DSEP's lazy strategy was deemed better, in general, than a less lazy approach (i.e., separation). DUNF/ This TRS postpones conflicts that are not "forced". A ZLIFO forced conflict is a conflict that can only be resolved in one way (because all of its other TRSs have already been eliminated). In other words, it's branching factor (BF) is one. Choosing a conflict that really has no choice makes intuitive sense. If these forced conflicts existed in our plan-space and we didn't choose them immediately, then we would need to repeatedly choose them in each successive search branch - a rather inefficient strategy. LCFR Joslin's LCFR strategy is a slight generalization of the DUNF/ZLIFO strategy discussed above. Instead of postponing non-forced conflicts (this with a branching factor greater than one), LCFR postpones conflicts that don't have the least BF at that node. For example, if we had three conflicts with BFs of 2, 3, and 4, then LCFR would postpone the conflicts with BFs of 3 and 4 and choose to work on the conflict with a BF of 2. Full Eager Resolve each conflict as soon as it occurs Full Lazy postpone all conflict resolution to the end. The above threat resolution strategies (TRSs) fall somewhere in the spectrum of available TRSs in partial order planning (POP). The above TRSs can be considered as wrestling with a tension between the branching factor and the per-node cost, both of which we would generally like to reduce. However, when we reduce the branching factor, we also (ideally) want the plan represented at each node to be consistent. To *guarantee* this, we would need to commit to check its consistency, an expensive per-node cost (see the "Extreme" TRS above). Seeing that it is too costly to guarantee consistency, we often settle for an approximation by postponing splitting the plan into multiple branches and let successor nodes eventually determine if the plan is consistent. The DUNF/ZLIFO and LCFR TRSs above attempt to improve upon this by more intelligently choosing which conflicts to postpone and which to work on. This tends to push the BF lower in the search tree, which we have learned is a "good thing". Why? Because, in general, the plan is more constrained in the lower levels of the search tree and, therefore, dead-end nodes can be detected more readily. This tension between BF and per-node cost is one of the characteristics that differentiate the TRSs above. Notice that in DSEP, the per-node cost is quite low and the BF relatively high. In LCFR, we can see that the BF is reduced (by only postponing conflicts that don't have the least BF at that node), but that the per-node cost is higher than DSEP. This cost is attributed to determining the BF of each node's conflicts. (For example, it takes time to determine if a given conflict can be resolved via demotion, promotion, or separation which itself can be represented by several branches, one for each possible bindings of its variables.) Joslin's paper shows that LCFR is more efficient than standard UCPOP. However, I was told that LCFR exhibits roughly the same efficiency as DUNF/ZLIFO. Even though LCFR tends to reduce the BF more so than DUNF/ZLIFO, it is made less efficient overall due to its higher per-node cost of determining the BF of conflicts. PASSIVE POSTPONEMENT Joslin's paper focuses on one of the efficiency problems of partial-order causal link (POCL) planners, namely "passive postponement". Passive postponement is an attribute of a search strategy that chooses which conflict to work on *without* considering how other constraints may influence the consistency of the plan. Specifically, passive postponement can result in the propagation of "dead-end" nodes during threat resolution. Joslin has shown in his paper that an exorbitant amount of time can be wasted by continually traversing such dead-end branches. This wasted time can be so large that a solution to the problem can not be found in a given amount of time, thus resulting in failure. Figure 2 of Joslin's paper shows that up to 90% of the nodes examined in a given problem were dead-end nodes on successful problems and up to 98% of the nodes examined were dead-end nodes on unsuccessful problems. ACTIVE POSTPONEMENT Active postponement is an attribute of a search strategy that chooses which conflict to work on *while* considering how other constraints may influence the consistency of the plan. In POP, we know that as we traverse down each branch in the plan-space, we have a very localized view of the problem at each node. We also know from our recent constraint satisfaction problem (CSP) studies that techniques exist that provide a more global view of the problem. Joslin's paper presents an approach to planning that allows for this more global view by incorporating CSP techniques into the planning process. It is this global view of the problem that that allows for active postponement. Before we explore Joslin's approach, we can also present active postponement using the POP model (as was discussed by Rao during the constraint propagation class). For example, instead of branching upon a conflict (such as creating one branch for demotion and another for promotion), we could traverse down only one branch with a disjunctive ordering on the conflict resolution possibilities (such as demote *or* promote). Then, we can use these disjunctive orderings to propagate constraints at each node much like in the CSP problems we have studied. This approach would allow for each node of the branch to have a more global view of the constraints on the problem and can therefore make much more intelligent decisions on how to split the plan-space (such as being able to prune dead-end nodes much more readily). Joslin takes a different tact to provide active postponement. Instead, Joslin provides a framework where the planning can be viewed as either a planning problem (using traditional POP techniques) or as a CSP (using known CSP techniques). These views are related by the Temporal World Model (TWM)'s formalization being mapped to CSP variables, domains, and constraints and then sharing these variables with the planning component. Joslin's system, called Descartes, allows for the use of only planning techniques, only CSP techniques, or any strategy in between. Whenever the CSP problem is solved, its variable bindings result in a plan that is a solution and whenever plan is solved, its variable bindings result in a CSP that is a solution. As illustrated in his paper, the benefit of Joslin's approach is the pruning of dead-end nodes via active postponement. However, Joslin's approach also benefits the planning process in other ways. In addition to pruning dead-end nodes, the general purpose CSP engine also tightens the plan's current constraints (such as further limiting the domains of variables). This has the effect of reducing future branching by variables being allowed to take on fewer bindings. In addition, Joslin's approach allows the CSP engine to work on "flaws" where a flaw is defined as either a goal achievement or a threat resolution. This allows Descartes to consider goal orderings within the CSP engine besides just threat resolution conflicts alone. _It was not discussed_ in his paper how much of Descartes' efficiency improvements resulted from the managing of goal achievement (as opposed to threat resolution conflicts). It is worth discussing Joslin's implementation further. For example, we know from our POP techniques that goals can be achieved by either simple establishment of operators that already exist somewhere in the plan or by step addition where we introduce a new operator into the plan. We also know from the discussion above, that Descartes maps planning elements (Temporal Qualified Assertions (TQAs) in this case) onto CSP variables, domains, and constraints. However, in the CSP techniques we have learned up until this point, we always deal with a static number of variables. A question then is, how does Descartes deal with new operators introduced by step addition? The answer is that Descartes CSP component works as a dynamic CSP engine. This allows for the introduction of new variables into the problem which in turn will change the constraints of the problem. Dynamic CSP problems can be thought of as having a huge number of variables, one for each possible planning element as needed. Initially, only a small subset of the variables will be considered "active" by the CSP engine and only active variables will be considered. As CSP processing continues, some inactive variables will become active and need to be considered by the CSP engine. However, as we know from our POP studies, the number of steps introduced into a plan, and thus the number of subgoals (preconditions of the steps) can grow without bound. Seeing that we would need to represent these somehow as active and inactive variables in our CSP problem, we find ourselves in trouble. Obviously, this view is inadequate for representing plan elements as CSP variables. In terms of how Descartes' CSP engine works, I will provide a brief explanation. As stated earlier, the CSP engine treats threat resolution and goal achievement as "flaws" that get worked on. However, the CSP engine deals with these flaws slightly differently. For threat resolution flaws, a variable Dt has the domain {a,b} for example as follows: (Dt 3D a) -> after( goto(tile1), fill(hole1) ) (Dt 3D b) -> before( goto(tile1), goto(hole1) ) If it is learned that after( goto(tile1), fill(hole1) ) cannot be satisfied , then the CSP engine deduces Dt !3D a and Dt 3D b. If the threat cannot b e satisfied, then the CSP engine deduces that Dt can not be satisfied given its domain and that the current node is a dead-end. For goal achievement flaws, the same approach applies, but new boolean variables G and J for example are introduced representing *tentative* step instantiations (since it is not yet known if either step will actually be in the final plan). If it is learned that a constraint that has J as a conjunct cannot be satisfied, then the CSP engine may deduce J 3D FALSE an d G 3D TRUE. As discussed during my presentation, preliminary and experimental results show that Joslin's approach to improving planning efficiency is superior to most of the other approaches we have looked at. One example illustrates UCPOP not being able to solve a problem after 24 hours of processing whereas Descartes was able to solve it in under three seconds. DUAL-REPRESNTATION PLANNING By representing a planning problem from both planning and CSP points of view, techniques from both fields can be applied. This can only be beneficial as is evidenced by Descartes' preliminary results. I explained earlier that Descartes achieves this dual-representation by mapping planning elements onto CSP variables, domains, and constraints. I will briefly explain this further now. Descartes uses Allen and Koomen's Temporal World Model (TWM) approach to formalize planning problems where each proposition has an associated temporal interval where it is considered to be TRUE. These temporal intervals are known as Temporal Qualified Assertions (TQAs). For example, let's say we have two operators P(x)[ t1, t2 ] and ACP(y)[ t3, t4 ] in our planning problem meaning that P(x) is TRUE between temporal interval t1 and t2 and ACP(y) is TRUE between temporal interval t3 and t4. In mapping the se to CSP variables C1 and C2, we have: C1 -> P(x)[ t1, t2 ] C2 -> ACP(y)[ t3, t4 ] To resolve the threat introduced when x and y are bound to the same value, we need to add the constraint that C1 cannot unify with C2. This is saying that P(x) cannot be TRUE over the same temporal interval when ACP(y) is TRUE. This is similar to the demotion, promotion, and separation that we've seen in POP. By representing plans and CSPs in this way, Descartes is able to share variables between the two views. As stated earlier, this means that whenever the CSP problem is solved, its variable bindings result in a plan that is a solution and whenever plan is solved, its variable bindings result in a CSP that is a solution. Some points: -- Does one need TWM model to introduce constraint propagation into planning (clearly not-- see disjunctive ordering constrains discussion) -- Has there been any empirical evidence that modeling open conditions as CSP variables helps in search? -- The details of the control strategy used by Descartes are very hard to glean from this short paper. In particular, does Joslin really solve a dynamic csp, or does he solve static csp's entailed by the dynamic csp? -- It would be interesting to understand the relation between Petrie's Constrained decision problems, which start from constraint satisfaction and add subgoaling behavior, and Descartes which starts from planning and converts it into a CSP. ---