RECONSTRUCTING THE GRAPHPLAN ALGORITHM FROM FORWARD PROJECTION
Subbarao Kambhampati rao@asu.edu
(Draft working notes ASU CSE xxx) [Oct 15, 1995]

Graphplan is the first realistic contender that has emerged for the refinement planning paradigm. An implementation of Graphplan in C seems to be the most efficient classical planner around now -- it solves in seconds problems that are almost impossible for the traditional refinement planners without search control. The Graphplan algorithm looks very different from the planners we have studied until now. Part of this is because the authors, Blum and Furst, are either not very conversant with the planning literature (which turned out to be good, since they were not biased by it), or didn't have sufficient space in the IJCAI paper to elaborate on the connections with previous work. However, since we did spend a fair bit of time understanding normal refinement planners, and are probably already biased by them, it does make sense for us to try to derive the Graphplan ideas from the existing ones.

In these notes, I (Rao) will show that Graphplan can best be understood as doing a forward projection in which the search space size is reduced by keeping the projected states in a compact "unioned" format. The ensuing computational tradeoff seems to be a good one, as is evidenced by the results of Graphplan. The realization of the connections between forward projection and Graphplan also gives us a better perspective on the types of extensions that can be made to Graphplan.

Motivating Graphplan from Forward Projection:

At the heart of Graphplan is a forward projection stage where actions are applied to a proposition list to generate new proposition lists.
These proposition lists are not world states (in that not every proposition in the prop list may hold in the world at any one time), but are best seen as the union of the propositions from all possible states that come up after a fixed number of projections from the initial state. Since the prop list doesn't correspond to real states, even if the goals of the problem are present in the prop list at some stage i, it doesn't necessarily mean that there is actually a plan of i steps -- we need to search backwards to see if one does exist. This _backward_ projection step can be made cheaper by keeping information about which subsets of the proposition list can actually co-exist. This is done by keeping "mutual exclusion" relations and propagating them.

Let us now elaborate this idea by re-visiting forward projection. Consider the following example where the initial state contains the propositions R and W. There are six actions, O1, O2, ..., O6, with preconditions and effects as follows:

(R)     O1  (P, !R)      [preconditions on the left, effects on the right]
(W)     O2  (Q, !W)
(W)     O3  (M, !R, !W)
(P,Q)   O4  (Z)
(Q,W)   O5  (J)
(Z,W,R) O6  (K)

Now consider the process of forward projecting these actions from the initial state. Recall that projection involves applying every applicable action to the initial state to compute new states, and applying operators to the resultant states iteratively. Planning ends when one of these states contains all the goals. (This is the usual FSS.)

We can make this projection process more efficient by considering together the sets of actions that are independent of each other (i.e., they don't delete each other's preconditions or added effects). In particular, although both O1 and O2 are applicable in the initial state (R,W), rather than consider them in separate projection branches we can consider them _together_. In particular, we can apply the action set (O1,O2) together to the state (R,W) to get the state (P,Q).
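For concreteness, the example domain and the basic projection step (one applicable action per branch, as in plain FSS) can be sketched as follows. This is my sketch, not the authors' code; the set-triple encoding of actions is an assumption of the sketch:

```python
# Each action is (preconditions, adds, deletes) -- an encoding assumed
# for this sketch, not Graphplan's own representation.
ACTIONS = {
    "O1": ({"R"}, {"P"}, {"R"}),
    "O2": ({"W"}, {"Q"}, {"W"}),
    "O3": ({"W"}, {"M"}, {"R", "W"}),
    "O4": ({"P", "Q"}, {"Z"}, set()),
    "O5": ({"Q", "W"}, {"J"}, set()),
    "O6": ({"Z", "W", "R"}, {"K"}, set()),
}

def apply_action(state, action):
    """Progress a state through one action (classical STRIPS semantics)."""
    pre, add, dele = action
    assert pre <= state, "action not applicable"
    return (state - dele) | add

def project(state):
    """One level of plain forward projection: one branch per applicable action."""
    return {name: apply_action(state, act)
            for name, act in ACTIONS.items() if act[0] <= state}
```

Projecting from the initial state (R,W) this way gives three separate branches: O1 yields (P,W), O2 yields (R,Q), and O3 yields (M).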
The semantics here is that irrespective of the actual order in which O1 and O2 are applied to the state (R,W), the final state is guaranteed to be (P,Q).

Considering independent actions together rather than separately has some computational advantages from the forward projection point of view. First off, it reduces the branching factor of the projection space -- instead of two branches, one for O1 and one for O2, we have one branch. Another advantage, which may not be obvious at first glance, is that by applying sets of independent actions together we can effectively reduce the _depth_ of the projection tree. To see the last point, suppose our goal was Z. Only O4 is capable of giving Z, and it requires both P and Q to be true. If we apply O1 and O2 together to the state (R,W), then we can directly apply O4 in the resultant state, thus effectively stopping projection in two levels. If we applied O1 and O2 separately to (R,W), then we would have had to apply O2 and O1 respectively to the resultant states before we would get to a state where O4 is applicable (thus increasing the depth of projection). To summarize, projecting independent actions together is just a darned good idea. This idea has actually been around in the planning literature for a while. Drummond, in his KR-89 paper on learning situated control rules, describes his projection process in these terms.

Now, consider the process of doing projection as described above, but with one additional augmentation -- apart from the six actions described previously, we will add a bunch of no-ops, one for each state, which take the contents of the state as their preconditions, and give the contents of the state back as their effects. Thus, there is one no-op applicable in any state.
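The independence test and the joint application of an independent action set can be sketched as follows (same hypothetical set-triple encoding as before: an action is (preconditions, adds, deletes)):

```python
O1 = ({"R"}, {"P"}, {"R"})
O2 = ({"W"}, {"Q"}, {"W"})
O3 = ({"W"}, {"M"}, {"R", "W"})

def independent(a, b):
    """Two actions are independent if neither deletes the other's
    preconditions or added effects (the condition in the text)."""
    (pre_a, add_a, del_a), (pre_b, add_b, del_b) = a, b
    return not (del_a & (pre_b | add_b)) and not (del_b & (pre_a | add_a))

def apply_set(state, actions):
    """Apply a set of pairwise-independent, applicable actions; the
    resulting state is the same for every interleaving order."""
    for pre, _, _ in actions:
        assert pre <= state
    for pre, add, dele in actions:
        state = (state - dele) | add
    return state
```

Here independent(O1, O2) holds, so apply_set({"R","W"}, [O1, O2]) gives {"P","Q"}; O2 and O3, by contrast, interfere (each deletes W, a precondition of the other), so they must stay in separate branches.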
[[The addition of these no-ops, while not contributing to the search efficiency of forward projection, is useful in understanding the derivation of the Graphplan algorithm later.]]

Here is a projection tree resulting from this kind of projection:

                      (R,W)                          -- level 0 state (init)
         /              |               \
      o1,o2             o3               no-op       -- actions applied at level 1
        |               |                 |
      (P,Q)            (M)              (R,W)        -- level 1 states
     /     \            |             /   |    \
    o4     nop         nop        o1,o2   o3    nop  -- level 2 actions
    |       |           |           |     |      |
 (Z,P,Q)  (P,Q)        (M)        (P,Q)  (M)   (R,W) -- level 2 states

 ... continues ...

This projection tree continues until we encounter a state which contains all our goals. As soon as such a state is found, the plan is directly output as the path from the initial state to this state. If our goal is Z, we can terminate with the leftmost state at level 2, and the path will be [o1,o2] -- o4. The semantics of this is that in the first step we can execute o1 and o2 in any arbitrary order, and in the second step we execute o4.

Note that the plans that are output have a special form of parallelism -- at any given time step many independent actions may be executed in parallel. This is, however, very different from the kind of partial order found in _plan space_ plans. In particular, within a given branch of the projection tree, the positioning of the operators with respect to time steps is strictly fixed (recall from UCP that state-space refinements have to do this to derive the state).

Another important aspect of searching for plans using this forward projection tree is that any search process which does a _level-by-level_ enumeration of the projection tree is guaranteed to find the SHORTEST plan in terms of the number of steps (this is quite straightforward to see). Of course, finding the shortest plan in terms of the number of time steps is not the same as finding an optimal plan with respect to some arbitrary objective function (which may, for example, assign higher cost to o1 and lower cost to o2, etc.).
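The level-by-level enumeration can be sketched as a breadth-first search over states, branching on every nonempty set of pairwise-independent applicable actions (again a Python sketch with the assumed set-triple action encoding; the no-ops are omitted here since they only add redundant branches to plain projection):

```python
from itertools import chain, combinations

ACTIONS = {
    "O1": ({"R"}, {"P"}, {"R"}),
    "O2": ({"W"}, {"Q"}, {"W"}),
    "O3": ({"W"}, {"M"}, {"R", "W"}),
    "O4": ({"P", "Q"}, {"Z"}, set()),
    "O5": ({"Q", "W"}, {"J"}, set()),
    "O6": ({"Z", "W", "R"}, {"K"}, set()),
}

def independent(a, b):
    return not (a[2] & (b[0] | b[1])) and not (b[2] & (a[0] | a[1]))

def bfs_project(init, goals):
    """Level-by-level forward projection; returns the first (hence
    shortest, in time steps) plan found, as a list of action sets."""
    frontier = [(frozenset(init), [])]
    seen = {frozenset(init)}
    while frontier:
        nxt = []
        for state, plan in frontier:
            if goals <= state:
                return plan
            app = [n for n in ACTIONS if ACTIONS[n][0] <= state]
            for sub in chain.from_iterable(
                    combinations(app, k) for k in range(1, len(app) + 1)):
                acts = [ACTIONS[n] for n in sub]
                if not all(independent(a, b)
                           for a, b in combinations(acts, 2)):
                    continue  # only pairwise-independent sets go together
                new = state
                for pre, add, dele in acts:
                    new = (new - dele) | add
                if new not in seen:
                    seen.add(new)
                    nxt.append((new, plan + [set(sub)]))
        frontier = nxt
    return None
```

On the example, bfs_project({"R","W"}, {"Z"}) returns the two-step plan [{"O1","O2"}, {"O4"}], matching the leftmost path of the tree above.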
Now, the BIGGEST problem with forward projection as the basis for planning is that the projection tree can grow exponentially quite easily. (From our discussion of forward state space search and MEA, note that some of this growth can be cut down by using the compute-relevant function to isolate only those actions that are recursively relevant to the top-level goals. However, even this will not be enough.) There are way too many states at each level of the projection tree, and each level is exponentially larger than the previous level -- the combinatorics can easily overwhelm any search strategy.

One way of getting a handle on the forward branching would be to somehow consider all the states in a given level _TOGETHER_, rather than as separate nodes in the search tree. In particular, seeing the states as sets of propositions, we can try to UNION the sets together to get the collection of propositions from which all the states in that level are made up. Such a union corresponds exactly to the _proposition list_ maintained by Graphplan's plan-graph structure at that level. Going back to our example, the prop list at level 0 is the union of just the init state, and is thus (R,W). At level 1, it is (P,Q,M,W,R), and at level 2, it is (Z,P,Q,M,W,R). If you run Graphplan and construct the plan graph for this example, these are exactly the proposition lists you will find. (Note: we will see below that in general, however, the plan-graph proposition lists may contain more propositions than the union of the states at the i-th level, because the plan graph may do some illegal projections.)

Given this observation, it is easy to see why a plan graph of i levels contains only i proposition lists, while a projection graph of i levels contains b^i states. The b^i states of the projection graph are unioned together to give rise to the i-th level proposition list.
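Growing the proposition lists directly (rather than unioning enumerated states) can be sketched as below. Note that without any mutex information this over-approximates the union: O5 fires off the illegal pair (Q,W) at level 1 and smuggles J into the level-2 list -- exactly the kind of illegal projection the parenthetical note above warns about. (Python sketch; set-triple action encoding assumed.)

```python
ACTIONS = [
    ({"R"}, {"P"}, {"R"}),           # O1
    ({"W"}, {"Q"}, {"W"}),           # O2
    ({"W"}, {"M"}, {"R", "W"}),      # O3
    ({"P", "Q"}, {"Z"}, set()),      # O4
    ({"Q", "W"}, {"J"}, set()),      # O5
    ({"Z", "W", "R"}, {"K"}, set()), # O6
]

def next_prop_list(props):
    """Next level's proposition list: no-ops carry every proposition
    forward, and every action whose preconditions all appear in the
    current list contributes its added effects.  With no mutex
    information this over-approximates the union of reachable states."""
    out = set(props)
    for pre, add, dele in ACTIONS:
        if pre <= props:
            out |= add
    return out
```

next_prop_list({"R","W"}) gives the level-1 list (P,Q,M,R,W); applying it again yields a level-2 list that contains Z but also the spurious J.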
Once we make this correspondence between the forward projection graph and Graphplan's plan graph, all the other details of the Graphplan algorithm can more or less be reconstructed as good algorithm design (but recall the twin platitudes -- god is in the details, and hindsight is 20-20). Let us now proceed to unravel these details.

Loosely speaking, the unioned proposition list at level i can be seen as a compact "disjunctive" representation of the b^i states at the i-th level of the corresponding projection graph. Doing this unioning is clearly a good idea in terms of curtailing the growth of the projection tree. However, the proposition list, being a union of the individual state propositions, doesn't have the concrete physical semantics of a state. In particular, an arbitrary subset of the proposition list at the i-th level may not be part of any realizable state at the i-th level. For example, (Q,W) is a subset of the prop list at the first level (P,Q,M,R,W). However, there is no state at the first level in the projection graph which contains Q and W together!

The fact that not every subset of the proposition list belongs to a valid state implies, of course, that we need to be careful both in growing the plan graph using proposition lists, and in terminating using the proposition lists. In particular, if (Q,W) is our goal for the problem, just because it is part of the prop list at level 1 doesn't mean that we can exhibit a plan of one step length. Similarly, since (Q,W) is not part of any legal state, we should not really be projecting the action O5 (which has preconditions Q,W) from this proposition list.

Of course, one dumb way of keeping the proposition list, while still realizing that (Q,W) cannot belong to any legal state at the corresponding level in the projection graph, would be to keep information about all legal states.
This is dumb because we will be keeping an exponential amount of additional information on top of the proposition list, and will essentially be doing forward projection anyway. Another idea is to remember the opposite -- i.e., remember all the subsets of a proposition list that DON'T belong to any legal state in the projection graph. This seems better since we actually need to keep only the minimal illegal subsets. In particular, if we know that (Q,W) is an illegal subset, then we know that any superset of (Q,W) is also an illegal subset -- thus we can reduce redundant storage. However, computing all minimal illegal subsets turns out to be just as hard as keeping all the legal states. In particular, if we have m propositions in the prop list, we have to consider all the 1-subsets, 2-subsets, ..., m-subsets that could be illegal. We may thus have to examine and store up to (m choose 1) + (m choose 2) + ... + (m choose m) = 2^m - 1 illegal subsets.

One way out of this dilemma is to not demand all illegal subsets, but just get by with a subset of the illegal subsets. In particular, it is useful to consider all the 2-sized illegal subsets alone. This is essentially a pair-wise incompatibility or mutex relation between propositions. There are several advantages in computing these:

1. There are at most m(m-1)/2 pairwise mutex relations in an m-sized proposition list (as against 2^m subsets in total).

2. The pairwise mutex relation can be computed in an incremental fashion -- starting with the actions at the initial level, we can consider any pair of actions that are not independent to be mutex. The mutex relation can then be propagated to the effects of those actions in the next proposition list. In particular, in the example we have been following, O2 and O3 are non-independent. Thus, M and Q are considered mutually exclusive in the first-level prop list.
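One level of this incremental mutex computation can be sketched as follows. The sketch follows Graphplan's own convention of one no-op per proposition (rather than one per state); the action names and set-triple encoding are assumptions of the sketch:

```python
def mutex_step(props, prop_mutex, actions):
    """One level of mutex propagation.  `actions` maps name -> (pre, add,
    delete) and should include one no-op ({p}, {p}, {}) per proposition
    in `props`.  Returns the next prop list and its pairwise mutexes."""
    # an action is applicable if its preconditions are all in the list
    # and no two of them are mutex with each other
    app = {n: a for n, a in actions.items()
           if a[0] <= props and not any(
               frozenset((p, q)) in prop_mutex
               for p in a[0] for q in a[0] if p != q)}
    # two actions are mutex if they interfere (one deletes the other's
    # preconditions or adds) or some pair of their preconditions is mutex
    act_mutex = set()
    names = sorted(app)
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            a, b = app[n1], app[n2]
            interfere = (a[2] & (b[0] | b[1])) or (b[2] & (a[0] | a[1]))
            clash = any(frozenset((p, q)) in prop_mutex
                        for p in a[0] for q in b[0])
            if interfere or clash:
                act_mutex.add(frozenset((n1, n2)))
    # two propositions are mutex if every pair of their producers is mutex
    producers = {}
    for n, (pre, add, dele) in app.items():
        for p in add:
            producers.setdefault(p, set()).add(n)
    new_props = set(producers)
    new_mutex = {frozenset((p, q))
                 for p in new_props for q in new_props
                 if p < q and all(x != y and frozenset((x, y)) in act_mutex
                                  for x in producers[p] for y in producers[q])}
    return new_props, new_mutex

ACTS = {
    "O1": ({"R"}, {"P"}, {"R"}),
    "O2": ({"W"}, {"Q"}, {"W"}),
    "O3": ({"W"}, {"M"}, {"R", "W"}),
    "O4": ({"P", "Q"}, {"Z"}, set()),
    "O5": ({"Q", "W"}, {"J"}, set()),
    "O6": ({"Z", "W", "R"}, {"K"}, set()),
    "noop-R": ({"R"}, {"R"}, set()),
    "noop-W": ({"W"}, {"W"}, set()),
}
```

One step from ({R,W}, no mutexes) yields the level-1 list (P,Q,M,R,W), with (M,Q) and (Q,W) among the mutex pairs -- the two pairs derived in the text.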
This makes sense since O2 and O3 are in different branches, they lead separately to Q and M, and there is no single state that contains M and Q in the forward projection graph. Similarly, (Q,W) are mutex, since the no-op at the first level is not independent of O1 and O2 (which delete R and W), and this mutex propagates to the effects Q and W.

The pairwise mutex relations are able to answer the plan-existence question, as well as curtail useless projections in the plan graph, to some extent. In our example, since (M,Q) are known to be mutex, any problem with goals that include (M,Q) cannot have a solution at the first level of the plan graph (despite the fact that M and Q are part of the first-level prop list). Similarly, since (Q,W) are found to be mutex, there is no point in projecting O5 from the first-level prop list, and the Graphplan algorithm realizes this.

The tradeoffs in using pair-wise mutex
--------------------------------------

Although pair-wise mutex relations (or 2-sized illegal subsets) are easy to compute incrementally, and take up less space, assuming conservation of complexity we must be trading off something by using them instead of the full set of illegal subsets. The price we pay for this compactification is twofold:

1. We may still do some illegal projections. Even though we recognize all pairwise mutex propositions, we may not recognize arbitrary subsets of propositions (of size greater than 2) that are illegal. In particular, it is possible that P and Q are not mutex and Q and W are not mutex (in that there are legal states that contain each pair), but P,Q,W are together illegal (in that none of the states that contain P,Q or Q,W contain all three together). Since Graphplan won't realize this from the mutex relations, it will admit these illegal projections into the plan graph.
In our example, this happens when the Graphplan algorithm includes O6 among the level-3 actions, and the proposition K in the level-3 props, although it is easy to see that the forward projection graph will not be able to apply O6 anywhere from the level-2 states.

2. For roughly the same reasons as above, we may not be able to answer the plan-existence question when we have more than two subgoals.

From the point of view of planning, 2 is the central problem. Although 1 clutters the plan graph with illegal projections, as far as planning is concerned, it just makes the problem of 2 worse. We can handle 2 by doing a backward chaining on the proposition lists at different levels, through the actions that lead to those propositions. In particular, given a set of goals (g1...gn) that are present in the i-th level prop list, with none of the goals pairwise mutex in that prop list, the only way we can be sure that there is actually a plan of i steps that gives us g1...gn is to exhibit a path through the corresponding projection graph that terminates in a state containing g1...gn. (Note that such a backward-chaining phase is not necessary either in the forward projection graph, or in a plan graph that keeps _all_ illegal subsets of each prop list. The backward chaining is thus a fair price we should be willing to pay for curtailing the space blow-up of forward projection and the time blow-up of the full illegal-subset enumeration.)

We can do this by keeping track of the connections between the actions at stage i, their effects in the prop list at stage i, and their preconditions in the prop list at stage i-1. For this purpose, Graphplan keeps the actions and propositions at each level as individual vertices of a graph, and puts edges between an action and its effects and its preconditions. This graph structure can be used to efficiently traverse the plan graph backwards to answer the plan-existence question.
We can do this efficiently by using a recursive dynamic programming algorithm which exploits the mutex relations stored at each level. In particular, given the goals g1...gn to be satisfied at level i, we pick one action for each of g1 to gn from the actions at level i, such that no two of the chosen actions are pairwise mutex. Once this is done, the preconditions of these actions become the recursive subgoals at the previous proposition level. If everything goes well, this recursion peters out at the initial prop list (corresponding to the initial state). By the time the recursion ends, we will have identified a subgraph of the plan graph that corresponds to a valid path in the corresponding forward projection graph, and is thus a valid plan for our problem. Any time we are unable to find actions to cover all the subgoals of the current level without violating mutex relations, we know that the subgoals at the current level cannot be made true. We then backtrack to the previous level, and check to see if the subgoals at that level can be achieved through a different set of actions. If they can't, the subgoals at that level fail too, and the backtracking continues.

The failures of the recursion can be cached to improve performance further. In particular, when we find that the recursion fails on some set of subgoals g1...gm at level i, and we are about to backtrack to level i+1, we know that g1...gm is an illegal subset of the prop list at level i. Normally, we store only the 2-sized illegal subsets to reduce computation. But since this particular subset thrust itself upon us as a side effect of back-chaining, we may as well store it (on the off chance that some set of goals g1...gn that contains g1...gm needs to be satisfied at this level in future recursions, in which case we can shortcut the computation by recognizing that the set is unsatisfiable). Graphplan thus memoizes these failing goal subsets.
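The recursive backward search with memoized failures can be sketched as follows. The `producers`/`act_mutex` tables standing in for the plan-graph vertices and edges are assumptions of the sketch, here hand-built for the running example:

```python
def extract(goals, level, producers, act_mutex, init, memo):
    """Backward-chain from `goals` at `level`.  producers[i][p] lists the
    (action, preconditions) pairs that give p at level i; act_mutex[i] is
    the set of mutex action pairs at level i.  Returns a list of action
    sets (one per level) or None; failing goal sets are memoized."""
    if level == 0:
        return [] if goals <= init else None
    key = (level, frozenset(goals))
    if key in memo:                      # known-failing subgoal set
        return None
    glist = sorted(goals)

    def cover(i, chosen):                # chosen: action name -> preconds
        if i == len(glist):
            subgoals = set().union(*chosen.values()) if chosen else set()
            sub = extract(subgoals, level - 1, producers, act_mutex,
                          init, memo)
            return None if sub is None else sub + [set(chosen)]
        for name, pre in producers[level].get(glist[i], ()):
            if any(frozenset((name, c)) in act_mutex[level]
                   for c in chosen):
                continue                 # clashes with an earlier choice
            fresh = name not in chosen
            chosen[name] = pre
            plan = cover(i + 1, chosen)
            if plan is not None:
                return plan
            if fresh:
                del chosen[name]
        return None

    plan = cover(0, {})
    if plan is None:
        memo.add(key)                    # Graphplan's memoization of failures
    return plan

# Hand-built tables for levels 1 and 2 of the running example; in
# Graphplan itself these come from the plan-graph edges.
PRODUCERS = {
    1: {"P": [("O1", {"R"})], "Q": [("O2", {"W"})], "M": [("O3", {"W"})],
        "R": [("noop-R", {"R"})], "W": [("noop-W", {"W"})]},
    2: {"Z": [("O4", {"P", "Q"})]},
}
ACT_MUTEX = {
    1: {frozenset(("O1", "O3")), frozenset(("O2", "O3")),
        frozenset(("O1", "noop-R")), frozenset(("O2", "noop-W"))},
    2: set(),
}
```

Extracting {Z} at level 2 recovers the plan [{O1,O2}, {O4}] from earlier; extracting the mutex pair {M,Q} at level 1 fails, and the failing subgoal set lands in the memo table.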
Note of course that (a) the memoized illegal subsets won't help in cutting down the illegal projections of the plan graph (since the projection has already been done by the time the backward chaining starts), and (b) in the worst case we may still be storing O(2^m) lists -- the saving grace being that we store them only as they become available, not because we were eagerly enumerating all illegal subsets.

The reconstruction of Graphplan in terms of forward projection has, I hope, reduced some of the mystery as to the antecedents of the algorithm. At the very least, it should clarify to you that the fact that Graphplan plans can contain multiple actions within each step, and the fact that Graphplan terminates with the smallest plan (in terms of time steps), are both consequences of the properties of the corresponding forward projection graph. However, by intelligently distributing complexity between the search space size, the mutex relation computation, and a non-trivial backward chaining phase, Graphplan turns out to be very efficient empirically. Of course, this understanding of the antecedents of Graphplan in terms of complexity distribution makes us expect that there should be "killer examples" for Graphplan too. More interestingly, while pair-wise mutex turns out to be a pretty good choice, there may be other tradeoffs between pair-wise mutex and the full set of all illegal subsets that do better than pairwise-mutex-based Graphplan implementations in some cases.

Another thing we note from the connection between Graphplan and forward projection is that Graphplan should, at some level, inherit some of the limitations of planning based on forward projection. In particular, I conjecture that it may be harder for Graphplan to handle augmented planning problems where the planning starts not just with an init state and goals, but also with sets of IPCs to be maintained and sets of actions that must be part of the final plan.
Similarly, I conjecture that Graphplan will have some difficulty reusing/modifying specific plans. Of course, none of this is meant to be a black mark on Graphplan. Given that it outdoes many existing algorithms developed over a 20-year period, it deserves a lot of critical examination and attempts at extension.

Finally, the termination results of Graphplan can also be seen as a consequence of the termination of forward projection. But I will leave this as a non-trivial exercise for the curious.

Rao Kambhampati [Oct 15, 1995]