Notes on Explanation-based and Inductive learning of search control knowledge
(Billal Muzaffar) Revised and extended by Rao [Nov 16, 1995]

We know that learning in general improves the performance of the search engine. How can learning be interfaced with planning to improve performance?

Control the search: this means controlling which branches to take and when.
New knowledge: improve the domain theory.

Explanation Based Learning (EBL): The main idea behind EBL is to let the planner dynamically learn from its own failures. During search, whenever a failure is encountered, the learner tries to explain the cause of that failure and constructs a rule that the planner can use in the future to avoid similar failing paths. The learning tries to bias the planner toward the successful paths.

There are two main kinds of failure encountered by the planner during search.

1) Analytical failures: Whenever analytical inconsistencies are detected, failure is ineluctable. The inconsistencies can be:
   i)   Binding inconsistencies, i.e. ?x is codesignated with A and ?x is not codesignated with A.
   ii)  Ordering inconsistencies, i.e. a partial plan with steps s1 and s2 and ordering constraints such that s1 < s2 and s2 < s1.
   iii) Establishment failures, i.e. the plan has an open condition that has no simple-establishment or step-addition possibilities.

2) Depth limit failures: The partial plan is assumed to be failing whenever the search crosses a predetermined depth limit. The idea is to prevent the planner from searching fruitless paths. In general it is hard to explain depth-limit failures and to come up with sound rules that explain them. Sometimes sound rules can be learned by analyzing the partial plan with respect to strong domain-axiom based consistency checks.

Failure explanations: A failure is explained by identifying a minimal subset of the constraints in the partial plan that is mutually inconsistent. This explanation is regressed over the decision (that the planner made to generate this failing partial plan) and propagated up, in order to construct an explanation for the ancestors of this failing plan that is predictive of the failure. In other words, it tries to find the 'root of the problem'. This explanation is then generalized and converted into a rule that will influence the future search to avoid similar failures.

Regression: The process of back-propagating an explanation over a decision is called regression.

      P  <--------- E'
      |             ^
      | decision d  |  regression
      |             |
      v             |
    Pfail <-------- E

    Pfail = partial plan that is flagged as failing
    E     = explanation of this failure
    P     = the plan that was refined by taking the decision d to produce Pfail

Given that refining plan P with decision d produces a failing plan Pfail, the idea is to avoid generating Pfail in the first place. In order to do that it is necessary to identify the constraints in P that are predictive of this failure, i.e. identify the minimal set of constraints (E') present in P such that after taking the decision d, a failing plan is generated. "Formally, regression of a constraint c over a decision d is the set of constraints that must be present in the partial plan before the decision d, such that c is present after taking the decision."

Dependency directed backtracking: If an explanation E does not change after it has been regressed over a decision d to a node n, then E is already part of n. Since E is by definition a set of mutually inconsistent constraints, n must itself be an inconsistent plan. Thus, there is no point considering any other unexplored children of n (siblings of the node from which E was propagated).
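
To make regression and dependency-directed backtracking concrete, here is a minimal Python sketch. It deliberately simplifies: an explanation is just a set of constraint literals, a decision records only the constraints it added, and the names (Explanation, Decision, regress, propagate) are illustrative rather than the machinery of any particular planner -- real regression must also account for the preconditions and bindings introduced by the decision.

    from dataclasses import dataclass, field

    # An explanation is a set of plan constraints (orderings, codesignations,
    # causal links, ...) that are mutually inconsistent.
    Explanation = frozenset

    @dataclass
    class Decision:
        """A refinement decision, recorded here only by the constraints it added."""
        name: str
        added_constraints: frozenset = field(default_factory=frozenset)

    def regress(explanation, decision):
        """Constraints that must already hold in the parent plan so that every
        constraint in `explanation` holds after taking `decision`.  In this
        simplified model, constraints added by the decision regress away."""
        return Explanation(c for c in explanation
                           if c not in decision.added_constraints)

    def propagate(explanation, decision):
        """Regress one level up and report whether dependency-directed
        backtracking applies at the parent node."""
        parent_expl = regress(explanation, decision)
        # If regression changed nothing, the inconsistent set was already present
        # in the parent, so the parent's other children need not be explored.
        skip_siblings = (parent_expl == explanation)
        return parent_expl, skip_siblings

    # The failure {?x = A, ?x != A} regressed over the decision that added ?x = A:
    e = Explanation({"?x = A", "?x != A"})
    d = Decision("establish P(?x) using step s3", frozenset({"?x = A"}))
    print(propagate(e, d))   # (frozenset({'?x != A'}), False)

The skip_siblings flag is exactly the dependency-directed backtracking test: an unchanged explanation means the parent already contains the inconsistent constraint set.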
Generalization: There are two ways the EBL analysis provides generalization.

1. It tells us what subset of the plan constraints is really responsible for the failure. This way, we can predict and avoid the same failure in a new situation, even though the plan in the new situation is not exactly the same as the previous plan. It is enough if they agree on the details that are deemed important by the regression/propagation based analysis.

2. A secondary source of generalization is that sometimes we can generalize over the steps and objects occurring in the regressed failure explanation. This is especially true in name-insensitive domains, where the operators don't refer to specific objects by name but qualify them through their properties.
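As an illustration of the second source of generalization, here is a small Python sketch that variablizes a regressed failure explanation. The tuple representation of constraints and the name `variablize` are assumptions made for the example; the point is only that named objects and steps get replaced consistently by fresh variables, which is safe only in the name-insensitive domains mentioned above.

    import itertools

    def variablize(explanation, named_objects):
        """Replace named objects/steps occurring in a regressed failure
        explanation with fresh variables, mapping each name to the same
        variable everywhere it appears."""
        fresh = (f"?v{i}" for i in itertools.count(1))
        mapping = {}
        generalized = []
        for pred, *args in explanation:
            new_args = []
            for a in args:
                if a in named_objects:          # a specific object or step
                    if a not in mapping:        # same name -> same variable
                        mapping[a] = next(fresh)
                    new_args.append(mapping[a])
                else:
                    new_args.append(a)          # already a variable
            generalized.append((pred, *new_args))
        return generalized, mapping

    # A failure explanation mentioning block A and steps s1, s2:
    expl = [("codesignates", "?x", "A"),
            ("not-codesignates", "?x", "A"),
            ("precedes", "s2", "s1")]
    print(variablize(expl, named_objects={"A", "s1", "s2"}))
    # The learned rule now predicts the same failure for any objects/steps
    # that fill the same roles.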

Issues in EBL

------------------------------------------------SLIDE---------------------------
Handling depth limit failures
 -Look for implicit failures
 -Ignore depth limit branches and continue learning
 -Provide incomplete/incorrect explanations of failure
 -Explain why this is not a goal state and use it as the explanation of failure (Bhatnagar)
--------------------------------------------------------------------------------

As we have seen, once the search crosses a depth limit it is hard to come up with a sound explanation for these failures. One way is to look for implicit failures. This depends on how powerful the failure checks are. There are several ways of explaining the failure of P (writing <<P>> for the candidate set of P, i.e. the set of ground operator sequences consistent with the constraints of P):

1. <<P>> is nil (so the plan is inconsistent and thus doesn't lead to a solution). The cost of this check depends on how easy it is to detect inconsistencies in the constraints of the plan. Some inconsistencies, such as ordering inconsistencies, are easier to detect (see the consistency-check sketch after this list). Others, such as the np-condition based inconsistency discussed in the paper, are costlier.

2. <<P>> may not be nil, but there is no solution in it either. I.e., suppose L(G) is the set of all ground operator sequences (minimal and non-minimal) that take us from the initial state to the goal state. Then this check ensures that

       <<P>> intersection L(G) = nil

   In the worst case this check can be as costly as the planning itself.

3. Another class of check says that no _desirable_ solution belongs to <<P>>. Suppose our definition of desirability is that the solution be minimal, and suppose L'(G) is the set of minimal solutions for the problem. Then this check ensures that

       <<P>> intersection L'(G) = nil

   This can be done by showing that the plan is looping (e.g. see Kambhampati's loop pruning techniques in IJCAI-95) or by using some domain-specific theory of solution desirability. (E.g., if our domain is such that we don't quite like solutions that involve two plane changes, then any partial plan that contains two plane-change operations can be pruned, with an explanation.)

4. Finally, we may also say that a plan P is failing if it does contain solutions, but so do its siblings (the idea being that this way completeness is not lost):

       <<P>> intersection L(G) is not empty, but there is also a sibling P' of P such that <<P'>> intersection L(G) is also not nil.
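Below is a minimal Python sketch of the cheap checks mentioned in item 1 -- detecting ordering cycles and direct binding contradictions. The data representations (orderings and bindings as pairs) are assumptions made for illustration; real planners use more elaborate constraint managers, and the binding check here does not chase transitive codesignations.

    def ordering_inconsistent(orderings):
        """True if the ordering constraints (pairs (si, sj) meaning si < sj)
        contain a cycle, i.e. the steps cannot be consistently ordered."""
        succ = {}
        for a, b in orderings:
            succ.setdefault(a, set()).add(b)

        WHITE, GREY, BLACK = 0, 1, 2
        color = {}

        def has_cycle(node):
            color[node] = GREY
            for nxt in succ.get(node, ()):
                state = color.get(nxt, WHITE)
                if state == GREY:                 # back edge => ordering cycle
                    return True
                if state == WHITE and has_cycle(nxt):
                    return True
            color[node] = BLACK
            return False

        return any(color.get(n, WHITE) == WHITE and has_cycle(n)
                   for n in list(succ))

    def binding_inconsistent(codesignations, non_codesignations):
        """True if some pair is constrained both to codesignate and not to
        codesignate (direct contradictions only; no transitive reasoning)."""
        return bool(set(codesignations) & set(non_codesignations))

    # s1 < s2 together with s2 < s1 is inconsistent, as is ?x = A with ?x != A:
    print(ordering_inconsistent([("s1", "s2"), ("s2", "s1")]))   # True
    print(binding_inconsistent([("?x", "A")], [("?x", "A")]))    # True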
-------------

Given that finding implicit failures in the depth-limit plans is a time-consuming activity, another way to handle the depth limit is to ignore those plans and continue learning. In other words, we act as if the branch crossing the depth limit doesn't exist at all. The idea is to attach a 'degree of soundness' to an explanation of failure. With an analytical failure we know that the explanation is sound, so its degree is 1. A depth-limit failure can simply be ignored (in essence, given "True" as its failure explanation), with soundness 0. For example, a rule learned at a node n which has 10 children, with 9 analytical failures and 1 depth-limit failure, would be 9/10 = 90% sound (or 10% unsound).

Failure explanations that are not 100% sound can be understood as partial explanations. I.e., suppose the real explanation of failure is C1 & C2 & ... & Cn (where the Ci are plan constraints). A partial explanation is a set of constraints that has some overlap with the real failure explanation (it may miss some of the constraints in the real failure explanation -- thus making the explanation over-general -- or add constraints that are not present in the real explanation -- thus making the explanation over-specific). Since they are not sound, they cannot be used as pruning rules without losing completeness. However, they can be used as "blacklisting" (or "don't prefer") rules -- i.e., if a rejection rule is learned from a failure explanation that is not completely sound, then we put any partial plan matching the LHS of that rule at the end of the search queue.

--- Relaxing overgeneral rules ---------------------------

Once we allow learning by ignoring depth-limit plans, we will no longer have the "too few rules learned" problem, but rather the "too many rules learned" problem. In particular, if the depth limit is small and there are too few analytical failures, we may be learning rules that are very low on soundness (i.e. 20% sound, etc.). In such cases, it may happen that all the plans in the search queue are blacklisted by some rule or another. At this point, we are forced to essentially override the advice of some rule. A reasonable way of doing this is to pick the plan that was blacklisted by the rule with the lowest soundness factor, override that rule, and refine that plan. In such a case, if we wind up finding that there is a solution below that node after all, we also get an opportunity to relax the rejection rule. Even if we find that there is no solution, we will still have an opportunity to update the soundness factor of this rule (since we are now going further down below the plan and may find better explanations of its failure -- fewer of which are depth-limit failures).
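The soundness bookkeeping and the "don't prefer" treatment can be sketched as follows. This is a minimal Python sketch under assumptions of my own: the notes do not fix how the children's soundness values are combined (averaging is one simple choice consistent with the 9-of-10 example above), and the representation of a rule as a dict with a `matches` predicate and a `soundness` number is purely illustrative.

    from collections import deque

    def rule_soundness(child_explanation_soundness):
        """Soundness of a rejection rule learned at a node, taken here as the
        average soundness of its children's failure explanations (analytical
        failures count as 1.0, ignored depth-limit failures as 0.0)."""
        return sum(child_explanation_soundness) / len(child_explanation_soundness)

    # 9 analytical failures and 1 ignored depth-limit failure -> 90% sound.
    print(rule_soundness([1.0] * 9 + [0.0]))        # 0.9

    def order_search_queue(plans, rejection_rules):
        """Treat partially-sound rejection rules as 'don't prefer' rules:
        plans matching some rule go to the back of the queue, ordered so that
        the plan blacklisted by the least-sound rule is the first one we would
        override if everything ends up blacklisted."""
        preferred, blacklisted = deque(), []
        for plan in plans:
            hits = [r for r in rejection_rules if r["matches"](plan)]
            if hits:
                blacklisted.append((min(r["soundness"] for r in hits), plan))
            else:
                preferred.append(plan)
        blacklisted.sort(key=lambda entry: entry[0])   # least sound advice first
        return preferred, blacklisted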
Avoiding overgeneral rule learning by using incomplete failure explanations at depth-limit failures:
-----------------------------------------------------------

Rather than giving "True" as the explanation of failure for the depth-limit plans (with soundness 0), we may want to look for incomplete explanations of failure for leaf nodes. This will help improve the soundness of the learned rules, since any reasonable partial explanation is better than ignoring the depth-limit failure altogether.

One way of finding incomplete explanations of failure of a depth-limit plan is to explain WHY THE PLAN IS NOT A GOAL PLAN (rather than explain why it is inconsistent). For example, suppose we are using a LIFO strategy and are working on top-level goals one by one, exhausting their subgoals before going to the next top-level goal. Suppose that in trying to work on top-level goal g2 we cross the depth limit. We can then look at the plan and say why the plan is not achieving g2. The reasons could be any open conditions and unresolved causal links. A subset of these can be seen as an incomplete failure explanation.

---digression---
This is the kind of analysis Bhatnagar does in his FAILSAFE system. (Bhatnagar also converts the "lack of features in the plan" explanation into a "features present which are responsible for the absent features" explanation -- for example, to say that On(A,B) is not present in the current state, he may say that Clear(B) is present, which implies that On(A,B) is not present. It is not clear how important this transformation is. In particular, it is likely that if we have expressive action representations such translation is not required, since actions that make Clear(B) true will also assert Forall x ~On(x,B) in the plan.)
------

------------------------------------------------SLIDE---------------------------
Multiple failure explanations:
 -If there are multiple failure explanations, pick the one that allows for the deepest backtrack
Utility of learned rules
 -Best way is to keep usage stats on rules (as in the Composer project)
 -Sometimes you can have syntactic theories of what sort of rules will have high utility
 -Rules learned from non-recursive explanations are typically more useful
--------------------------------------------------------------------------------

Multiple failure explanations. When the planner encounters multiple failure explanations at a single node (for example, ordering and binding inconsistencies together), EBL just takes one. It is possible to OR all the explanations and regress them together up the tree. This will increase the cost of EBL (e.g., if a node n has two children whose explanations of failure are F1 or F2 and F'1 or F'2 respectively, then the explanation of failure of n is a four-way disjunction [F1 & F'1] or [F1 & F'2] or [F2 & F'1] or [F2 & F'2]). The disjunctions increase combinatorially as we go up, and thus increase the cost of regression and propagation. On the flip side, when we learn a rule with a disjunction on the LHS, that is equivalent to learning multiple search control rules at one go! The approach of using a single explanation of failure at a time essentially cuts down EBL cost, but will take more examples before it learns the rules that the disjunctive approach learns.

-- By the way, it is not the case that disjunctive rules take any longer to match -- you can always split disjunctive rules into many non-disjunctive rules! Thus, the matching problem that I mentioned in class is a non-issue. Rao ----

So EBL tries to follow the general rule "Don't be eager".
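The combinatorial blow-up from OR-ing all the failure explanations, described above, can be seen in a short sketch. This is a minimal Python illustration; representing a disjunctive explanation as a list of constraint sets is an assumption made for the example.

    from itertools import product

    def combine_children(child_explanations):
        """Each child's (regressed) explanation is a disjunction of
        conjunctions (a list of frozensets of constraints).  The parent fails
        only if every child fails, so the parent's explanation is the
        conjunction of these disjunctions -- which distributes into one
        disjunct per combination of child disjuncts."""
        combined = []
        for choice in product(*child_explanations):      # one pick per child
            combined.append(frozenset().union(*choice))  # conjoin the picks
        return combined

    # Two children, each with a two-way disjunctive explanation:
    childA = [frozenset({"F1"}), frozenset({"F2"})]
    childB = [frozenset({"F1'"}), frozenset({"F2'"})]
    print(combine_children([childA, childB]))
    # 4 disjuncts here; with k children of d disjuncts each, d**k in general.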
Utility of learned rules: In talking about the goodness of the learning method, we want to ask (a) how much improvement it brought about in performance and (b) how long it took to bring about that improvement. The latter is called the convergence rate. The former depends on how costly it is to match the learned rules, and how much search they cut down when they happen to match. If the match cost exceeds the search improvement, we don't benefit from the rules. This is called the utility problem.

The utility of a rule depends not just on the form of the rule, but also on the problem distribution encountered -- i.e., how likely the failure predicted by this rule is in the problem population you are encountering. The general solution to the utility problem is to carefully monitor the match cost vs. search improvement of each of the rules, and throw away rules that do not have a positive cost-benefit ratio. Sometimes, however, certain syntactic criteria are used to eliminate rules whose LHS is going to be so long that it will take too long to match, making it unlikely that the rule will have a positive cost-benefit ratio. Explanations of non-recursive concepts are typically more compact than explanations of recursive concepts. Thus, typically, EBL systems avoid learning from recursive concepts. In many cases, explanations of success are recursive while explanations of failure are non-recursive. This indicates that learning from failure may lead to more utile explanations. See Etzioni's STATIC approach.

Inductive learning:

------------------------------------------SLIDE---------------------------------
Inductive learning of search control knowledge

A supervised classification learning task is a task where you are given +ve and -ve examples of a concept and you attempt to infer the concept description by generalizing over these examples. We will discuss the details of this task, and consider how learning in planning can be posed as a classification learning task.

Classical inductive learning problem:
 -given a space of hypotheses 'H'
 -a set of examples of the concept labeled +ve and -ve
Find the hypothesis h that correctly classifies the examples. h will be considered the description of the concept.
--------------------------------------------------------------------------------

(from cse598 class notes)

    +------------+    Induction    +--------------+
    | Premise    | --------------> | Hypotheses   |
    | Statements |                 | rules        |
    | facts      | <-------------- |              |
    +------------+    Deduction    +--------------+
          |                               |
          |                               |
    +---------------------------------------------+
    |            Background Knowledge             |
    +---------------------------------------------+

Inductive inference generates hypotheses or rules from facts and/or other hypotheses, while deductive inference generates facts from hypotheses. In this respect, deductive learning is logically valid learning, but inductive learning is not always logically valid. In other words, the results of deductive learning are always true in our domain, but the results of inductive learning can be false.

We say that a learning algorithm is PAC (Probably Approximately Correct) with parameters e and d (epsilon and delta) if the probability that the hypothesis generated by the learning algorithm makes more than an epsilon fraction of errors is less than delta, i.e., Pr(error(h) > e) < d. PAC learning theory says that to probably approximately learn a concept with error rate at most e and probability at least 1 - d, we need at least

    (1/e) * ( ln(1/d) + ln |H| )

examples: the larger the hypothesis space, the harder the learning.
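As a worked instance of the PAC bound just stated, here is a short Python calculation. The choice of hypothesis space -- conjunctions over n boolean plan features, giving |H| = 3**n since each feature can be required true, required false, or ignored -- is an illustrative assumption, not something fixed by the notes.

    from math import ceil, log

    def pac_sample_bound(epsilon, delta, hypothesis_space_size):
        """Number of examples sufficient for a consistent learner to be
        probably (prob. >= 1 - delta) approximately (error <= epsilon)
        correct:  m >= (1/epsilon) * (ln(1/delta) + ln|H|)."""
        return ceil((1.0 / epsilon)
                    * (log(1.0 / delta) + log(hypothesis_space_size)))

    # Conjunctions over n = 10 boolean plan features: |H| = 3**10.
    n = 10
    print(pac_sample_bound(epsilon=0.1, delta=0.05,
                           hypothesis_space_size=3 ** n))   # 140
    # Since ln|H| grows linearly in n, doubling the number of features
    # roughly doubles that term: larger hypothesis spaces need more examples.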
---------------------------------SLIDE--------------------------------
A variety of concept learning / classification algorithms exists:
 -Hill climbing
 -Least commitment
 -Decision tree
 -NSC algorithm for learning disjunctive decision regions
 -Neural network learning
 -Bayesian network learning
* If we can somehow convert search control rule learning into a classification problem, we can use these learning algorithms
--------------------------------------------------------------------------------

We can do this essentially by first solving the planning problem. Next we take the solution path, which contains the search states and the decision taken to go from each search state to the next:

    s0 - d1 - s1 - d2 - s2 - ................ - dn - sn (solution)

Now, suppose we want to learn the conditions under which it is right to use the decision d1. We can say that s0, the state in which d1 was taken, is a +ve example for this concept. Similarly, any other branch in the search space which has been explored and where d1 has been used in a state s' will provide a negative example of the usage of d1 (since the other branches didn't lead to the solution). We can then generalize over these examples to learn a search control rule for using decision d1. If we want to learn conditions under which d1 should be rejected (rather than selected), we can do this by reversing the polarity of the examples -- consider +ve examples -ve and vice versa. Any of the inductive learning algorithms -- decision trees, neural networks and what have you -- can be used to generalize over these examples.

The rate of convergence of an inductive learner depends on how good the examples are. In particular, examples that are represented with a small number of features/attributes are easier to generalize over than examples that are represented with too many features. Thus, to increase the rate of convergence of search control rule learning, we can pre-process our examples so that any irrelevant attributes (features) are removed. For example, if we happen to know that the reason decision d1 taken from a plan P1 led to failure is really related to only a subset of the constraints of P1, then we can use that subset of constraints in describing the example. Notice that EBL analysis can be used to find the part of the plan that may be responsible for failure. Thus, EBL and Induction can help each other!!

Specifically, we can relate EBL and Induction in terms of background knowledge. If the background domain theory is so strong that EBL is able to completely explain why the failure occurred, then we can learn useful search control rules from a single example and its explanation. On the other hand, if the background knowledge is not strong enough, we will only get "likely" explanations of failure, and we can use induction over those explanations to improve the quality of the search control rule. Estlin's work on Inductive Logic Programming based search control rule learning for planners uses exactly this kind of approach.

----------------------------------------------------------------------
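To make the conversion from a search trace to a classification problem concrete, here is a minimal Python sketch. The representation of search states as feature dictionaries, and the names examples_for_decision and relevant_features, are assumptions made for illustration; the relevant_features argument is where an EBL-style failure analysis would plug in to strip irrelevant attributes, as discussed above.

    def examples_for_decision(decision, solution_path, explored_branches,
                              relevant_features=None):
        """Turn a solved problem's search trace into labelled examples for
        learning when `decision` should be selected.
          solution_path     : (state, decision) pairs along the solution path
          explored_branches : (state, decision) pairs that were explored but
                              did not lead to the solution
          relevant_features : optional set of feature names (e.g. produced by
                              an EBL failure analysis) used to prune each
                              state description before induction"""
        def describe(state):
            if relevant_features is None:
                return dict(state)
            return {f: v for f, v in state.items() if f in relevant_features}

        positives = [describe(s) for s, d in solution_path if d == decision]
        negatives = [describe(s) for s, d in explored_branches if d == decision]
        # To learn a *rejection* rule for the decision instead, swap the labels.
        return positives, negatives

    # Hypothetical trace: states as feature dicts, decisions as names.
    path = [({"open-goal": "On(A,B)", "depth": 1}, "d1"),
            ({"open-goal": "Clear(B)", "depth": 2}, "d2")]
    failed = [({"open-goal": "Clear(C)", "depth": 3}, "d1")]
    print(examples_for_decision("d1", path, failed,
                                relevant_features={"open-goal"}))

Any of the classification algorithms listed in the slide above could then be run over the positive and negative descriptions this returns.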