Notes on Explanation-based and Inductive learning of search control knowledge
(Billal Muzaffar) Revised and extended by Rao [Nov 16, 1995]

We know that learning in general improves the performance of the search engine. How can learning be interfaced with planning to improve performance?

Control the search: this means controlling which branches to take and when.
New knowledge: improve the domain theory.

Explanation Based Learning (EBL): The main idea behind EBL is to let the planner dynamically learn from its own failures. During search, whenever a failure is encountered, the learner tries to explain the cause of that failure and constructs a rule that the planner can use in the future to avoid similar failing paths. The learning tries to bias the planner toward the successful paths.

There are two main kinds of failure encountered by the planner during search.

1) Analytical failures: Whenever analytical inconsistencies are detected, failure is ineluctable. The inconsistencies can be:
   i)   Binding inconsistencies, i.e. ?x is codesignated with A and ?x is not codesignated with A.
   ii)  Ordering inconsistencies, i.e. a partial plan with steps s1 and s2 and ordering constraints such that s1 < s2 and s2 < s1.
   iii) Establishment failures, i.e. the plan has an open condition that has no simple-establishment or step-addition possibilities.

2) Depth limit failures: The partial plan is assumed to be failing whenever the search crosses a predetermined depth limit. The idea is to prevent the planner from searching fruitless paths. In general it is hard to explain depth-limit failures and to come up with sound rules that explain them. Sometimes sound rules can be learned by analyzing the partial plan with respect to strong domain-axiom based consistency checks.

Failure explanations: A failure is explained by identifying a minimal subset of the constraints in the partial plan that is mutually inconsistent. This explanation is regressed over the decision (that the planner made to generate this failing partial plan) and propagated up, in order to construct an explanation for the ancestors of this failing plan that is predictive of the failure. In other words, it tries to find the 'root of the problem'. This explanation is then generalized and converted into a rule that will influence the future search to avoid similar failures.

Regression: The process of back-propagating an explanation over a decision is called regression.

      P  <--------- E'
      |             ^
      | decision d  |  regression
      |             |
      v             |
    Pfail <-------- E

    Pfail = partial plan that is flagged as failing
    E     = explanation of this failure
    P     = the plan that was refined by taking the decision d to produce Pfail

Given that refining plan P with decision d produces a failing plan Pfail, the idea is to avoid generating Pfail in the first place. In order to do that it is necessary to identify the constraints in P that are predictive of this failure, i.e. identify the minimal set of constraints (E') present in P such that after taking the decision d, a failing plan is generated. "Formally, regression of a constraint c over a decision d is the set of constraints that must be present in the partial plan before the decision d, such that c is present after taking the decision."

Dependency directed backtracking: If an explanation E does not change after it has been regressed over a decision d to a node n, then E is already part of n. Since E is by definition a set of mutually inconsistent constraints, n must itself be an inconsistent plan. Thus, there is no point considering any other unexplored children of n (siblings of the node from which E was propagated).
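
To make regression and dependency-directed backtracking concrete, here is a minimal Python sketch. It deliberately simplifies: an explanation is just a set of constraint literals, a decision records only the constraints it added, and the names (Explanation, Decision, regress, propagate) are illustrative rather than the machinery of any particular planner -- real regression must also account for the preconditions and bindings introduced by the decision.

    from dataclasses import dataclass, field

    # An explanation is a set of plan constraints (orderings, codesignations,
    # causal links, ...) that are mutually inconsistent.
    Explanation = frozenset

    @dataclass
    class Decision:
        """A refinement decision, recorded here only by the constraints it added."""
        name: str
        added_constraints: frozenset = field(default_factory=frozenset)

    def regress(explanation, decision):
        """Constraints that must already hold in the parent plan so that every
        constraint in `explanation` holds after taking `decision`.  In this
        simplified model, constraints added by the decision regress away."""
        return Explanation(c for c in explanation
                           if c not in decision.added_constraints)

    def propagate(explanation, decision):
        """Regress one level up and report whether dependency-directed
        backtracking applies at the parent node."""
        parent_expl = regress(explanation, decision)
        # If regression changed nothing, the inconsistent set was already present
        # in the parent, so the parent's other children need not be explored.
        skip_siblings = (parent_expl == explanation)
        return parent_expl, skip_siblings

    # The failure {?x = A, ?x != A} regressed over the decision that added ?x = A:
    e = Explanation({"?x = A", "?x != A"})
    d = Decision("establish P(?x) using step s3", frozenset({"?x = A"}))
    print(propagate(e, d))   # (frozenset({'?x != A'}), False)

The skip_siblings flag is exactly the dependency-directed backtracking test: an unchanged explanation means the parent already contains the inconsistent constraint set.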
Generalization: There are two ways the EBL analysis provides generalization.

1. It tells us what subset of the plan constraints is really responsible for the failure. This way, we can predict and avoid the same failure in a new situation, even though the plan in the new situation is not exactly the same as the previous plan. It is enough if they agree on the details that are deemed important by the regression/propagation based analysis.

2. A secondary source of generalization is that sometimes we can generalize over the steps and objects occurring in the regressed failure explanation. This is especially true in name-insensitive domains, where the operators don't refer to specific objects by name but qualify them through their properties.
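As an illustration of the second source of generalization, here is a small Python sketch that variablizes a regressed failure explanation. The tuple representation of constraints and the name `variablize` are assumptions made for the example; the point is only that named objects and steps get replaced consistently by fresh variables, which is safe only in the name-insensitive domains mentioned above.

    import itertools

    def variablize(explanation, named_objects):
        """Replace named objects/steps occurring in a regressed failure
        explanation with fresh variables, mapping each name to the same
        variable everywhere it appears."""
        fresh = (f"?v{i}" for i in itertools.count(1))
        mapping = {}
        generalized = []
        for pred, *args in explanation:
            new_args = []
            for a in args:
                if a in named_objects:          # a specific object or step
                    if a not in mapping:        # same name -> same variable
                        mapping[a] = next(fresh)
                    new_args.append(mapping[a])
                else:
                    new_args.append(a)          # already a variable
            generalized.append((pred, *new_args))
        return generalized, mapping

    # A failure explanation mentioning block A and steps s1, s2:
    expl = [("codesignates", "?x", "A"),
            ("not-codesignates", "?x", "A"),
            ("precedes", "s2", "s1")]
    print(variablize(expl, named_objects={"A", "s1", "s2"}))
    # The learned rule now predicts the same failure for any objects/steps
    # that fill the same roles.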

Issues in EBL

------------------------------------------------SLIDE---------------------------
Handling depth limit failures
 -Look for implicit failures
 -Ignore depth limit branches and continue learning
 -Provide incomplete/incorrect explanations of failure
 -Explain why this is not a goal state and use it as the explanation of failure (Bhatnagar)
--------------------------------------------------------------------------------

As we have seen, once the search crosses a depth limit it is hard to come up with a sound explanation for these failures. One way is to look for implicit failures. This depends on how powerful the failure checks are. There are several ways of explaining the failure of P (writing <<P>> for the candidate set of P, i.e. the set of ground operator sequences consistent with the constraints of P):

1. <<P>> is nil (so the plan is inconsistent and thus doesn't lead to a solution). The cost of this check depends on how easy it is to detect inconsistencies in the constraints of the plan. Some inconsistencies, such as ordering inconsistencies, are easier to detect (see the consistency-check sketch after this list). Others, such as the np-condition based inconsistency discussed in the paper, are costlier.

2. <<P>> may not be nil, but there is no solution in it either. I.e., suppose L(G) is the set of all ground operator sequences (minimal and non-minimal) that take us from the initial state to the goal state. Then this check ensures that

       <<P>> intersection L(G) = nil

   In the worst case this check can be as costly as the planning itself.

3. Another class of check says that no _desirable_ solution belongs to <<P>>. Suppose our definition of desirability is that the solution be minimal, and suppose L'(G) is the set of minimal solutions for the problem. Then this check ensures that

       <<P>> intersection L'(G) = nil

   This can be done by showing that the plan is looping (e.g. see Kambhampati's loop pruning techniques in IJCAI-95) or by using some domain-specific theory of solution desirability. (E.g., if our domain is such that we don't quite like solutions that involve two plane changes, then any partial plan that contains two plane-change operations can be pruned, with an explanation.)

4. Finally, we may also say that a plan P is failing if it does contain solutions, but so do its siblings (the idea being that this way completeness is not lost):

       <<P>> intersection L(G) is not empty, but there is also a sibling P' of P such that <<P'>> intersection L(G) is also not nil.
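Below is a minimal Python sketch of the cheap checks mentioned in item 1 -- detecting ordering cycles and direct binding contradictions. The data representations (orderings and bindings as pairs) are assumptions made for illustration; real planners use more elaborate constraint managers, and the binding check here does not chase transitive codesignations.

    def ordering_inconsistent(orderings):
        """True if the ordering constraints (pairs (si, sj) meaning si < sj)
        contain a cycle, i.e. the steps cannot be consistently ordered."""
        succ = {}
        for a, b in orderings:
            succ.setdefault(a, set()).add(b)

        WHITE, GREY, BLACK = 0, 1, 2
        color = {}

        def has_cycle(node):
            color[node] = GREY
            for nxt in succ.get(node, ()):
                state = color.get(nxt, WHITE)
                if state == GREY:                 # back edge => ordering cycle
                    return True
                if state == WHITE and has_cycle(nxt):
                    return True
            color[node] = BLACK
            return False

        return any(color.get(n, WHITE) == WHITE and has_cycle(n)
                   for n in list(succ))

    def binding_inconsistent(codesignations, non_codesignations):
        """True if some pair is constrained both to codesignate and not to
        codesignate (direct contradictions only; no transitive reasoning)."""
        return bool(set(codesignations) & set(non_codesignations))

    # s1 < s2 together with s2 < s1 is inconsistent, as is ?x = A with ?x != A:
    print(ordering_inconsistent([("s1", "s2"), ("s2", "s1")]))   # True
    print(binding_inconsistent([("?x", "A")], [("?x", "A")]))    # True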
-------------

Given that finding implicit failures in the depth-limit plans is a time-consuming activity, another way to handle the depth limit is to ignore those plans and continue learning. In other words, we act as if the branch crossing the depth limit doesn't exist at all. The idea is to attach a 'degree of soundness' to an explanation of failure. With an analytical failure we know that the explanation is sound, so its degree is 1. A depth-limit failure can simply be ignored (in essence, given "True" as its failure explanation), with soundness 0. For example, a rule learned at a node n which has 10 children, with 9 analytical failures and 1 depth-limit failure, would be 9/10 = 90% sound (or 10% unsound).

Failure explanations that are not 100% sound can be understood as partial explanations. I.e., suppose the real explanation of failure is C1 & C2 & ... & Cn (where the Ci are plan constraints). A partial explanation is a set of constraints that has some overlap with the real failure explanation (it may miss some of the constraints in the real failure explanation -- thus making the explanation over-general -- or add constraints that are not present in the real explanation -- thus making the explanation over-specific). Since they are not sound, they cannot be used as pruning rules without losing completeness. However, they can be used as "blacklisting" (or "don't prefer") rules -- i.e., if a rejection rule is learned from a failure explanation that is not completely sound, then we put any partial plan matching the LHS of that rule at the end of the search queue.

--- Relaxing overgeneral rules ---------------------------

Once we allow learning by ignoring depth-limit plans, we will no longer have the "too few rules learned" problem, but rather the "too many rules learned" problem. In particular, if the depth limit is small and there are too few analytical failures, we may be learning rules that are very low on soundness (i.e. 20% sound, etc.). In such cases, it may happen that all the plans in the search queue are blacklisted by some rule or another. At this point, we are forced to essentially override the advice of some rule. A reasonable way of doing this is to pick the plan that was blacklisted by the rule with the lowest soundness factor, override that rule, and refine that plan. In such a case, if we wind up finding that there is a solution below that node after all, we also get an opportunity to relax the rejection rule. Even if we find that there is no solution, we will still have an opportunity to update the soundness factor of this rule (since we are now going further down below the plan and may find better explanations of its failure -- fewer of which are depth-limit failures).
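The soundness bookkeeping and the "don't prefer" treatment can be sketched as follows. This is a minimal Python sketch under assumptions of my own: the notes do not fix how the children's soundness values are combined (averaging is one simple choice consistent with the 9-of-10 example above), and the representation of a rule as a dict with a `matches` predicate and a `soundness` number is purely illustrative.

    from collections import deque

    def rule_soundness(child_explanation_soundness):
        """Soundness of a rejection rule learned at a node, taken here as the
        average soundness of its children's failure explanations (analytical
        failures count as 1.0, ignored depth-limit failures as 0.0)."""
        return sum(child_explanation_soundness) / len(child_explanation_soundness)

    # 9 analytical failures and 1 ignored depth-limit failure -> 90% sound.
    print(rule_soundness([1.0] * 9 + [0.0]))        # 0.9

    def order_search_queue(plans, rejection_rules):
        """Treat partially-sound rejection rules as 'don't prefer' rules:
        plans matching some rule go to the back of the queue, ordered so that
        the plan blacklisted by the least-sound rule is the first one we would
        override if everything ends up blacklisted."""
        preferred, blacklisted = deque(), []
        for plan in plans:
            hits = [r for r in rejection_rules if r["matches"](plan)]
            if hits:
                blacklisted.append((min(r["soundness"] for r in hits), plan))
            else:
                preferred.append(plan)
        blacklisted.sort(key=lambda entry: entry[0])   # least sound advice first
        return preferred, blacklisted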
Avoiding overgeneral rule learning by using incomplete failure explanations at depth-limit failures:
-----------------------------------------------------------

Rather than giving "True" as the explanation of failure for the depth-limit plans (with soundness 0), we may want to look for incomplete explanations of failure for leaf nodes. This will help improve the soundness of the learned rules, since any reasonable partial explanation is better than ignoring the depth-limit failure altogether.

One way of finding incomplete explanations of failure of a depth-limit plan is to explain WHY THE PLAN IS NOT A GOAL PLAN (rather than explain why it is inconsistent). For example, suppose we are using a LIFO strategy and are working on top-level goals one by one, exhausting their subgoals before going to the next top-level goal. Suppose that in trying to work on top-level goal g2 we cross the depth limit. We can then look at the plan and say why the plan is not achieving g2. The reasons could be any open conditions and unresolved causal links. A subset of these can be seen as an incomplete failure explanation.

---digression---
This is the kind of analysis Bhatnagar does in his FAILSAFE system. (Bhatnagar also converts the "lack of features in the plan" explanation into a "features present which are responsible for the absent features" explanation -- for example, to say that On(A,B) is not present in the current state, he may say that Clear(B) is present, which implies that On(A,B) is not present. It is not clear how important this transformation is. In particular, it is likely that if we have expressive action representations such translation is not required, since actions that make Clear(B) true will also assert Forall x ~On(x,B) in the plan.)
------

------------------------------------------------SLIDE---------------------------
Multiple failure explanations:
 -If there are multiple failure explanations, pick the one that allows for the deepest backtrack
Utility of learned rules
 -Best way is to keep usage stats on rules (as in the Composer project)
 -Sometimes you can have syntactic theories of what sort of rules will have high utility
 -Rules learned from non-recursive explanations are typically more useful
--------------------------------------------------------------------------------

Multiple failure explanations. When the planner encounters multiple failure explanations at a single node (for example, ordering and binding inconsistencies together), EBL just takes one. It is possible to OR all the explanations and regress them together up the tree. This will increase the cost of EBL (e.g., if a node n has two children whose explanations of failure are F1 or F2 and F'1 or F'2 respectively, then the explanation of failure of n is a four-way disjunction [F1 & F'1] or [F1 & F'2] or [F2 & F'1] or [F2 & F'2]). The disjunctions increase combinatorially as we go up, and thus increase the cost of regression and propagation. On the flip side, when we learn a rule with a disjunction on the LHS, that is equivalent to learning multiple search control rules at one go! The approach of using a single explanation of failure at a time essentially cuts down EBL cost, but will take more examples before it learns the rules that the disjunctive approach learns.

-- By the way, it is not the case that disjunctive rules take any longer to match -- you can always split disjunctive rules into many non-disjunctive rules! Thus, the matching problem that I mentioned in class is a non-issue. Rao ----

So EBL tries to follow the general rule "Don't be eager".
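The combinatorial blow-up from OR-ing all the failure explanations, described above, can be seen in a short sketch. This is a minimal Python illustration; representing a disjunctive explanation as a list of constraint sets is an assumption made for the example.

    from itertools import product

    def combine_children(child_explanations):
        """Each child's (regressed) explanation is a disjunction of
        conjunctions (a list of frozensets of constraints).  The parent fails
        only if every child fails, so the parent's explanation is the
        conjunction of these disjunctions -- which distributes into one
        disjunct per combination of child disjuncts."""
        combined = []
        for choice in product(*child_explanations):      # one pick per child
            combined.append(frozenset().union(*choice))  # conjoin the picks
        return combined

    # Two children, each with a two-way disjunctive explanation:
    childA = [frozenset({"F1"}), frozenset({"F2"})]
    childB = [frozenset({"F1'"}), frozenset({"F2'"})]
    print(combine_children([childA, childB]))
    # 4 disjuncts here; with k children of d disjuncts each, d**k in general.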
Utility of learned rules: In talking about the goodness of the learning method, we want to ask (a) how much improvement it brought about in performance and (b) how long it took to bring about that improvement. The latter is called the convergence rate. The former depends on how costly it is to match the learned rules, and how much search they cut down when they happen to match. If the match cost exceeds the search improvement, we don't benefit from the rules. This is called the utility problem.

The utility of a rule depends not just on the form of the rule, but also on the problem distribution encountered -- i.e., how likely the failure predicted by this rule is in the problem population you are encountering. The general solution to the utility problem is to carefully monitor the match cost vs. search improvement of each of the rules, and throw away rules that do not have a positive cost-benefit ratio. Sometimes, however, certain syntactic criteria are used to eliminate rules whose LHS is going to be so long that it will take too long to match, making it unlikely that the rule will have a positive cost-benefit ratio. Explanations of non-recursive concepts are typically more compact than explanations of recursive concepts. Thus, typically, EBL systems avoid learning from recursive concepts. In many cases, explanations of success are recursive while explanations of failure are non-recursive. This indicates that learning from failure may lead to more utile explanations. See Etzioni's STATIC approach.

Inductive learning:

------------------------------------------SLIDE---------------------------------
Inductive learning of search control knowledge

A supervised classification learning task is a task where you are given +ve and -ve examples of a concept and you attempt to infer the concept description by generalizing over these examples. We will discuss the details of this task, and consider how learning in planning can be posed as a classification learning task.

Classical inductive learning problem:
 -given a space of hypotheses 'H'
 -a set of examples of the concept labeled +ve and -ve
Find the hypothesis h that correctly classifies the examples. h will be considered the description of the concept.
--------------------------------------------------------------------------------

(from cse598 class notes)

    +------------+    Induction    +--------------+
    | Premise    | --------------> | Hypotheses   |
    | Statements |                 | rules        |
    | facts      | <-------------- |              |
    +------------+    Deduction    +--------------+
          |                               |
          |                               |
    +---------------------------------------------+
    |            Background Knowledge             |
    +---------------------------------------------+

Inductive inference generates hypotheses or rules from facts and/or other hypotheses, while deductive inference generates facts from hypotheses. In this respect, deductive learning is logically valid learning, but inductive learning is not always logically valid. In other words, the results of deductive learning are always true in our domain, but the results of inductive learning can be false.

We say that a learning algorithm is PAC (Probably Approximately Correct) with parameters e and d (epsilon and delta) if the probability that the hypothesis generated by the learning algorithm makes more than an epsilon fraction of errors is less than delta, i.e., Pr(error(h) > e) < d. PAC learning theory says that to probably approximately learn a concept with error rate at most e and probability at least 1 - d, we need at least

    (1/e) * ( ln(1/d) + ln |H| )

examples: the larger the hypothesis space, the harder the learning.
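As a worked instance of the PAC bound just stated, here is a short Python calculation. The choice of hypothesis space -- conjunctions over n boolean plan features, giving |H| = 3**n since each feature can be required true, required false, or ignored -- is an illustrative assumption, not something fixed by the notes.

    from math import ceil, log

    def pac_sample_bound(epsilon, delta, hypothesis_space_size):
        """Number of examples sufficient for a consistent learner to be
        probably (prob. >= 1 - delta) approximately (error <= epsilon)
        correct:  m >= (1/epsilon) * (ln(1/delta) + ln|H|)."""
        return ceil((1.0 / epsilon)
                    * (log(1.0 / delta) + log(hypothesis_space_size)))

    # Conjunctions over n = 10 boolean plan features: |H| = 3**10.
    n = 10
    print(pac_sample_bound(epsilon=0.1, delta=0.05,
                           hypothesis_space_size=3 ** n))   # 140
    # Since ln|H| grows linearly in n, doubling the number of features
    # roughly doubles that term: larger hypothesis spaces need more examples.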
---------------------------------SLIDE--------------------------------
A variety of concept learning / classification algorithms exists:
 -Hill climbing
 -Least commitment
 -Decision tree
 -NSC algorithm for learning disjunctive decision regions
 -Neural network learning
 -Bayesian network learning
* If we can somehow convert search control rule learning into a classification problem, we can use these learning algorithms
--------------------------------------------------------------------------------

We can do this essentially by first solving the planning problem. Next we take the solution path, which contains the search states and the decision taken to go from each search state to the next:

    s0 - d1 - s1 - d2 - s2 - ................ - dn - sn (solution)

Now, suppose we want to learn the conditions under which it is right to use the decision d1. We can say that s0, the state in which d1 was taken, is a +ve example for this concept. Similarly, any other branch in the search space which has been explored and where d1 has been used in a state s' will provide a negative example of the usage of d1 (since the other branches didn't lead to the solution). We can then generalize over these examples to learn a search control rule for using decision d1. If we want to learn conditions under which d1 should be rejected (rather than selected), we can do this by reversing the polarity of the examples -- consider +ve examples -ve and vice versa. Any of the inductive learning algorithms -- decision trees, neural networks and what have you -- can be used to generalize over these examples.

The rate of convergence of an inductive learner depends on how good the examples are. In particular, examples that are represented with a small number of features/attributes are easier to generalize over than examples that are represented with too many features. Thus, to increase the rate of convergence of search control rule learning, we can pre-process our examples so that any irrelevant attributes (features) are removed. For example, if we happen to know that the reason decision d1 taken from a plan P1 led to failure is really related to only a subset of the constraints of P1, then we can use that subset of constraints in describing the example. Notice that EBL analysis can be used to find the part of the plan that may be responsible for failure. Thus, EBL and Induction can help each other!!

Specifically, we can relate EBL and Induction in terms of background knowledge. If the background domain theory is so strong that EBL is able to completely explain why the failure occurred, then we can learn useful search control rules from a single example and its explanation. On the other hand, if the background knowledge is not strong enough, we will only get "likely" explanations of failure, and we can use induction over those explanations to improve the quality of the search control rule. Estlin's work on Inductive Logic Programming based search control rule learning for planners uses exactly this kind of approach.

----------------------------------------------------------------------
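To make the conversion from a search trace to a classification problem concrete, here is a minimal Python sketch. The representation of search states as feature dictionaries, and the names examples_for_decision and relevant_features, are assumptions made for illustration; the relevant_features argument is where an EBL-style failure analysis would plug in to strip irrelevant attributes, as discussed above.

    def examples_for_decision(decision, solution_path, explored_branches,
                              relevant_features=None):
        """Turn a solved problem's search trace into labelled examples for
        learning when `decision` should be selected.
          solution_path     : (state, decision) pairs along the solution path
          explored_branches : (state, decision) pairs that were explored but
                              did not lead to the solution
          relevant_features : optional set of feature names (e.g. produced by
                              an EBL failure analysis) used to prune each
                              state description before induction"""
        def describe(state):
            if relevant_features is None:
                return dict(state)
            return {f: v for f, v in state.items() if f in relevant_features}

        positives = [describe(s) for s, d in solution_path if d == decision]
        negatives = [describe(s) for s, d in explored_branches if d == decision]
        # To learn a *rejection* rule for the decision instead, swap the labels.
        return positives, negatives

    # Hypothetical trace: states as feature dicts, decisions as names.
    path = [({"open-goal": "On(A,B)", "depth": 1}, "d1"),
            ({"open-goal": "Clear(B)", "depth": 2}, "d2")]
    failed = [({"open-goal": "Clear(C)", "depth": 3}, "d1")]
    print(examples_for_decision("d1", path, failed,
                                relevant_features={"open-goal"}))

Any of the classification algorithms listed in the slide above could then be run over the positive and negative descriptions this returns.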