Notes from 8-28-95

Dynamical Systems as the Abstraction for Understanding Issues in Planning
Notes by Rao

Although the definition of planning as the "problem of coming up with a sequence of actions that take the agent from a given initial state to the desired goal state" is quite popular, the planning problem in AI is typically more general than this. To situate the planning problem properly, and to understand the various issues that arise in generating plans of action, we will use the abstraction of "CONTROLLING DYNAMICAL SYSTEMS". This abstraction is chiefly due to the work of Tom Dean (see the CRC chapter and his previous monograph "Planning and Control").

Informally, a dynamical system is any environment, including the agent, whose "state" (configuration) evolves over time (see the block diagram below). The evolution can be modeled by a transition function that takes the history of previous states of the system and decides what the next state will be. The state of the dynamical system includes information about the environment as well as about the agent (INCLUDING the internal computational state of the agent. This is needed because when the agent does sensing actions the external world remains the same, but the agent's computational state changes -- it "knows" more information about the environment).

Typically the transition function will depend only on the immediately preceding state (rather than the history of all states). Such dynamical systems are called "MARKOVIAN". (Similarly, if the transition function does not depend on the previous states at all, then the system is called "state-less".)

A dynamical system is "controllable" if its transitions can be influenced by inputs from the agent. The "control inputs" of the agent are termed "ACTIONS", and the transition functions of controllable dynamical systems typically depend on the action and the previous state of the system -- specifically, the state at t will depend on the state at t-1 and the action input at t-1. The actions may affect either the external environment or the internal state of the agent, as is the case for sensing and information-gathering actions.

[Block diagram: the dynamical system encloses both the environment and the agent. The environment's state evolves as s{t} = f(s{t-1}, a{t-1}); an output transformer h(s{t}) produces the observed state o{t}. Inside the agent, a plan generator T(o{t}, Obj) turns the observed state and the agent's objectives Obj into a plan, and an executor E(P{t}, o{t}) turns the plan into the action/control input a{t}, which is fed back into the environment. Outside the dynamical system sits an evaluator that compares the state sequences emitted by the system against the agent's objectives.]

The behavior of a dynamical system is the sequence of states it emits. Having defined the DS (dynamical system), we now want to talk about the role for plans. The general idea is that the agent wants to "control" the behaviors exhibited by the DS.
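To make the transition-function picture concrete, here is a rough Python sketch (not part of the original notes) of a Markovian, controllable dynamical system: the next state depends only on the previous state and the previous action, and the agent sees the state only through the output transformer. The particular dynamics, noise model, and control rule are made-up placeholders.

    import random

    def f(s_prev, a_prev):
        # Markovian, controllable transition: the next state depends only on
        # the previous state and the previous action (hypothetical dynamics).
        return s_prev + a_prev + random.choice([-1, 0, 1])

    def h(s):
        # Output transformer: the agent observes a noisy version of the state.
        return s + random.gauss(0, 0.5)

    def agent(o):
        # A trivial (hypothetical) control rule: push the observed state toward 0.
        return -1 if o > 0 else 1

    def run(steps=10, s=5):
        behavior = [s]              # the behavior is the emitted state sequence
        for t in range(steps):
            o = h(s)                # observed state o{t}
            a = agent(o)            # action/control input a{t}
            s = f(s, a)             # next state s{t+1} = f(s{t}, a{t})
            behavior.append(s)
        return behavior

    print(run())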
This can be done by changing any component of the DS that affects the transitions.

1. We can change the transition function. This is like "changing" the environment. Although this is often hard to do, intelligent agents do sometimes change their environment! For example, if you want to get a good grade and don't want to increase the amount of time you spend, you could change the course you are taking, or change the school at which you are taking it! Within AI planning, very little work has been done on changing the environment -- one recent exception is the work on stabilizing environments, done by Hammond et al.

2. We can change the actions that are input to the transition function. This is usually the role of planning, and we will talk more about it.

Before we talk about how plans should be generated, or even what plans _are_ semantically, let us talk about how one would go about deciding whether a plan is good or bad. Since plans are there to help the agent control the behavior of the DS, the only reasonable way to evaluate them is by looking at the behaviors of the DS when those plans are being used by the agent to decide actions.

We will start by assuming that the reason the agent wants to control the behavior of the DS is that it wants the exhibited behaviors to satisfy some properties given by its objective function. Suppose there is a neutral evaluator outside of the D.S. that has access to both the behaviors (state sequences) exhibited by the DS and the objectives that the agent is attempting to achieve. The evaluator can thus look at the actual behaviors produced by the agent and see how well they satisfy the objectives. To take care of uncertainty in the environment, we assume that the evaluation is done by averaging the quality of the behaviors exhibited by the D.S. over multiple trials.

The evaluation of a behavior -- a state sequence -- will most generally depend on the sequence of states themselves. Some possibilities are:

 -- goals of attainment, which only care about the final state of the state sequence
 -- goals of attainment with time restrictions, which also care about the length of the sequence
 -- goals of maintenance (e.g., ensure that the gas in the plane doesn't go below a certain level, ensure that you stay alive, etc.)

All of the above can be seen as defining the value of the state sequence in terms of the values of the states comprising the sequence. This model may pose problems for dynamical systems that evolve for indefinite periods of time -- the value of an infinite state sequence may itself become infinite. One way of making this problem go away is to "discount" the values of the future states in the sequence by an exponentially decreasing term:

      V(seq) = Sum_n [ Value(s_n) * r^n ]      where 0 <= r < 1

This makes the series converge, taking care of the infinities problem. It can also be argued that this is not just a mathematical stratagem, but rather the only way of rationally measuring the performance of dynamical systems that evolve over indefinite periods of time. (E.g., how do you rank a student A, who has already done well in all her courses, against another whose performance till now has been dismal, but who may, some time in the future, win the Nobel prize? The only rational way is to discount future promises relative to past accomplishments.)

-----

Now that we know how to measure the performance of the dynamical system, we want to talk about plans and how to make them.
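As a quick illustration (again, not from the original notes), the discounted evaluation above can be computed directly; the per-state Value function used here is an assumed stand-in for whatever the agent's objective dictates.

    def discounted_value(state_seq, value, r=0.9):
        """Discounted value of a behavior: V(seq) = Sum_n Value(s_n) * r^n, 0 <= r < 1."""
        assert 0 <= r < 1
        return sum(value(s) * (r ** n) for n, s in enumerate(state_seq))

    # Example: a goal of attainment where only the state 0 has value 1.
    value = lambda s: 1 if s == 0 else 0
    print(discounted_value([3, 2, 1, 0, 0], value, r=0.9))
    # The later the goal state appears in the sequence, the lower its contribution.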
In general, a plan can be thought of as a mapping from the current time and the current observed state to an action. (Note that this covers both the sequential plans and the policies discussed below.)

Notice that we say "observed state" rather than "state" in the above. This is because the planning agent typically doesn't have access to the complete state of the dynamical system. This is modeled in our block diagram by the output transformer, which transforms the real state into the observed state. The difference between real and observed states may involve:

 - Loss of information about certain aspects of the system state (e.g., if you model the classroom as a dynamical system, the student agent typically doesn't have access to the information about the questions on the exam a priori. Similarly, if we model driving on a multi-lane road as the dynamical system, certain critical pieces of information, such as "is the guy in the next lane a suicidal maniac with sociopathic tendencies or not?", are not available to the agent!)

 - Corruption of the state information due to noise, either noise in the environment or noise in the sensors of the agent (e.g., if we model driving on a multi-lane road as a dynamical system, the positions of the surrounding cars may be sensed only approximately).

Sometimes the lost information can be recovered. For example, noise can be eliminated through successive measurements, and lost information can sometimes be gathered through alternate means (e.g., if the position of a certain block is not visible because of occlusion, the planner can take actions such as "move left" to get to a position where the occlusion is avoided, and the position can be sensed). At other times, the lost information just cannot be recovered, and the agent has to control the dynamical system without complete information (e.g., you have to pass exams without knowing what is on the exam paper a priori; you have to play poker without knowing what is in your opponent's hand a priori). It is easy to see that, in general, perfect planning is not possible unless the agent has access to the complete state.

More realistically, the agent sometimes has to do "planning" to _recover_ the part of the lost state that may be relevant for generating a plan capable of achieving its objectives. In such cases, the agent may have to generate a plan of action whose purpose is to _gather_ information or "RECOVER STATE". The state recovery problem has received significant attention in control theory, and notions such as "observability" of the system state have been made precise for certain classes of dynamical systems.

Plan Generation
---------------

Assuming that an adequate amount of state information has been observed, and the precise objective of the agent is available, the next question is: how is the planner going to convert this information into a plan of action? The general idea is for the planner to "simulate the evolution of the dynamical system" through the various courses of action that it can take. This simulation process is called "PROJECTION". The planner can use the results of the projection to decide which course of action to commit to. It can then give this information to the executor, which starts executing the plan. [Note that this is the general idea -- in specific cases we may be able to generate plans without doing any explicit projections.]
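For concreteness, here is a minimal Python sketch (not from the notes) of projection-based plan generation: the planner simulates candidate action sequences against its own model of the transition function and commits to the one whose projected behavior evaluates best. The model, evaluation function, and action set are assumed placeholders.

    from itertools import product

    def project(model, s0, plan):
        """Simulate the model forward from s0 under a fixed action sequence."""
        states = [s0]
        for a in plan:
            states.append(model(states[-1], a))
        return states

    def choose_plan(model, s0, actions, horizon, evaluate):
        """Enumerate candidate plans, project each, and keep the best one."""
        best_plan, best_value = None, float("-inf")
        for plan in product(actions, repeat=horizon):
            value = evaluate(project(model, s0, plan))
            if value > best_value:
                best_plan, best_value = plan, value
        return best_plan

    # Hypothetical model: a 1-D world where the objective is to end up at state 0.
    model = lambda s, a: s + a
    evaluate = lambda states: -abs(states[-1])    # goal of attainment: finish near 0
    print(choose_plan(model, s0=3, actions=[-1, 0, 1], horizon=4, evaluate=evaluate))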
NOTE that projection is different from actual execution -- you can find out what an action does by actually doing it and observing the DS, but this is of little use in planning, where we are interested in deciding what we will do _before_ doing it. For the latter, the planner needs to have its own model of the world.

Models of the Domain (Dynamical System)
---------------------------------------

To be able to simulate the DS's evolution, the planner needs to have a _model_ of the environment (i.e., of the transition function f). In most realistic scenarios, a complete and correct model of the dynamical system is hard to get, and the planner has to settle for an incorrect and incomplete model. The planner generates and evaluates the correctness of its plan with respect to this model before committing to it. Once the model of the DS is available, the projection process can be used to generate plans.

Many distinctions
------------------

Within the model above, we can make many distinctions:

CLASSES OF ENVIRONMENTS

Deterministic / Non-deterministic / Stochastic:

A dynamical system is considered deterministic if its transition function is a deterministic function of the input action and the previous state (i.e., given the same action and the same previous state, the system goes to the same next state). It is considered non-deterministic if it is not deterministic (i.e., given an action and a previous state, the next state is not uniquely determined). The system is called stochastic if it is non-deterministic and the probabilities of transition to the various next states are known. Note that non-deterministic dynamical systems are hard to control; stochastic and deterministic ones can be controlled.

Static vs. Dynamic:

A dynamical system is said to be static if its transition function becomes the "identity function" when the control input is set to "No-op" (or, more informally, the world doesn't change when the agent is not doing anything). A DS is said to be dynamic if it is not static. One can of course talk about a spectrum between static and dynamic -- in particular, the rate at which the system evolves.

Observable vs. Non-observable:

A dynamical system is said to be "completely observable" if its internal state can be recovered through the output transformation function. If only part of its state can be observed, then it is called "partially observable". (The control-theoretic definition of observability actually considers a system to be observable if the state of the system at time t can be completely recovered through observations on the state of the system at a finite number of future time instances.)

CLASSES OF PLANNERS
-------------------

On-line vs. Off-line plan generation:

The planning system is considered on-line if the plan-generation component continuously uses the current observed state information in coming up with the plan. This may involve interleaving plan generation and plan execution. It is considered off-line if the plan generator takes a snapshot of the current state and objectives and generates a complete plan using just that information. All plan generation is thus done prior to any execution.

Open-loop vs. Closed-loop plans:

The planning system is considered open-loop if the executor does not use the observed state of the environment during execution. Otherwise, it is considered closed-loop.
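To make the open-loop/closed-loop distinction concrete (this example is not in the notes), here is a small sketch: the open-loop executor blindly replays a fixed action sequence, while the closed-loop executor consults the observed state at every step. The toy stochastic world and the control rule are assumptions.

    import random

    def step(s, a):
        # Hypothetical stochastic world: the action occasionally fails to change the state.
        return s + a if random.random() < 0.8 else s

    def open_loop(s, plan):
        # Open-loop: the executor never looks at the observed state during execution.
        for a in plan:
            s = step(s, a)
        return s

    def closed_loop(s, goal, horizon):
        # Closed-loop: the executor picks each action based on the currently observed state.
        for _ in range(horizon):
            a = 0 if s == goal else (1 if s < goal else -1)
            s = step(s, a)
        return s

    random.seed(0)
    print(open_loop(3, [-1, -1, -1]))         # may miss the goal if an action fails
    print(closed_loop(3, goal=0, horizon=6))  # compensates for failed actions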
Plans vs. Policies:

In deterministic and static environments, where the world evolves predictably, the agent can achieve an objective by just synthesizing a sequence of actions which will make the environment transition through the desired sequence of states, culminating in the desired goal state. In non-deterministic and dynamic environments, the world may transition into a state that is not expected by a single plan. Thus, ensuring that the agent's objectives are satisfied may involve having "plans" that will take the agent from any given state towards the desired states. In other words, the agent needs to know what action/control input to give for any given state of the environment it finds itself in. Such an "all-state plan" is typically called a "policy". A policy is universal if it covers _all_ states of the environment (a universal policy is also called a universal plan).

It is sometimes more feasible computationally to compute partial policies -- those that tell the agent what to do for only a subset of the states of the environment. If you are into computing partial policies, a natural question is: which subset of states should you cover? The natural answer is to cover those states that have high prior probabilities. For example, even a dynamic, stochastic environment may be more likely to get into some weird states and less likely to get into other weird states. So, the policy construction should progress by covering the more likely weird states first. One way of doing this could be to construct a linear plan that goes through the most likely states, and then consider the stray transitions possible from those states. A simple sketch of a partial policy follows the next subsection.

Planning vs. Control Theory:

The discussion on dynamical systems suggests that work in planning and control theory should have a lot in common. Interestingly, the fields have hitherto been largely separate. Here is a simplistic picture.

Work in planning typically concentrates on dynamical systems where the "next action" is not obvious given the current state and the objectives. Much of the research is on "generating" plans of action, given elaborate models of the environment and the complex, changing objectives of an agent. Example: what should I do now so that a particular set of goals will be accomplished at some point in the future? Much of the early work disregarded issues such as noise in sensing and the loss of state information in observation.

In contrast, control theory has concentrated on D.S.s where the next action is more or less obvious given the complete state of the environment. Example: how do I control the steering wheel of a car, given information about the state of the surrounding environment? The action selection is typically easy, in that if you are straying to the left you steer the car right, and vice versa. Through feedback from the environment, the planning/control system decides on the magnitude of the steering, etc. There is typically no long-winded reasoning involved in action selection, and all the work is on finding the right state-action mapping through feedback. Since the fed-back state is critical in deciding the course of action, a lot of work has been done on controlling partially observable systems. Typically this is done by RECOVERING the state lost due to noise. The celebrated Kalman Filter provides a way of recovering state in a certain class of dynamical systems where the observed state is noisy.
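As a toy illustration (not from the notes), a policy can be represented as a state-to-action mapping; a partial policy covers only the more likely states and falls back to replanning (or a safe default action) elsewhere. The states and actions below are made up.

    # Hypothetical states; the partial policy maps each covered state to an action.
    partial_policy = {
        "at_home":   "drive_to_campus",
        "on_road":   "keep_driving",
        "at_campus": "walk_to_class",
    }

    def act(state, policy, default="replan"):
        """A partial policy: use the stored action if the state is covered,
        otherwise fall back (e.g., replan or take a safe default action)."""
        return policy.get(state, default)

    print(act("on_road", partial_policy))     # covered (likely) state -> stored action
    print(act("flat_tire", partial_policy))   # uncovered (unlikely) state -> fallback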
As our previous discussion shows, the distinctions between planning and control are really artifacts of the specific problems that researchers in the two areas chose to work on. Real planning problems do require handling of noise, state recovery, sensor management, etc., just as complex control problems require sophisticated reasoning to select actions.

----- Where are we in the big world? -----

We will start by considering the classical planning problem, where we consider dynamical systems that have DETERMINISTIC, STATIC and COMPLETELY OBSERVABLE environments. Our planning systems will be OFF-LINE and OPEN-LOOP. As the semester progresses, we will have the opportunity to consider plan generation in other types of dynamical systems.

-----