Notes from 8-28-95

Dynamical Systems as the Abstraction for Understanding Issues in Planning
Notes by Rao

Although the definition of planning as the "problem of coming up with a sequence of actions that take the agent from a given initial state to the desired goal state" is quite popular, the planning problem in AI is typically more general than this. To situate the planning problem properly, and to understand the various issues that arise in generating plans of action, we will use the abstraction of "CONTROLLING DYNAMICAL SYSTEMS". This abstraction is chiefly due to the work of Tom Dean (see the CRC chapter and his previous monograph "Planning and Control").

Informally, a dynamical system is any environment, including the agent, whose "state" (configuration) evolves over time (see the block diagram below). The evolution can be modeled by a transition function that takes the history of previous states of the system and decides what the next state will be. The state of the dynamical system includes information about the environment as well as about the agent (INCLUDING the internal computational state of the agent. This is needed because when the agent does sensing actions the external world remains the same, but the agent's computational state changes -- it "knows" more information about the environment).

Typically the transition function will depend only on the immediately preceding state (rather than the history of all states). Such dynamical systems are called "MARKOVIAN". (Similarly, if the transition function does not depend on the previous states at all, then the system is called "state-less".)

A dynamical system is "controllable" if its transitions can be influenced by inputs from the agent. The "control inputs" of the agent are termed "ACTIONS", and the transition functions of controllable dynamical systems typically depend on the action and the previous state of the system -- specifically, the state at t will depend on the state at t-1 and the action input at t-1. The actions may affect either the external environment or the internal state of the agent, as is the case for sensing and information-gathering actions.

[Block diagram: the dynamical system encloses both the environment and the agent. The environment's state evolves as s{t} = f(s{t-1}, a{t-1}); an output transformer h(s{t}) produces the observed state o{t}. Inside the agent, a plan generator T(o{t}, Obj) turns the observed state and the agent's objectives Obj into a plan, and an executor E(P{t}, o{t}) turns the plan into the action/control input a{t}, which is fed back into the environment. Outside the dynamical system sits an evaluator that compares the state sequences emitted by the system against the agent's objectives.]

The behavior of a dynamical system is the sequence of states it emits. Having defined the DS (dynamical system), we now want to talk about the role for plans. The general idea is that the agent wants to "control" the behaviors exhibited by the DS.
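To make the transition-function picture concrete, here is a rough Python sketch (not part of the original notes) of a Markovian, controllable dynamical system: the next state depends only on the previous state and the previous action, and the agent sees the state only through the output transformer. The particular dynamics, noise model, and control rule are made-up placeholders.

    import random

    def f(s_prev, a_prev):
        # Markovian, controllable transition: the next state depends only on
        # the previous state and the previous action (hypothetical dynamics).
        return s_prev + a_prev + random.choice([-1, 0, 1])

    def h(s):
        # Output transformer: the agent observes a noisy version of the state.
        return s + random.gauss(0, 0.5)

    def agent(o):
        # A trivial (hypothetical) control rule: push the observed state toward 0.
        return -1 if o > 0 else 1

    def run(steps=10, s=5):
        behavior = [s]              # the behavior is the emitted state sequence
        for t in range(steps):
            o = h(s)                # observed state o{t}
            a = agent(o)            # action/control input a{t}
            s = f(s, a)             # next state s{t+1} = f(s{t}, a{t})
            behavior.append(s)
        return behavior

    print(run())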
This can be done by changing any component of the DS that affects the transitions.

1. We can change the transition function. This is like "changing" the environment. Although this is often hard to do, intelligent agents do sometimes change their environment! For example, if you want to get a good grade and don't want to increase the amount of time you spend, you could change the course you are taking, or change the school at which you are taking it! Within AI planning, very little work has been done on changing the environment -- one recent exception is the work on stabilizing environments, done by Hammond et al.

2. We can change the actions that are input to the transition function. This is usually the role of planning, and we will talk more about it.

Before we talk about how plans should be generated, or even what plans _are_ semantically, let us talk about how one would go about deciding whether a plan is good or bad. Since plans are there to help the agent control the behavior of the DS, the only reasonable way to evaluate them is by looking at the behaviors of the DS when those plans are being used by the agent to decide actions.

We will start by assuming that the reason the agent wants to control the behavior of the DS is that it wants the exhibited behaviors to satisfy some properties given by its objective function. Suppose there is a neutral evaluator outside of the D.S. that has access to both the behaviors (state sequences) exhibited by the DS and the objectives that the agent is attempting to achieve. The evaluator can thus look at the actual behaviors produced by the agent and see how well they satisfy the objectives. To take care of uncertainty in the environment, we assume that the evaluation is done by averaging the quality of the behaviors exhibited by the D.S. over multiple trials.

The evaluation of a behavior -- a state sequence -- will most generally depend on the sequence of states themselves. Some possibilities are:

 -- goals of attainment, which only care about the final state of the state sequence
 -- goals of attainment with time restrictions, which also care about the length of the sequence
 -- goals of maintenance (e.g., ensure that the gas in the plane doesn't go below a certain level, ensure that you stay alive, etc.)

All of the above can be seen as defining the value of the state sequence in terms of the values of the states comprising the sequence. This model may pose problems for dynamical systems that evolve for indefinite periods of time -- the value of an infinite state sequence may itself become infinite. One way of making this problem go away is to "discount" the values of the future states in the sequence by an exponentially decreasing term:

      V(seq) = Sum_n [ Value(s_n) * r^n ]      where 0 <= r < 1

This makes the series converge, taking care of the infinities problem. It can also be argued that this is not just a mathematical stratagem, but rather the only way of rationally measuring the performance of dynamical systems that evolve over indefinite periods of time. (E.g., how do you rank a student A, who has already done well in all her courses, against another whose performance till now has been dismal, but who may, some time in the future, win the Nobel prize? The only rational way is to discount future promises relative to past accomplishments.)

-----

Now that we know how to measure the performance of the dynamical system, we want to talk about plans and how to make them.
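As a quick illustration (again, not from the original notes), the discounted evaluation above can be computed directly; the per-state Value function used here is an assumed stand-in for whatever the agent's objective dictates.

    def discounted_value(state_seq, value, r=0.9):
        """Discounted value of a behavior: V(seq) = Sum_n Value(s_n) * r^n, 0 <= r < 1."""
        assert 0 <= r < 1
        return sum(value(s) * (r ** n) for n, s in enumerate(state_seq))

    # Example: a goal of attainment where only the state 0 has value 1.
    value = lambda s: 1 if s == 0 else 0
    print(discounted_value([3, 2, 1, 0, 0], value, r=0.9))
    # The later the goal state appears in the sequence, the lower its contribution.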
In general, a plan can be thought of as a mapping from the current time and the current observed state to an action. (Note that this covers both the sequential plans and the policies discussed below.)

Notice that we say "observed state" rather than "state" in the above. This is because the planning agent typically doesn't have access to the complete state of the dynamical system. This is modeled in our block diagram by the output transformer, which transforms the real state into the observed state. The difference between real and observed states may involve:

 - Loss of information about certain aspects of the system state (e.g., if you model the classroom as a dynamical system, the student agent typically doesn't have access to the information about the questions on the exam a priori. Similarly, if we model driving on a multi-lane road as the dynamical system, certain critical pieces of information, such as "is the guy in the next lane a suicidal maniac with sociopathic tendencies or not?", are not available to the agent!)

 - Corruption of the state information due to noise, either noise in the environment or noise in the sensors of the agent (e.g., if we model driving on a multi-lane road as a dynamical system, the positions of the surrounding cars may be sensed only approximately).

Sometimes the lost information can be recovered. For example, noise can be eliminated through successive measurements, and lost information can sometimes be gathered through alternate means (e.g., if the position of a certain block is not visible because of occlusion, the planner can take actions such as "move left" to get to a position where the occlusion is avoided, and the position can be sensed). At other times, the lost information just cannot be recovered, and the agent has to control the dynamical system without complete information (e.g., you have to pass exams without knowing what is on the exam paper a priori; you have to play poker without knowing what is in your opponent's hand a priori). It is easy to see that, in general, perfect planning is not possible unless the agent has access to the complete state.

More realistically, the agent sometimes has to do "planning" to _recover_ the part of the lost state that may be relevant for generating a plan capable of achieving its objectives. In such cases, the agent may have to generate a plan of action whose purpose is to _gather_ information or "RECOVER STATE". The state recovery problem has received significant attention in control theory, and notions such as "observability" of the system state have been made precise for certain classes of dynamical systems.

Plan Generation
---------------

Assuming that an adequate amount of state information has been observed, and the precise objective of the agent is available, the next question is: how is the planner going to convert this information into a plan of action? The general idea is for the planner to "simulate the evolution of the dynamical system" through the various courses of action that it can take. This simulation process is called "PROJECTION". The planner can use the results of the projection to decide which course of action to commit to. It can then give this information to the executor, which starts executing the plan. [Note that this is the general idea -- in specific cases we may be able to generate plans without doing any explicit projections.]
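For concreteness, here is a minimal Python sketch (not from the notes) of projection-based plan generation: the planner simulates candidate action sequences against its own model of the transition function and commits to the one whose projected behavior evaluates best. The model, evaluation function, and action set are assumed placeholders.

    from itertools import product

    def project(model, s0, plan):
        """Simulate the model forward from s0 under a fixed action sequence."""
        states = [s0]
        for a in plan:
            states.append(model(states[-1], a))
        return states

    def choose_plan(model, s0, actions, horizon, evaluate):
        """Enumerate candidate plans, project each, and keep the best one."""
        best_plan, best_value = None, float("-inf")
        for plan in product(actions, repeat=horizon):
            value = evaluate(project(model, s0, plan))
            if value > best_value:
                best_plan, best_value = plan, value
        return best_plan

    # Hypothetical model: a 1-D world where the objective is to end up at state 0.
    model = lambda s, a: s + a
    evaluate = lambda states: -abs(states[-1])    # goal of attainment: finish near 0
    print(choose_plan(model, s0=3, actions=[-1, 0, 1], horizon=4, evaluate=evaluate))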
NOTE that projection is different from actual execution -- you can find out what an action does by actually doing it and observing the DS, but this is of little use in planning, where we are interested in deciding what we will do _before_ doing it. For the latter, the planner needs to have its own model of the world.

Models of the Domain (Dynamical System)
---------------------------------------

To be able to simulate the DS's evolution, the planner needs to have a _model_ of the environment (i.e., of the transition function f). In most realistic scenarios, a complete and correct model of the dynamical system is hard to get, and the planner has to settle for an incorrect and incomplete model. The planner generates and evaluates the correctness of its plan with respect to this model before committing to it. Once the model of the DS is available, the projection process can be used to generate plans.

Many distinctions
------------------

Within the model above, we can make many distinctions:

CLASSES OF ENVIRONMENTS

Deterministic / Non-deterministic / Stochastic:

A dynamical system is considered deterministic if its transition function is a deterministic function of the input action and the previous state (i.e., given the same action and the same previous state, the system goes to the same next state). It is considered non-deterministic if it is not deterministic (i.e., given an action and a previous state, the next state is not uniquely determined). The system is called stochastic if it is non-deterministic and the probabilities of transition to the various next states are known. Note that non-deterministic dynamical systems are hard to control; stochastic and deterministic ones can be controlled.

Static vs. Dynamic:

A dynamical system is said to be static if its transition function becomes the "identity function" when the control input is set to "No-op" (or, more informally, the world doesn't change when the agent is not doing anything). A DS is said to be dynamic if it is not static. One can of course talk about a spectrum between static and dynamic -- in particular, the rate at which the system evolves.

Observable vs. Non-observable:

A dynamical system is said to be "completely observable" if its internal state can be recovered through the output transformation function. If only part of its state can be observed, then it is called "partially observable". (The control-theoretic definition of observability actually considers a system to be observable if the state of the system at time t can be completely recovered through observations on the state of the system at a finite number of future time instances.)

CLASSES OF PLANNERS
-------------------

On-line vs. Off-line plan generation:

The planning system is considered on-line if the plan-generation component continuously uses the current observed state information in coming up with the plan. This may involve interleaving plan generation and plan execution. It is considered off-line if the plan generator takes a snapshot of the current state and objectives and generates a complete plan using just that information. All plan generation is thus done prior to any execution.

Open-loop vs. Closed-loop plans:

The planning system is considered open-loop if the executor does not use the observed state of the environment during execution. Otherwise, it is considered closed-loop.
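To make the open-loop/closed-loop distinction concrete (this example is not in the notes), here is a small sketch: the open-loop executor blindly replays a fixed action sequence, while the closed-loop executor consults the observed state at every step. The toy stochastic world and the control rule are assumptions.

    import random

    def step(s, a):
        # Hypothetical stochastic world: the action occasionally fails to change the state.
        return s + a if random.random() < 0.8 else s

    def open_loop(s, plan):
        # Open-loop: the executor never looks at the observed state during execution.
        for a in plan:
            s = step(s, a)
        return s

    def closed_loop(s, goal, horizon):
        # Closed-loop: the executor picks each action based on the currently observed state.
        for _ in range(horizon):
            a = 0 if s == goal else (1 if s < goal else -1)
            s = step(s, a)
        return s

    random.seed(0)
    print(open_loop(3, [-1, -1, -1]))         # may miss the goal if an action fails
    print(closed_loop(3, goal=0, horizon=6))  # compensates for failed actions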
Plans vs. Policies:

In deterministic and static environments, where the world evolves predictably, the agent can achieve an objective by just synthesizing a sequence of actions which will make the environment transition through the desired sequence of states, culminating in the desired goal state. In non-deterministic and dynamic environments, the world may transition into a state that is not expected by a single plan. Thus, ensuring that the agent's objectives are satisfied may involve having "plans" that will take the agent from any given state towards the desired states. In other words, the agent needs to know what action/control input to give for any given state of the environment it finds itself in. Such an "all-state plan" is typically called a "policy". A policy is universal if it covers _all_ states of the environment (a universal policy is also called a universal plan).

It is sometimes more feasible computationally to compute partial policies -- those that tell the agent what to do for only a subset of the states of the environment. If you are into computing partial policies, a natural question is: which subset of states should you cover? The natural answer is to cover those states that have high prior probabilities. For example, even a dynamic, stochastic environment may be more likely to get into some weird states and less likely to get into other weird states. So, the policy construction should progress by covering the more likely weird states first. One way of doing this could be to construct a linear plan that goes through the most likely states, and then consider the stray transitions possible from those states. A simple sketch of a partial policy follows the next subsection.

Planning vs. Control Theory:

The discussion on dynamical systems suggests that work in planning and control theory should have a lot in common. Interestingly, the fields have hitherto been largely separate. Here is a simplistic picture.

Work in planning typically concentrates on dynamical systems where the "next action" is not obvious given the current state and the objectives. Much of the research is on "generating" plans of action, given elaborate models of the environment and the complex, changing objectives of an agent. Example: what should I do now so that a particular set of goals will be accomplished at some point in the future? Much of the early work disregarded issues such as noise in sensing and the loss of state information in observation.

In contrast, control theory has concentrated on D.S.s where the next action is more or less obvious given the complete state of the environment. Example: how do I control the steering wheel of a car, given information about the state of the surrounding environment? The action selection is typically easy, in that if you are straying to the left you steer the car right, and vice versa. Through feedback from the environment, the planning/control system decides on the magnitude of the steering, etc. There is typically no long-winded reasoning involved in action selection, and all the work is on finding the right state-action mapping through feedback. Since the fed-back state is critical in deciding the course of action, a lot of work has been done on controlling partially observable systems. Typically this is done by RECOVERING the state lost due to noise. The celebrated Kalman Filter provides a way of recovering state in a certain class of dynamical systems where the observed state is noisy.
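As a toy illustration (not from the notes), a policy can be represented as a state-to-action mapping; a partial policy covers only the more likely states and falls back to replanning (or a safe default action) elsewhere. The states and actions below are made up.

    # Hypothetical states; the partial policy maps each covered state to an action.
    partial_policy = {
        "at_home":   "drive_to_campus",
        "on_road":   "keep_driving",
        "at_campus": "walk_to_class",
    }

    def act(state, policy, default="replan"):
        """A partial policy: use the stored action if the state is covered,
        otherwise fall back (e.g., replan or take a safe default action)."""
        return policy.get(state, default)

    print(act("on_road", partial_policy))     # covered (likely) state -> stored action
    print(act("flat_tire", partial_policy))   # uncovered (unlikely) state -> fallback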
As our previous discussion shows, the distinctions between planning and control are really artifacts of the specific problems that researchers in the two areas chose to work on. Real planning problems do require handling of noise, state recovery, sensor management, etc., just as complex control problems require sophisticated reasoning to select actions.

----- Where are we in the big world? -----

We will start by considering the classical planning problem, where we consider dynamical systems that have DETERMINISTIC, STATIC and COMPLETELY OBSERVABLE environments. Our planning systems will be OFF-LINE and OPEN-LOOP. As the semester progresses, we will have the opportunity to consider plan generation in other types of dynamical systems.

-----