Some more on MDPs...
Here are some more factoids on MDPs...
Terminology:
The "Value" function is also called "Cost-to-go" function in some
contexts.
The "horizon" is also called "stage" (finite horizon MDP is called
finite stage MDP)
=========
Doing value iteration for finite-horizon MDPs
If you have a finite-horizon (stage) MDP with K stages, then you will
have essentially K+1 different value functions:
U_0 (which is the value you will get if you can't do any more actions)
U_1 (which is the value you will get if you can do at most one action)
...
U_K (which is the value you will get if you can do at most K actions)
Now, it is clear that the Bellman equation is written in terms of U_m
and U_{m-1}:
U_m(i) = R(i) + max_over_a SUM_over_j M^a_i,j U_{m-1}(j)    (Eqn I)
where M^a_i,j is the probability of going from state i to state j when
action a is taken.
We know U_0 for all states (this is just the immediate reward of the
state).
So, using U_0 we can compute U_1; using U_1 we can compute U_2; and so
on, until we finally get U_K.
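Here is a minimal Python sketch of this backward recursion (not part of
the original message; R and M are hypothetical inputs: R[i] is the
immediate reward of state i, and M[a, i, j] is the probability of going
from state i to state j under action a). It just applies Eqn I K times,
starting from U_0 = R.

import numpy as np

def finite_horizon_values(R, M, K):
    """Return the list [U_0, U_1, ..., U_K] of value functions."""
    R = np.asarray(R, dtype=float)
    U = [R.copy()]                      # U_0(i) = R(i): no actions left
    for m in range(1, K + 1):
        q = M @ U[m - 1]                # q[a, i] = SUM_j M^a_i,j * U_{m-1}(j)
        U.append(R + q.max(axis=0))     # Eqn I: take the max over actions
    return U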
Once we have all these value functions, we can compute the optimal
policy for each stage (using the MEU principle).
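For completeness, here is a sketch of that MEU step for one stage m,
using the same hypothetical M as above: in each state, pick the action
with the highest expected value of U_{m-1}.

def stage_policy(M, U_prev):
    """MEU policy for stage m, given U_prev = U_{m-1}."""
    q = M @ U_prev                      # expected next-stage value of each action
    return q.argmax(axis=0)             # best action index for each state

# e.g., policies = [stage_policy(M, U[m - 1]) for m in range(1, K + 1)]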
=============
Clearly, the infinite-horizon MDP can be seen as a special case of the
finite-horizon one, where you have only one stage-independent optimal
value function. The U_m and U_{m-1} in Eqn I will both be U*.
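To illustrate this last point, here is a sketch of value iteration for
the infinite-horizon case: the same backup is repeated until the value
function stops changing, so U_m and U_{m-1} collapse into a single U*.
Note that a discount factor gamma < 1 (not present in Eqn I) is assumed
here so that the iteration converges.

def infinite_horizon_values(R, M, gamma=0.95, tol=1e-8):
    """Iterate the backup to a fixed point U* (assumes gamma < 1)."""
    R = np.asarray(R, dtype=float)
    U = R.copy()
    while True:
        U_new = R + gamma * (M @ U).max(axis=0)
        if np.abs(U_new - U).max() < tol:
            return U_new                # the stage-independent U*
        U = U_new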
========
Rao
[Sep 23, 2003]