
CSE571: MDP homework and project



Hello,
We have added one more exercise to the finite-horizon MDP part, along with some guidelines for the implementation part. If you have any questions, please let us know (preferably by posting on the class forum so that others can read the answers as well).
Tuan

---

PART 1:

1) Consider the 4x3 environment in Figure 17.1 in the AIMA textbook. Assume that we are only interested in decision making with K=2 steps. Perform value iteration to find the values of the non-terminal states. What is the optimal move at each state? (A sketch of one possible setup is given after this list.)

2) Do exercises 17.4, 17.8, 17.9, and 17.10 in the textbook (see the attachment).
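For exercise 1, the following is only a minimal sketch of one way to set up the finite-horizon computation; it is not a required format. It assumes the standard conventions for the Figure 17.1 world (reward -0.04 per decision in non-terminal states, terminals at (4,3) with +1 and (4,2) with -1, a wall at (2,2), and the 0.8/0.1/0.1 noisy transition model), and it uses the convention that the value with zero decisions remaining is the terminal reward (0 elsewhere). Check these assumptions against the convention used in class.

    # Finite-horizon value iteration sketch for the AIMA 4x3 world.
    # Coordinates are (column, row) as in Figure 17.1; (2, 2) is the wall.
    STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != (2, 2)]
    TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
    ACTIONS = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
    STEP_REWARD = -0.04      # assumed per-decision reward in non-terminal states

    def move(s, delta):
        """Deterministic displacement; bumping into a wall or the edge stays put."""
        nxt = (s[0] + delta[0], s[1] + delta[1])
        return nxt if nxt in STATES else s

    def transitions(s, a):
        """[(prob, next_state)] under the 0.8 intended / 0.1 each-side model."""
        intended = ACTIONS[a]
        left = (-intended[1], intended[0])      # 90 degrees counter-clockwise
        right = (intended[1], -intended[0])     # 90 degrees clockwise
        return [(0.8, move(s, intended)), (0.1, move(s, left)), (0.1, move(s, right))]

    def finite_horizon_vi(K):
        """V[s] after k sweeps = best expected total reward with k decisions left."""
        V = {s: TERMINALS.get(s, 0.0) for s in STATES}   # zero decisions remaining
        policy = {}
        for _ in range(K):
            newV = {}
            for s in STATES:
                if s in TERMINALS:
                    newV[s] = TERMINALS[s]               # terminals are absorbing
                    continue
                q = {a: STEP_REWARD + sum(p * V[s2] for p, s2 in transitions(s, a))
                     for a in ACTIONS}
                best = max(q, key=q.get)
                newV[s], policy[s] = q[best], best
            V = newV
        return V, policy

    if __name__ == '__main__':
        V2, pi2 = finite_horizon_vi(K=2)
        for s in sorted(STATES):
            print(s, round(V2[s], 3), pi2.get(s, 'terminal'))

Note that with a finite horizon the reported policy is the one for K decisions remaining; with K=2 it can differ from the infinite-horizon policy.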

PART 2:

Implement the value iteration, policy iteration, and modified policy iteration algorithms for infinite-horizon MDPs with discounted rewards (pseudocode can be found in Sections 17.2 and 17.3 of the textbook).
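As a starting point, here is a minimal sketch of what the infinite-horizon code might look like; the interface is only an assumption, not a requirement. It assumes an MDP given by a list of states, a function A(s) returning the actions available in s (empty for terminals), a reward function R(s), a transition model T(s, a) returning a list of (probability, next_state) pairs, and a discount factor gamma. The stopping test is the one from the value iteration pseudocode in Section 17.2.

    def value_iteration(states, A, R, T, gamma, epsilon=1e-4):
        """Repeat Bellman backups until the largest change per sweep is below
        epsilon * (1 - gamma) / gamma, which bounds the max-norm error by epsilon."""
        V = {s: 0.0 for s in states}
        while True:
            newV, delta = {}, 0.0
            for s in states:
                acts = A(s)
                if not acts:                       # terminal state: value is its reward
                    newV[s] = R(s)
                else:
                    newV[s] = R(s) + gamma * max(
                        sum(p * V[s2] for p, s2 in T(s, a)) for a in acts)
                delta = max(delta, abs(newV[s] - V[s]))
            V = newV
            if delta <= epsilon * (1 - gamma) / gamma:
                return V

    def greedy_policy(states, A, T, V):
        """Extract the policy that is greedy with respect to V."""
        return {s: max(A(s), key=lambda a: sum(p * V[s2] for p, s2 in T(s, a)))
                for s in states if A(s)}

    def modified_policy_iteration(states, A, R, T, gamma, k=20):
        """Alternate k sweeps of fixed-policy evaluation with a greedy
        improvement step, until the policy stops changing."""
        V = {s: 0.0 for s in states}
        pi = {s: A(s)[0] for s in states if A(s)}   # arbitrary initial policy
        while True:
            for _ in range(k):                       # approximate policy evaluation
                V = {s: R(s) + (gamma * sum(p * V[s2] for p, s2 in T(s, pi[s]))
                                if s in pi else 0.0)
                     for s in states}
            new_pi = greedy_policy(states, A, T, V)
            if new_pi == pi:
                return pi, V
            pi = new_pi

Plain policy iteration is the same loop with exact policy evaluation (solve the linear system, or iterate the fixed-policy backup to convergence) in place of the k approximate sweeps.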

Use your code to verify your answer to exercise 17.8.

Test your code with the world in Figure 17.1 and report your observations on the following (a small bookkeeping sketch is given after this list):
- For value iteration, how do the values of states change with respect to the number of iterations (you can pick one or two states)? Does this depend on the maximum allowed error in the state values?
- Compare the three algorithms in terms of running time to reach the optimal policy.
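One way to record these observations is sketched below. It assumes the value_iteration interface from the earlier sketch, and the state (1, 1) of the Figure 17.1 world is just an illustrative choice; use whatever states and repetitions make your report convincing.

    import time

    def value_iteration_trace(states, A, R, T, gamma, epsilon=1e-4):
        """Like value_iteration, but yields V after every sweep so per-iteration
        values can be plotted against the iteration number."""
        V = {s: 0.0 for s in states}
        while True:
            newV, delta = {}, 0.0
            for s in states:
                acts = A(s)
                newV[s] = R(s) + (gamma * max(sum(p * V[s2] for p, s2 in T(s, a))
                                              for a in acts) if acts else 0.0)
                delta = max(delta, abs(newV[s] - V[s]))
            V = newV
            yield V
            if delta <= epsilon * (1 - gamma) / gamma:
                return

    def run_experiment(states, A, R, T, gamma):
        # How the value of one chosen state evolves with the iteration count.
        history = [V[(1, 1)] for V in value_iteration_trace(states, A, R, T, gamma)]
        print('value of (1,1) per iteration:', [round(v, 3) for v in history])

        # Crude running-time comparison; repeat and average for the report.
        start = time.perf_counter()
        value_iteration(states, A, R, T, gamma)
        print('value iteration took', time.perf_counter() - start, 'seconds')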

Note that you will reuse your implementation in the Reinforcement Learning project, so make it as flexible as you can. There are no constraints on the programming language.

Due date: 09/19 (this may change depending on how quickly we progress through the lectures in class)

---

Attachment: MDP Homeworks.pdf
Description: Adobe PDF document