
CSE571: MDP homework and project



Hello,
We have added one more exercise to the finite-horizon MDP part, along with some guidelines for the implementation part. If you have any questions, please let us know (preferably by posting on the class forum so that others can read the answers as well).
Tuan

---

PART 1:

1) Consider the 4x3 environment in Figure 17.1 in the AIMA textbook. Assume that we are only interested in decision making with K=2 steps. Perform value iteration to find the values of the non-terminal states. What is the optimal move at each state? (A sketch of one possible setup is given after this list.)

2) Do exercises 17.4, 17.8, 17.9, and 17.10 in the textbook (see the attachment).
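For exercise 1, the following is only a minimal sketch of one way to set up the finite-horizon computation; it is not a required format. It assumes the standard conventions for the Figure 17.1 world (reward -0.04 per decision in non-terminal states, terminals at (4,3) with +1 and (4,2) with -1, a wall at (2,2), and the 0.8/0.1/0.1 noisy transition model), and it uses the convention that the value with zero decisions remaining is the terminal reward (0 elsewhere). Check these assumptions against the convention used in class.

    # Finite-horizon value iteration sketch for the AIMA 4x3 world.
    # Coordinates are (column, row) as in Figure 17.1; (2, 2) is the wall.
    STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != (2, 2)]
    TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
    ACTIONS = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
    STEP_REWARD = -0.04      # assumed per-decision reward in non-terminal states

    def move(s, delta):
        """Deterministic displacement; bumping into a wall or the edge stays put."""
        nxt = (s[0] + delta[0], s[1] + delta[1])
        return nxt if nxt in STATES else s

    def transitions(s, a):
        """[(prob, next_state)] under the 0.8 intended / 0.1 each-side model."""
        intended = ACTIONS[a]
        left = (-intended[1], intended[0])      # 90 degrees counter-clockwise
        right = (intended[1], -intended[0])     # 90 degrees clockwise
        return [(0.8, move(s, intended)), (0.1, move(s, left)), (0.1, move(s, right))]

    def finite_horizon_vi(K):
        """V[s] after k sweeps = best expected total reward with k decisions left."""
        V = {s: TERMINALS.get(s, 0.0) for s in STATES}   # zero decisions remaining
        policy = {}
        for _ in range(K):
            newV = {}
            for s in STATES:
                if s in TERMINALS:
                    newV[s] = TERMINALS[s]               # terminals are absorbing
                    continue
                q = {a: STEP_REWARD + sum(p * V[s2] for p, s2 in transitions(s, a))
                     for a in ACTIONS}
                best = max(q, key=q.get)
                newV[s], policy[s] = q[best], best
            V = newV
        return V, policy

    if __name__ == '__main__':
        V2, pi2 = finite_horizon_vi(K=2)
        for s in sorted(STATES):
            print(s, round(V2[s], 3), pi2.get(s, 'terminal'))

Note that with a finite horizon the reported policy is the one for K decisions remaining; with K=2 it can differ from the infinite-horizon policy.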

PART 2:

Implement the value iteration, policy iteration, and modified policy iteration algorithms for infinite-horizon MDPs with discounted rewards (pseudocode can be found in Sections 17.2 and 17.3 of the textbook).
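As a starting point, here is a minimal sketch of what the infinite-horizon code might look like; the interface is only an assumption, not a requirement. It assumes an MDP given by a list of states, a function A(s) returning the actions available in s (empty for terminals), a reward function R(s), a transition model T(s, a) returning a list of (probability, next_state) pairs, and a discount factor gamma. The stopping test is the one from the value iteration pseudocode in Section 17.2.

    def value_iteration(states, A, R, T, gamma, epsilon=1e-4):
        """Repeat Bellman backups until the largest change per sweep is below
        epsilon * (1 - gamma) / gamma, which bounds the max-norm error by epsilon."""
        V = {s: 0.0 for s in states}
        while True:
            newV, delta = {}, 0.0
            for s in states:
                acts = A(s)
                if not acts:                       # terminal state: value is its reward
                    newV[s] = R(s)
                else:
                    newV[s] = R(s) + gamma * max(
                        sum(p * V[s2] for p, s2 in T(s, a)) for a in acts)
                delta = max(delta, abs(newV[s] - V[s]))
            V = newV
            if delta <= epsilon * (1 - gamma) / gamma:
                return V

    def greedy_policy(states, A, T, V):
        """Extract the policy that is greedy with respect to V."""
        return {s: max(A(s), key=lambda a: sum(p * V[s2] for p, s2 in T(s, a)))
                for s in states if A(s)}

    def modified_policy_iteration(states, A, R, T, gamma, k=20):
        """Alternate k sweeps of fixed-policy evaluation with a greedy
        improvement step, until the policy stops changing."""
        V = {s: 0.0 for s in states}
        pi = {s: A(s)[0] for s in states if A(s)}   # arbitrary initial policy
        while True:
            for _ in range(k):                       # approximate policy evaluation
                V = {s: R(s) + (gamma * sum(p * V[s2] for p, s2 in T(s, pi[s]))
                                if s in pi else 0.0)
                     for s in states}
            new_pi = greedy_policy(states, A, T, V)
            if new_pi == pi:
                return pi, V
            pi = new_pi

Plain policy iteration is the same loop with exact policy evaluation (solve the linear system, or iterate the fixed-policy backup to convergence) in place of the k approximate sweeps.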

Use your code to verify your answer to exercise 17.8.

Test your code with the world in Figure 17.1 and report your observations on the following (a small bookkeeping sketch is given after this list):
- For value iteration, how do the values of states change with respect to the number of iterations (you can pick one or two states)? Does this depend on the maximum allowed error in the state values?
- Compare the three algorithms in terms of running time to reach the optimal policy.
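One way to record these observations is sketched below. It assumes the value_iteration interface from the earlier sketch, and the state (1, 1) of the Figure 17.1 world is just an illustrative choice; use whatever states and repetitions make your report convincing.

    import time

    def value_iteration_trace(states, A, R, T, gamma, epsilon=1e-4):
        """Like value_iteration, but yields V after every sweep so per-iteration
        values can be plotted against the iteration number."""
        V = {s: 0.0 for s in states}
        while True:
            newV, delta = {}, 0.0
            for s in states:
                acts = A(s)
                newV[s] = R(s) + (gamma * max(sum(p * V[s2] for p, s2 in T(s, a))
                                              for a in acts) if acts else 0.0)
                delta = max(delta, abs(newV[s] - V[s]))
            V = newV
            yield V
            if delta <= epsilon * (1 - gamma) / gamma:
                return

    def run_experiment(states, A, R, T, gamma):
        # How the value of one chosen state evolves with the iteration count.
        history = [V[(1, 1)] for V in value_iteration_trace(states, A, R, T, gamma)]
        print('value of (1,1) per iteration:', [round(v, 3) for v in history])

        # Crude running-time comparison; repeat and average for the report.
        start = time.perf_counter()
        value_iteration(states, A, R, T, gamma)
        print('value iteration took', time.perf_counter() - start, 'seconds')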

Note that you will reuse your implementation in the Reinforcement Learning project, so make it as flexible as you can. There are no constraints on the programming language.

Due date: 09/19 (this may change depending on how quickly we progress through the lectures in class)

---

Attachment: MDP Homeworks.pdf
Description: Adobe PDF document