Attached are the slides that I’ll use today.
I’ll mostly talk about the POMDP model, how to represent the value function, and how to do value iteration. For those interested, you may want to check out Hansen’s UAI98 paper on policy iteration; it is quite good and generalizes Sondik’s original policy iteration algorithm.
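As a quick preview of the value-function representation we’ll cover: a POMDP value function is piecewise-linear and convex, so it can be stored as a finite set of alpha-vectors (one entry per state), and the value of a belief is the best dot product over that set. Here is a minimal sketch with a made-up 2-state example (the alpha-vectors below are illustrative, not from any particular problem); value iteration operates by repeatedly generating new sets of such vectors.

```python
import numpy as np

# Hypothetical 2-state POMDP. Each alpha-vector gives the value of following
# some fixed conditional plan from each state; the value function is the
# upper surface over all of them: V(b) = max_alpha <alpha, b>.
alpha_vectors = np.array([
    [1.0, 0.0],   # a plan that is best when we're nearly sure of state 0
    [0.0, 1.0],   # a plan that is best near state 1
    [0.6, 0.6],   # a plan that is best for uncertain beliefs
])

def value(belief):
    """Value of a belief point under a piecewise-linear convex value function."""
    return np.max(alpha_vectors @ belief)

print(value(np.array([0.5, 0.5])))  # the [0.6, 0.6] vector wins here -> 0.6
print(value(np.array([0.9, 0.1])))  # the [1.0, 0.0] vector wins here -> 0.9
```

The same structure is what makes exact value iteration expensive: each backup can multiply the number of alpha-vectors, which is why pruning dominated vectors matters.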
At the end we’ll talk about POSGs (multi-agent POMDPs). Hansen et al. (AAAI04) have some interesting results for 2 agents, 2 actions, and 2 observations – they can do 4 steps of dynamic programming within 2GB of memory!