
CSE 571: Grading for Reinforcement Learning project



Hi all:
Here is some information on the Reinforcement Learning project: Question 1 simply asked for the implementation, so I didn't grade it separately. Q2 and Q3 are worth 20 points each, and the optional Q4 is worth 10 points (its score is recorded separately and will be applied later).

For Q2, since you may use different values for the learning rate and discount factor (and some students varied N_e as well), and since implementations may differ in how they randomly restart in a new state, it is reasonable to see different answers for the number of trials the agent needs to converge. That said, I didn't grade this part against a fixed criterion; instead: if your report is too far off, for instance if the utilities of many states "converge" at negative values, or if the optimal actions at easy states such as (2,1) and (3,1) are wrong, I subtracted some points from this part (more if the best actions for the (1:3,3) states are wrong). I didn't subtract anything if you were wrong only on the two states (4,1) and (3,2) (please let me know if I did).
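Since the convergence behavior depends directly on those knobs, here is a minimal sketch of a single tabular Q-learning backup; the function name, the dict-based table, and the default alpha/gamma values are illustrative, not a required implementation:

```python
def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup (illustrative sketch).

    alpha is the learning rate and gamma the discount factor --
    the two parameters whose choice changes how many trials
    your agent needs before the utilities settle.
    """
    old = Q.get((s, a), 0.0)
    # Bootstrap from the best action available in the next state.
    best_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

A larger alpha makes individual updates bigger (faster but noisier), which is one reason reported trial counts legitimately differ across reports.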

For Q3, the shape of your graph should be as I described, and from it it should be obvious that ADP converges faster than Q-learning (in terms of utility convergence). A few students reported the opposite result while still commenting that ADP is faster; in those cases I had to assume the Q-learning implementation had issues.
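The intuition behind ADP's faster utility convergence is that after each observation it re-solves the whole learned model, so reward information propagates through all known states at once, whereas Q-learning backs up only one state-action pair per step. A minimal sketch of the ADP solving step follows; the model representation and the plain value-iteration loop are illustrative assumptions, not the only valid approach:

```python
def adp_utilities(T, R, gamma=0.9, iters=100):
    """Re-solve the learned model (ADP's inner step), sketch only.

    T maps state -> {action: [(prob, next_state), ...]} -- the
    transition model the agent has estimated so far.
    R maps state -> observed reward.
    Because this sweep touches every known state, one new
    observation can shift utilities across the whole grid,
    which is why the ADP curve settles sooner than Q-learning's.
    """
    U = {s: 0.0 for s in T}
    for _ in range(iters):
        U = {s: R[s] + gamma * max(sum(p * U.get(s2, 0.0)
                                       for p, s2 in outcomes)
                                   for outcomes in T[s].values())
             for s in T}
    return U
```

Q-learning, by contrast, needs many trials for reward to diffuse backward one transition at a time.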

For Q4, I saw various suggestions for different exploration functions. If you explained the intuition behind yours, and your experimental results were consistent with that intuition, you got full credit. I gave partial credit to those who didn't test their new functions.
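For reference, one common choice is the optimistic exploration function f(u, n) from the textbook, which pretends untried actions earn the best possible reward until they have been sampled enough times; the particular N_e and R_plus values below are illustrative:

```python
def exploration(u, n, N_e=5, R_plus=2.0):
    """Optimistic exploration function (textbook-style sketch).

    u: current utility estimate for a state-action pair.
    n: how many times that pair has been tried.
    Until a pair has been tried N_e times, report the optimistic
    reward R_plus so the agent is driven to try it; afterwards,
    fall back to the learned estimate u.
    """
    return R_plus if n < N_e else u
```

The testable intuition is that raising N_e or R_plus increases exploration, which should show up in your experiments as slower but more thorough coverage of the state space.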

Here are some statistics (for Q2 + Q3):
Max: 40
Min: 15
Avg: 32.7
Stdev: 8.4

Please let me know if you have any questions.
Tuan