
on- vs. off-policy



I used the terms on-policy and off-policy wrongly in the class. 

An on-policy agent learns the values of the policy it is actually following. An off-policy agent also gets its experience by following some policy, but it computes values with the Bellman optimality equation, backing up the best next action. It therefore learns the optimal values even when it is guided by an inferior policy.

Thus SARSA, which backs up the Q-value of the action the policy actually takes next, is on-policy. Q-learning, which backs up the best Q-value of the next state, is off-policy.
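To make the distinction concrete, here is a minimal sketch. The toy MDP, the function names (`step`, `behavior_policy`, `run`) and the 1/n step size are all assumptions made for illustration, not anything from the course slides. Both learners gather experience with the same uniformly random (inferior) behavior policy. The only difference is the backup target: SARSA uses the Q-value of the next action the policy actually picked, and Q-learning uses the maximum over next actions.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (an illustrative assumption, not from the course):
# from state "A" the only action "go" leads to "B" with reward 0;
# in "B", "good" ends the episode with reward 10 and "bad" with reward 0.
ACTIONS = {"A": ["go"], "B": ["good", "bad"]}


def step(state, action):
    """Return (reward, next_state); next_state None means the episode ended."""
    if state == "A":
        return 0.0, "B"
    return (10.0 if action == "good" else 0.0), None


def behavior_policy(state, rng):
    """An inferior exploratory policy: choose uniformly at random."""
    return rng.choice(ACTIONS[state])


def run(method, episodes=10_000, gamma=1.0, seed=0):
    """Tabular SARSA ("sarsa") or Q-learning ("qlearning") on the toy MDP."""
    if method not in ("sarsa", "qlearning"):
        raise ValueError("method must be 'sarsa' or 'qlearning'")
    rng = random.Random(seed)
    Q = defaultdict(float)     # Q[(state, action)], initialised to 0
    visits = defaultdict(int)  # per-pair visit counts for a 1/n step size
    for _ in range(episodes):
        s = "A"
        a = behavior_policy(s, rng)
        while s is not None:
            r, s_next = step(s, a)
            if s_next is None:
                target, a_next = r, None
            else:
                # Both learners act with the behavior policy.
                a_next = behavior_policy(s_next, rng)
                if method == "sarsa":
                    # On-policy: back up the action the policy actually chose.
                    target = r + gamma * Q[(s_next, a_next)]
                else:
                    # Off-policy: back up the best next action (Bellman optimality).
                    target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS[s_next])
            visits[(s, a)] += 1
            Q[(s, a)] += (target - Q[(s, a)]) / visits[(s, a)]
            s, a = s_next, a_next
    return Q


if __name__ == "__main__":
    q_sarsa = run("sarsa")
    q_learn = run("qlearning")
    print("SARSA      Q(A, go):", round(q_sarsa[("A", "go")], 2))
    print("Q-learning Q(A, go):", round(q_learn[("A", "go")], 2))
```

Under the random behavior policy, reaching "B" is worth 0.5 * 10 + 0.5 * 0 = 5, while the optimal value is 10. SARSA's estimate of Q(A, go) should therefore settle near 5, the value of the policy being followed. Q-learning's estimate should settle near 10, the optimal value, even though it never acts greedily.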

I modified the corresponding slide to make this clear.

Sorry for the confusion.
rao