
LRTDP paper



is here http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.4460 (only the "cached copy" link works)


rao

ps: ask yourself this: why does LRTDP overwrite the current state's value with the Q-value of its greedy action? Shouldn't it be doing something like taking the min of the current value and the Q-value of the greedy action?
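
One way to probe the question is to try both update rules on a small example. The sketch below is only an illustration, not code from the paper: the three-state MDP, its costs, the zero initial heuristic (assumed to be an admissible lower bound), and the names TRANSITIONS, q_value, greedy_q and sweep are all made up here. It also uses plain repeated sweeps over the states rather than LRTDP's trial-based updates and labeling.

```python
# Hypothetical toy MDP (not from the paper): states 0 and 1, goal state 2.
# TRANSITIONS[state][action] = (cost, [(probability, next_state), ...])
TRANSITIONS = {
    0: {"a": (1.0, [(1.0, 1)]), "b": (5.0, [(1.0, 2)])},
    1: {"a": (1.0, [(0.5, 2), (0.5, 0)])},
}
GOAL = 2


def q_value(values, state, action):
    """Q(s, a) = c(s, a) + sum over s' of P(s' | s, a) * V(s')."""
    cost, outcomes = TRANSITIONS[state][action]
    return cost + sum(p * values[s2] for p, s2 in outcomes)


def greedy_q(values, state):
    """Q-value of the greedy (lowest-Q) action in `state`."""
    return min(q_value(values, state, a) for a in TRANSITIONS[state])


def sweep(values, rule, rounds=200):
    """Repeatedly update every non-goal state with the chosen rule.

    rule == "overwrite": V(s) := Q-value of the greedy action
    rule == "min":       V(s) := min(V(s), Q-value of the greedy action)
    """
    values = dict(values)  # work on a copy
    for _ in range(rounds):
        for s in TRANSITIONS:
            q = greedy_q(values, s)
            values[s] = q if rule == "overwrite" else min(values[s], q)
    return values


# Assumed initial heuristic: zero everywhere (a lower bound on true cost).
h0 = {0: 0.0, 1: 0.0, GOAL: 0.0}
print("overwrite:", sweep(h0, "overwrite"))
print("min      :", sweep(h0, "min"))
```

In this toy setup the overwrite rule drives the values up from the heuristic toward the expected costs-to-go (4 for state 0, 3 for state 1), while the min rule never lets a value rise above its initial lower bound. Whether that observation settles the question in general is left for you to work out.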