Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
There has been growing progress on theoretical analyses for provably efficient learning in MDPs with linear function approximation, but much of the existing work has made strong assumptions…
Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill