Example: confidence

Search results with tag "The parameters"

arXiv:1911.08265v2 [cs.LG] 21 Feb 2020

arXiv:1911.08265v2 [cs.LG] 21 Feb 2020

arxiv.org

The parameters of the representation, dynamics and prediction functions are jointly trained, end-to-end by backpropagation-through-time, to predict three quantities: the policy pk ˇˇ t+k, value function v k ˇz t+k, and reward r t+k ˇu t+k, where z t+kis a sample return: either the final reward (board games) or n-step return (Atari).

  Parameters, The parameters

Similar queries