Transcription of Asynchronous Methods for Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning

Volodymyr ... Puigdomènech ... P. ...
DeepMind; Montreal Institute for Learning Algorithms (MILA), University of Montreal

Abstract

We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training, allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU.
The process continues until the agent reaches a terminal state, after which the process restarts. The return $R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$ is the total accumulated return from time step $t$ with discount factor $\gamma \in (0, 1]$. The goal of the agent is to maximize the expected return from each state $s_t$. The action value $Q^{\pi}(s, a) = \mathbb{E}[R_t \mid s_t = s, a]$ is the ex-
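
As a minimal sketch of the return defined above, the following Python computes $R_t$ for a finite episode. It assumes the episode ends in a terminal state, so the infinite sum stops at the last observed reward. The function names `returns_per_step` and `discounted_return` are hypothetical and chosen only for this illustration; they do not come from the paper.

```python
def returns_per_step(rewards, gamma):
    """Compute R_t for every time step t of one finite episode.

    rewards: list of rewards r_0, ..., r_{T-1} observed until the terminal state.
    gamma:   discount factor, assumed to lie in (0, 1].
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the recursion R_t = r_t + gamma * R_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def discounted_return(rewards, gamma, t=0):
    """Return R_t = sum_k gamma^k * r_{t+k} for a single time step t."""
    if not rewards:
        return 0.0
    return returns_per_step(rewards, gamma)[t]


if __name__ == "__main__":
    episode_rewards = [1.0, 0.0, 2.0]  # illustrative values, not from the paper
    print(returns_per_step(episode_rewards, 0.5))
    print(discounted_return(episode_rewards, 0.5))
```

The backward pass computes all returns of an episode in one sweep instead of re-summing the discounted tail for every time step.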