Deterministic Policy Gradient Algorithms
David Silver (DAVID@DEEPMIND.COM), DeepMind Technologies, London, UK
Guy Lever (GUY.LEVER@UCL.AC.UK), University College London, UK
Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller (*@DEEPMIND.COM), DeepMind Technologies, London, UK

Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.

Introduction

Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces.
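To make the abstract's central claim concrete, the following is a minimal sketch of a deterministic policy gradient estimate as a sample average of the action-value gradient, pushed through the policy by the chain rule. It is an illustration under stated assumptions, not the paper's algorithm: the linear policy a = theta @ s, the quadratic critic Q(s, a) = -||a - target_map @ s||^2, the names `grad_q_wrt_action`, `deterministic_policy_gradient` and `target_map`, and the fixed Gaussian batch of states are all hypothetical. In particular, the critic is given in closed form rather than learned, and states are not drawn from the policy's own state distribution.

```python
import numpy as np


def grad_q_wrt_action(states, actions, target_map):
    # Assumed toy critic: Q(s, a) = -||a - target_map @ s||^2,
    # so dQ/da = -2 * (a - target_map @ s), evaluated row by row.
    return -2.0 * (actions - states @ target_map.T)


def deterministic_policy_gradient(theta, states, target_map):
    # Assumed linear deterministic policy: a = mu_theta(s) = theta @ s.
    actions = states @ theta.T
    dq_da = grad_q_wrt_action(states, actions, target_map)
    # Chain rule for a linear policy: dQ/dtheta = outer(dQ/da, s).
    # Averaging over the batch approximates the expectation over states.
    return dq_da.T @ states / len(states)


rng = np.random.default_rng(0)
target_map = np.array([[1.0, -0.5], [0.25, 2.0]])  # hypothetical optimum
theta = np.zeros((2, 2))
states = rng.normal(size=(256, 2))  # fixed batch standing in for visited states

# Plain gradient ascent on the policy parameters.
for _ in range(500):
    theta += 0.05 * deterministic_policy_gradient(theta, states, target_map)
```

In this toy setting the action that maximises Q in state s is target_map @ s, so gradient ascent should drive theta toward target_map. No sampling over actions is needed to form the estimate, which is the source of the efficiency argument made in the abstract.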