Transcription of Deterministic Policy Gradient Algorithms
{{id}} {{{paragraph}}}
Deterministic Policy Gradient AlgorithmsDavid Technologies, London, UKGuy College London, UKNicolas Heess, Thomas Degris, Daan Wierstra, Martin Technologies, London, UKAbstractIn this paper we considerdeterministicpolicygradient Algorithms for reinforcement learningwith continuous actions. The Deterministic pol-icy Gradient has a particularly appealing form: itis the expected Gradient of the action-value func-tion. This simple form means that the deter-ministic Policy Gradient can be estimated muchmore efficiently than the usual stochastic pol-icy Gradient . To ensure adequate exploration,we introduce an off- Policy actor-critic algorithmthat learns a Deterministic target Policy from anexploratory behaviour Policy .
deterministic policy gradient does indeed exist, and further-more it has a simple model-free form that simply follows the gradient of the action-value function. In addition, we show that the deterministic policy gradient is the limiting Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume ...
Domain:
Source:
Link to this page:
Please notify us if you found a problem with this document:
{{id}} {{{paragraph}}}