Transcription of Deterministic Policy Gradient Algorithms
Deterministic Policy Gradient Algorithms
David Technologies, London, UK
Guy College London, UK
Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Technologies, London, UK

Abstract
In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.

Introduction
Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces.
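The basic idea is to represent the policy by a parametric probability distribution $\pi_\theta(a \mid s) = \mathbb{P}[a \mid s; \theta]$ that stochastically selects action $a$ in state $s$ according to parameter vector $\theta$. Policy gradient algorithms typically proceed by sampling this stochastic policy and adjusting the policy parameters in the direction of greater cumulative reward.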
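The contrast between sampling a stochastic policy and following the gradient of the action-value function can be seen on a very small problem. What follows is a minimal sketch on a hypothetical one-state problem with a scalar action and reward $r(a) = -(a - 2)^2$. This problem is our own assumption and is not an experiment from the paper. The stochastic estimate samples a Gaussian policy $\mathcal{N}(\theta, \sigma^2)$ and weights each sampled action-value by the score $(a - \theta)/\sigma^2$. The deterministic estimate evaluates $\nabla_a Q$ at the action $\mu_\theta = \theta$. Because there is only one state, the expectation over states is trivial. All function names (`q_value`, `stochastic_pg_estimate`, `deterministic_pg_estimate`, `ascend_deterministic`) are illustrative. Here $Q$ is known in closed form. In practice it must be learned by a critic, which this sketch does not attempt.

```python
import numpy as np

# Hypothetical toy problem (an illustrative assumption, not from the paper):
# one state, one scalar continuous action a, reward r(a) = -(a - A_STAR)^2.
# Because the episode lasts a single step, the action-value is Q(a) = r(a).
A_STAR = 2.0


def q_value(a):
    """Action-value Q(a) = -(a - A_STAR)^2 of the toy problem."""
    return -(a - A_STAR) ** 2


def dq_da(a):
    """Gradient of Q with respect to the action, known analytically here."""
    return -2.0 * (a - A_STAR)


def stochastic_pg_estimate(theta, sigma, n_samples, rng):
    """Score-function estimate of d/dtheta E[Q(a)] for the Gaussian
    policy pi_theta(a) = N(theta, sigma^2), from n_samples sampled actions."""
    a = rng.normal(theta, sigma, size=n_samples)
    score = (a - theta) / sigma**2  # grad_theta log pi_theta(a)
    return float(np.mean(score * q_value(a)))


def deterministic_pg_estimate(theta):
    """Deterministic policy mu_theta = theta, so d(mu)/d(theta) = 1 and the
    gradient is dQ/da evaluated at a = mu_theta. No sampling is needed."""
    dmu_dtheta = 1.0
    return dmu_dtheta * dq_da(theta)


def ascend_deterministic(theta, lr=0.1, steps=100):
    """Plain gradient ascent on theta using the deterministic gradient."""
    for _ in range(steps):
        theta += lr * deterministic_pg_estimate(theta)
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta, sigma = 0.0, 0.5
    print("deterministic:", deterministic_pg_estimate(theta))  # prints 4.0
    # Repeat the 10-sample stochastic estimate to expose its spread.
    estimates = [stochastic_pg_estimate(theta, sigma, 10, rng) for _ in range(1000)]
    print("stochastic mean:", np.mean(estimates), "std:", np.std(estimates))
    print("theta after ascent:", ascend_deterministic(theta))
```

In this toy setting both estimators target the same quantity, $-2(\theta - 2)$. Gaussian noise only shifts $\mathbb{E}[Q]$ by the constant $-\sigma^2$, so it does not change the gradient. At $\theta = 0$ and $\sigma = 0.5$, the per-sample score-function estimate has a standard deviation of about 11 (analytically). A 10-sample average therefore still spreads by roughly 3.5 around the true value 4. The deterministic value is exact once $\nabla_a Q$ is available.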