Deterministic Policy Gradient Algorithms

David Silver, DeepMind Technologies, London, UK
Guy Lever, University College London, UK
Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller, DeepMind Technologies, London, UK

Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.

Introduction

Policy gradient algorithms are widely used in reinforcement learning problems with continuous action spaces.
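
The abstract's statement that the deterministic policy gradient is the expected gradient of the action-value function can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: a scalar state, a linear policy `mu(theta, s) = theta * s`, and a known quadratic action-value function Q(s, a) = -(a - 2s)^2 standing in for a learned critic. This is not the off-policy actor-critic algorithm itself. It only shows the gradient being estimated as a sample average of dmu/dtheta times dQ/da at a = mu(theta, s).

```python
import random


def mu(theta, s):
    """Hypothetical deterministic linear policy: action = theta * state."""
    return theta * s


def grad_a_q(s, a):
    """dQ/da for the assumed action-value function Q(s, a) = -(a - 2 s)^2."""
    return -2.0 * (a - 2.0 * s)


def deterministic_policy_gradient(theta, states):
    """Sample average of dmu/dtheta * dQ/da, with dQ/da taken at a = mu(theta, s)."""
    total = 0.0
    for s in states:
        a = mu(theta, s)   # no action sampling: the policy is deterministic
        dmu_dtheta = s     # derivative of theta * s with respect to theta
        total += dmu_dtheta * grad_a_q(s, a)
    return total / len(states)


def train_deterministic(theta=0.0, lr=0.1, steps=200, batch=64, seed=0):
    """Gradient ascent on theta using states drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)
    for _ in range(steps):
        states = [rng.uniform(-1.0, 1.0) for _ in range(batch)]
        theta += lr * deterministic_policy_gradient(theta, states)
    return theta


theta_star = train_deterministic()
print(theta_star)  # approaches 2, the maximiser of the assumed Q
```

In this toy setting the estimate averages only over states, because the action is a fixed function of the state. The sketch is meant only to give a feel for why such an estimate can need fewer samples than one that also averages over sampled actions.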

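The basic idea is to represent the policy by a parametric probability distribution $\pi_\theta(a \mid s) = \mathbb{P}[a \mid s; \theta]$ that stochastically selects action $a$ in state $s$ according to parameter vector $\theta$. Policy gradient algorithms typically proceed by sampling this stochastic policy and adjusting the policy parameters in the direction of greater cumulative reward.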
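
As a rough illustration of "sampling this stochastic policy and adjusting the policy parameters", the sketch below uses a score-function (REINFORCE-style) estimator. The setup is assumed for illustration and is not taken from the paper: a Gaussian policy with mean `theta * s` and fixed standard deviation `SIGMA`, a one-step reward -(a - 2s)^2, and hypothetical function names throughout.

```python
import random

SIGMA = 0.5  # assumed fixed exploration noise of the Gaussian policy


def reward(s, a):
    """Assumed one-step reward, maximised when a = 2 s."""
    return -(a - 2.0 * s) ** 2


def sample_action(theta, s, rng):
    """Draw a ~ pi_theta(. | s) = Normal(theta * s, SIGMA^2)."""
    return rng.gauss(theta * s, SIGMA)


def score(theta, s, a):
    """d/dtheta of log pi_theta(a | s) for the Gaussian policy above."""
    return (a - theta * s) * s / SIGMA ** 2


def stochastic_policy_gradient(theta, rng, batch=256):
    """Monte Carlo average of score * reward over sampled states and actions."""
    total = 0.0
    for _ in range(batch):
        s = rng.uniform(-1.0, 1.0)
        a = sample_action(theta, s, rng)
        total += score(theta, s, a) * reward(s, a)
    return total / batch


def train_stochastic(theta=0.0, lr=0.05, steps=500, seed=0):
    """Gradient ascent on theta, moving towards greater expected reward."""
    rng = random.Random(seed)
    for _ in range(steps):
        theta += lr * stochastic_policy_gradient(theta, rng)
    return theta


print(train_stochastic())  # noisy, but settles near 2 in this toy problem
```

Unlike a deterministic policy, this estimator has to average over sampled actions as well as states, so each update carries extra noise from the action sampling.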
