
Deterministic Policy Gradient Algorithms

David Silver (DeepMind Technologies, London, UK), Guy Lever (University College London, UK), Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller (DeepMind Technologies, London, UK)

Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy.
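The "expected gradient of the action-value function" can be made concrete in the simplest one-step (contextual-bandit) case, where Q does not depend on the policy. There the gradient of J(θ) = E_s[Q(s, μ_θ(s))] is just the chain rule, ∇_θ J = E_s[∇_θ μ_θ(s) ∇_a Q(s, a)] evaluated at a = μ_θ(s). The sketch below is illustrative only and rests on assumptions not in the text: a hypothetical linear policy μ_θ(s) = θ·s, a hypothetical quadratic Q(s, a) = -(a - 2s)², Gaussian states, and helper names (`mu`, `q`, `dpg_estimate`, `objective`) invented here. It compares the sample DPG estimate with a finite-difference derivative of the sample objective.

```python
import random

# Hypothetical toy problem: scalar state s, scalar action a, one-step setting.
def mu(theta, s):
    """Hypothetical linear deterministic policy a = theta * s."""
    return theta * s

def grad_mu_theta(theta, s):
    """d mu / d theta for the linear policy (independent of theta here)."""
    return s

def q(s, a):
    """Hypothetical action-value function, maximised at a = 2 * s."""
    return -(a - 2.0 * s) ** 2

def grad_q_a(s, a):
    """d Q / d a for the hypothetical Q above."""
    return -2.0 * (a - 2.0 * s)

def dpg_estimate(theta, states):
    """Sample mean of grad_theta mu(s) * grad_a Q(s, a), evaluated at a = mu(s)."""
    total = sum(grad_mu_theta(theta, s) * grad_q_a(s, mu(theta, s)) for s in states)
    return total / len(states)

def objective(theta, states):
    """J(theta): mean of Q(s, mu(s)) over the sampled states."""
    return sum(q(s, mu(theta, s)) for s in states) / len(states)

rng = random.Random(0)
states = [rng.gauss(0.0, 1.0) for _ in range(1000)]
theta, eps = 0.5, 1e-5
dpg = dpg_estimate(theta, states)
# Central finite difference of the sample objective, for comparison only.
fd = (objective(theta + eps, states) - objective(theta - eps, states)) / (2 * eps)
print(f"DPG estimate: {dpg:.6f}  finite difference: {fd:.6f}")
```

For this quadratic toy Q the two numbers agree up to rounding, and the positive sign pushes θ towards the optimum θ = 2. In the full sequential setting Q also depends on the policy, which is where the paper's theorem, rather than this plain chain rule, is needed.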

... deterministic policy gradient does indeed exist, and furthermore it has a simple model-free form that simply follows the gradient of the action-value function. In addition, we show that the deterministic policy gradient is the limiting ...

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume ...
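Because the actor only needs ∇_a Q at the target action μ_θ(s), a critic fitted to data from a different, exploratory behaviour policy can drive a deterministic target policy, which is the off-policy actor-critic idea described in the abstract. The sketch below is a deliberately simplified illustration and not the paper's algorithm. It assumes a one-step setting with a hypothetical reward r = -(a - 2s)², a fixed uniform-random behaviour policy, and a hypothetical linear critic Q_w(s, a) = w0·s·a + w1·a² + w2·s². It also swaps in a batch least-squares critic fit for the temporal-difference critic a sequential problem would need. All names (`features`, `fit_critic`, `train`, ...) are invented for the example.

```python
import random

def features(s, a):
    """Hypothetical critic features: Q_w(s, a) = w0*s*a + w1*a*a + w2*s*s."""
    return [s * a, a * a, s * s]

def reward(s, a):
    """Hypothetical one-step reward; the best action is a = 2 * s."""
    return -(a - 2.0 * s) ** 2

def solve3(m, v):
    """Solve the 3x3 system m x = v by Gaussian elimination with partial pivoting."""
    m = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in reversed(range(3)):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_critic(data):
    """Least-squares fit of the linear critic to (s, a, r) samples."""
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for s, a, r in data:
        x = features(s, a)
        for i in range(3):
            xty[i] += x[i] * r
            for j in range(3):
                xtx[i][j] += x[i] * x[j]
    return solve3(xtx, xty)

def dq_da(w, s, a):
    """Gradient of the critic with respect to the action."""
    return w[0] * s + 2.0 * w[1] * a

def train(steps=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Exploratory behaviour policy: uniform random actions, independent of theta.
    data = []
    for _ in range(500):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-3.0, 3.0)
        data.append((s, a, reward(s, a)))
    w = fit_critic(data)
    theta = 0.0  # deterministic target policy mu_theta(s) = theta * s
    for _ in range(steps):
        # Actor step: mean over behaviour states of grad_theta mu(s) * grad_a Q_w(s, a),
        # with the action taken from the target policy, a = theta * s.
        grad = sum(s * dq_da(w, s, theta * s) for s, _, _ in data) / len(data)
        theta += lr * grad
    return theta, w

theta, w = train()
print(f"theta = {theta:.4f}, critic weights = {[round(x, 4) for x in w]}")
```

Because this toy reward is exactly representable by the critic, the fit recovers w ≈ (4, -1, -4), and the actor converges to θ ≈ 2 even though no sample was generated by the target policy. The actor update also needs no importance weights over actions.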
