
Policy Gradient Methods for Reinforcement Learning with ...

Advances in Neural Information Processing Systems 12, pp. 1057–1063, MIT Press, 2000

Policy Gradient Methods for Reinforcement Learning with Function Approximation

Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932

Abstract

Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters.
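The update described in the abstract can be written compactly. The symbols below are notation assumed here for illustration, since the excerpt itself does not define them: \theta stands for the policy parameters, \rho for the expected reward obtained by following the policy, and \alpha for a positive step size.

```latex
% Gradient ascent on expected reward with respect to the policy parameters.
% \theta, \rho, and \alpha are assumed notation, not taken from the excerpt.
\[
  \theta \leftarrow \theta + \alpha \, \frac{\partial \rho}{\partial \theta}
\]
```

In practice the gradient is not known exactly and is replaced by an estimate built from experience.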

Williams’s (1988, 1992) REINFORCE algorithm also finds an unbiased estimate of the gradient, but without the assistance of a learned value function. REINFORCE learns much more slowly than RL methods using value functions and has received relatively little attention. Learning a value function and using it to reduce the variance
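As a rough illustration of a REINFORCE-style update, here is a minimal sketch on a toy multi-armed bandit. The bandit setup, the softmax policy, the function names, and all parameter values are assumptions chosen for this example, not details from the paper. The sketch uses the sampled reward times the gradient of the log-probability of the chosen action as the gradient estimate, with no learned value function.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences (the policy parameters) into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=5000, alpha=0.1, noise=0.5, seed=0):
    """REINFORCE-style updates on a hypothetical Gaussian bandit.

    arm_means: assumed mean reward of each arm.
    Returns the learned action preferences theta.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], noise)
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
        # reward * grad_log is a sample-based estimate of the gradient of
        # expected reward; no value function is learned or used.
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += alpha * reward * grad_log
    return theta


theta = reinforce_bandit([0.0, 1.0])
probs = softmax(theta)
print("action probabilities:", probs)
```

Because each update is scaled by a single noisy reward, the estimate has high variance, which is one way to see why such a method can learn slowly compared with methods that use a value function.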

