Search results with tag "Policy gradient methods"
Policy Gradient Methods for Reinforcement Learning with ...
proceedings.neurips.cc
... learns much more slowly than RL methods using value functions and has received relatively little attention. Learning a value function and using it to reduce the variance of the gradient estimate appears to be essential for rapid learning. Jaakkola, Singh ...