Search results with tag "Doubleq"
Double Q-learning - NeurIPS
proceedings.neurips.cc
1 Introduction. Q-learning is a popular reinforcement learning algorithm that was proposed by Watkins [1] and can be used to optimally solve Markov Decision Processes (MDPs) [2]. We show that Q-learning’s performance can be poor in stochastic MDPs because of large overestimations of the action values.
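The overestimation mentioned in the snippet can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from the paper: every action is assumed to have a true value of 0 with unit-variance Gaussian reward noise, and the function name `estimate_max` and all parameter values are hypothetical. It compares taking the maximum over noisy sample means (as the max in the Q-learning target does) with a double estimator that selects the action using one sample set and evaluates it using an independent one.

```python
import random
import statistics


def estimate_max(num_actions=10, samples_per_action=10, trials=2000, seed=0):
    """Compare a single (max) estimator with a double estimator.

    Assumption for illustration: every action has true value 0.0,
    so the true maximum expected value is also 0.0.
    """
    rng = random.Random(seed)

    def sample_means():
        # One noisy sample mean per action, from N(0, 1) rewards.
        return [
            statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(samples_per_action))
            for _ in range(num_actions)
        ]

    single, double = [], []
    for _ in range(trials):
        means_a = sample_means()  # first independent sample set
        means_b = sample_means()  # second independent sample set

        # Single estimator: max over the noisy means (biased upward).
        single.append(max(means_a))

        # Double estimator: choose the action with set A, evaluate it with set B.
        best = max(range(num_actions), key=means_a.__getitem__)
        double.append(means_b[best])

    return statistics.fmean(single), statistics.fmean(double)


single_avg, double_avg = estimate_max()
print(f"single estimator average: {single_avg:.3f}")
print(f"double estimator average: {double_avg:.3f}")
```

Under these assumptions the single estimator's average should come out clearly above the true value of 0, while the double estimator's average should stay close to 0. In general the double estimator can underestimate rather than overestimate; it is unbiased here only because all actions share the same true value.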