Search results with tag "Difference learning"
Deep Reinforcement Learning with Double Q-learning - arXiv
arxiv.org
using Q-learning (Watkins, 1989), a form of temporal difference learning (Sutton, 1988). Most interesting problems are too large to learn all action values in all states separately. Instead, we can learn a parameterized value function Q(s, a; θ_t). The standard Q-learning update for the parameters after taking action A_t in state S_t and ...
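The snippet stops before the update rule itself. What follows is a minimal sketch of a standard Q-learning step for a parameterized value function Q(s, a; θ). It assumes a linear approximator, Q(s, a; θ) = θ[a] · φ(s), with hand-supplied feature vectors φ(s). It uses the usual target R + γ · max_a Q(S', a; θ), or just R at a terminal state. The function names `q_value` and `q_learning_update` and the defaults α = 0.1 and γ = 0.99 are hypothetical choices for illustration. They are not taken from the paper.

```python
from typing import List, Tuple

Weights = List[List[float]]  # one weight row per action (hypothetical linear model)


def q_value(theta: Weights, features: List[float], action: int) -> float:
    """Linear action value: Q(s, a; theta) = theta[a] . phi(s)."""
    return sum(w * x for w, x in zip(theta[action], features))


def q_learning_update(
    theta: Weights,
    phi_s: List[float],
    action: int,
    reward: float,
    phi_next: List[float],
    done: bool,
    alpha: float = 0.1,   # step size (assumed value)
    gamma: float = 0.99,  # discount factor (assumed value)
) -> Tuple[Weights, float]:
    """One semi-gradient Q-learning step; returns (new weights, TD error)."""
    if done:
        # No bootstrapping from a terminal next state.
        target = reward
    else:
        # Bootstrap from the greedy action value in the next state.
        target = reward + gamma * max(
            q_value(theta, phi_next, a) for a in range(len(theta))
        )
    td_error = target - q_value(theta, phi_s, action)
    # The gradient of theta[action] . phi(s) w.r.t. theta[action] is phi(s),
    # so only the row for the taken action changes.
    new_theta = [row[:] for row in theta]
    new_theta[action] = [
        w + alpha * td_error * x for w, x in zip(theta[action], phi_s)
    ]
    return new_theta, td_error


if __name__ == "__main__":
    theta = [[0.0, 0.0], [0.0, 0.0]]  # 2 actions, 2 features
    theta, delta = q_learning_update(theta, [1.0, 0.0], 0, 1.0, [0.0, 1.0], False)
    print(theta, delta)
```

Because every weight starts at zero, the first update's TD error is simply the reward. Only the weight row for the action actually taken moves.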