Transcription of Generative Adversarial Imitation Learning
Generative Adversarial Imitation Learning
Jonathan Ho, Stefano Ermon
Stanford University

Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

Introduction

We are interested in a specific setting of imitation learning (the problem of learning to perform a task from expert demonstrations) in which the learner is given ...
... the basic result that the set of valid occupancy measures $\mathcal{D} \triangleq \{\rho_\pi : \pi \in \Pi\}$ can be written as a feasible set of affine constraints [19]: if $p_0(s)$ is the distribution of starting states and $P(s' \mid s, a)$ is the dynamics model, then

$$\mathcal{D} = \Big\{ \rho : \rho \ge 0 \ \text{ and } \ \sum_a \rho(s,a) = p_0(s) + \gamma \sum_{s',a} P(s \mid s', a)\, \rho(s', a) \quad \forall s \in \mathcal{S} \Big\}.$$

Furthermore, there is a one-to-one ...
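The affine characterization of valid occupancy measures can be checked numerically on a small tabular problem. The sketch below is an illustration only, not code from the paper. It assumes the discounted occupancy measure rho(s, a) = pi(a|s) * sum_t gamma^t P(s_t = s | pi) with discount factor `GAMMA`. The two-state MDP, the policy, all numeric values, and the function names `occupancy_measure` and `flow_residuals` are hypothetical choices made for the example. The code approximates rho by truncating the infinite sum, then evaluates how far each state is from satisfying the flow constraint.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
P0 = [0.7, 0.3]  # starting-state distribution p_0(s)

# DYNAMICS[s][a][s2] = P(s2 | s, a)
DYNAMICS = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
]

# POLICY[s][a] = pi(a | s)
POLICY = [[0.6, 0.4], [0.25, 0.75]]


def occupancy_measure(policy, horizon=2000):
    """Approximate rho(s, a) = pi(a|s) * sum_t gamma^t P(s_t = s | pi)."""
    state_dist = list(P0)            # distribution of s_t, starting at t = 0
    visits = [0.0 for _ in STATES]   # discounted state visitation
    discount = 1.0
    for _ in range(horizon):
        for s in STATES:
            visits[s] += discount * state_dist[s]
        # Push the state distribution one step forward under the policy.
        next_dist = [0.0 for _ in STATES]
        for s in STATES:
            for a in ACTIONS:
                for s2 in STATES:
                    next_dist[s2] += state_dist[s] * policy[s][a] * DYNAMICS[s][a][s2]
        state_dist = next_dist
        discount *= GAMMA
    return [[policy[s][a] * visits[s] for a in ACTIONS] for s in STATES]


def flow_residuals(rho):
    """Per state s: sum_a rho(s,a) - p_0(s) - gamma * sum_{s',a} P(s|s',a) rho(s',a)."""
    residuals = []
    for s in STATES:
        lhs = sum(rho[s][a] for a in ACTIONS)
        inflow = sum(DYNAMICS[sp][a][s] * rho[sp][a] for sp in STATES for a in ACTIONS)
        residuals.append(lhs - P0[s] - GAMMA * inflow)
    return residuals


rho = occupancy_measure(POLICY)
residuals = flow_residuals(rho)
total_mass = sum(sum(row) for row in rho)
print("flow residuals:", residuals)
print("total mass:", total_mass)
```

Under these assumptions the residuals should be numerically zero and the total mass should be close to 1 / (1 - GAMMA), since the unnormalized discounted occupancy measure sums to that value. An arbitrary non-negative table that does not come from a policy would generally leave nonzero residuals.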