
Search results with tag "Gradient methods"

Learning Structured Representation for Text Classification ...


www.microsoft.com

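gradient methods (Sutton et al. 2000), aiming to maximize the expected reward as shown below.

$$
\begin{aligned}
J(\Theta) &= \mathbb{E}_{(s_t, a_t) \sim P_\Theta(s_t, a_t)}\, r(s_1 a_1 \cdots s_L a_L) \\
&= \sum_{s_1 a_1 \cdots s_L a_L} P_\Theta(s_1 a_1 \cdots s_L a_L)\, R_L \\
&= \sum_{s_1 a_1 \cdots s_L a_L} p(s_1) \prod_t \pi_\Theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)\, R_L \\
&= \sum_{s_1 a_1 \cdots s_L a_L} \prod_t \pi_\Theta(a_t \mid s_t)\, R_L.
\end{aligned}
$$

Note that this reward is computed over just one sample, say X = x ...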

  Methods, Learning, Derating, Gradient methods
