Search results with tag "Gradient methods"
Learning Structured Representation for Text Classification ...
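www.microsoft.com
gradient methods (Sutton et al. 2000), aiming to maximize the expected reward as shown below.

$$
\begin{aligned}
J(\Theta) &= \mathbb{E}_{(s_t, a_t) \sim P_\Theta(s_t, a_t)}\, r(s_1 a_1 \cdots s_L a_L) \\
&= \sum_{s_1 a_1 \cdots s_L a_L} P_\Theta(s_1 a_1 \cdots s_L a_L)\, R_L \\
&= \sum_{s_1 a_1 \cdots s_L a_L} p(s_1) \prod_t \pi_\Theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)\, R_L \\
&= \sum_{s_1 a_1 \cdots s_L a_L} \prod_t \pi_\Theta(a_t \mid s_t)\, R_L .
\end{aligned}
$$

Note that this reward is computed over just one sample, say X = x ...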
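To make the last form of the expected reward in the snippet concrete, here is a minimal Python sketch that evaluates the sum over action sequences of the product of policy probabilities times the delayed reward R_L. Everything in it is a hypothetical toy setup, not taken from the paper: two states, two actions, a sigmoid policy parameterised by `theta`, a deterministic `step` transition, and a reward that counts the 1-actions. It assumes a fixed start state and deterministic transitions, so the p(s_1) and p(s_{t+1} | s_t, a_t) factors equal 1 and only the policy terms remain.

```python
import itertools
import math


def policy(theta, state):
    """Hypothetical policy: pi(a=1 | s) = sigmoid(theta[s]), actions in {0, 1}."""
    p1 = 1.0 / (1.0 + math.exp(-theta[state]))
    return [1.0 - p1, p1]


def step(state, action):
    """Hypothetical deterministic transition between two states."""
    return (state + action) % 2


def final_reward(actions):
    """Hypothetical delayed reward R_L: the number of 1-actions taken."""
    return float(sum(actions))


def expected_reward(theta, horizon, start_state=0):
    """J(theta) = sum over action sequences of prod_t pi(a_t | s_t) * R_L."""
    total = 0.0
    for actions in itertools.product([0, 1], repeat=horizon):
        state, prob = start_state, 1.0
        for action in actions:
            prob *= policy(theta, state)[action]  # pi(a_t | s_t)
            state = step(state, action)           # deterministic, so p(...) = 1
        total += prob * final_reward(actions)
    return total


if __name__ == "__main__":
    print(expected_reward([0.0, 0.0], horizon=3))
```

With `theta = [0.0, 0.0]` every action has probability 0.5, so over a horizon of 3 the expected number of 1-actions, and hence the value returned, is 1.5. Exhaustive enumeration is only feasible for tiny horizons; it is used here to show what the sum means, whereas the snippet itself describes working from a single sample.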