Transcription of Machine Learning 1: Linear Regression
Machine Learning 1: Linear Regression
Stefano Ermon
March 31, 2016

[Slide 2/25] Plan for today

Plan for today:
- Supervised machine learning: linear regression

[Slide 3/25] Renewable electricity generation in the U.S.

[Figure: renewable electricity generation in the U.S. Data: Renewable Energy Data Book, NREL]

[Slide 4/25] Challenges for the grid

- Wind and solar are intermittent: we will need traditional power plants when the wind stops.
- Many power plants (e.g., nuclear) cannot be easily turned on/off or quickly ramped up/down.
- With more accurate forecasts, wind and solar power become more efficient alternatives.
- A few years ago, Xcel Energy (Colorado) ran ads opposing a proposal that it obtain 10% of its power from renewable sources. Thanks to wind-forecasting (ML) algorithms developed at NCAR, they now aim for 30 percent.
- Accurate forecasting saved the utility $6-$10 million per year.

[Slide 5/25] Motivation

- Solar and wind are intermittent.
- Can we accurately forecast how much energy we will consume tomorrow?
- This is difficult to estimate from a priori models.
- But we have lots of data from which to build a model.

[Slide 6/25] Typical electricity demand by hour of day

[Figure: hourly demand (GW) versus hour of day for Feb 9, Jul 13, and Oct 10. Data: PJM]

[Slide 7/25] Predict peak demand from high temperature

What will peak demand be tomorrow? If we know something else about tomorrow (like the high temperature), we can use this to predict peak demand.

[Figure: peak hourly demand (GW) versus high temperature (F). Data: PJM, Weather Underground (summer months, June-August)]

[Slide 8/25] A simple model

A linear model that predicts demand:

    predicted peak demand = \theta_1 \cdot (\text{high temperature}) + \theta_2

[Figure: observed data and the linear regression prediction]

Parameters of the model: $\theta_1, \theta_2 \in \mathbb{R}$ (here $\theta_1 = \ldots$, $\theta_2 = \ldots$).

[Slide 9/25] A simple model

We can use a model like this to make predictions. What will be the peak demand tomorrow?
I know from the weather report that the high temperature will be 80 F (ignore, for the moment, that this too is a prediction). Then the predicted peak demand is

    \theta_1 \cdot 80 + \theta_2 = \ldots

[Slide 10/25] Formal problem setting

- Input: $x_i \in \mathbb{R}^n$, $i = 1, \ldots, m$. E.g., $x_i \in \mathbb{R}^1 = \{\text{high temperature for day } i\}$.
- Output: $y_i \in \mathbb{R}$ (a regression task). E.g., $y_i = \{\text{peak demand for day } i\}$.
- Model parameters: $\theta \in \mathbb{R}^k$.
- Predicted output: $\hat{y}_i = \theta_1 x_i + \theta_2$.

[Slide 11/25]

For convenience, we define a function that maps inputs to feature vectors, $\phi : \mathbb{R}^n \to \mathbb{R}^k$. For example, in our task above, if we define

    \phi(x_i) = [x_i, 1]^T   (here n = 1, k = 2)

then we can write

    \hat{y}_i = \sum_{j=1}^k \theta_j \phi_j(x_i) = \theta^T \phi(x_i)

[Slide 12/25] Loss functions

We want a model that performs well on the data we have, i.e., $\hat{y}_i \approx y_i$ for each $i$. We measure the closeness of $\hat{y}_i$ and $y_i$ using a loss function $\ell : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$. Example: squared loss

    \ell(\hat{y}_i, y_i) = (\hat{y}_i - y_i)^2
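The feature map, prediction, and squared loss above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the course's code (the slides use MATLAB later on), and the parameter values below are made up for the example, since the fitted values do not appear in this transcript:

```python
import numpy as np

def phi(x):
    """Feature map phi(x) = [x, 1] for a scalar input (n = 1, k = 2)."""
    return np.array([x, 1.0])

def predict(theta, x):
    """Predicted output y_hat = theta^T phi(x)."""
    return theta @ phi(x)

def squared_loss(y_hat, y):
    """Squared loss l(y_hat, y) = (y_hat - y)^2."""
    return (y_hat - y) ** 2

# Hypothetical parameters, chosen only for illustration
theta = np.array([0.05, -1.0])
y_hat = predict(theta, 80.0)       # theta_1 * 80 + theta_2, about 3.0
print(y_hat)
print(squared_loss(y_hat, 2.5))    # about 0.25 if the true demand were 2.5
```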
[Slide 13/25] Finding model parameters, and optimization

We want to find model parameters $\theta$ that minimize the sum of losses over all input/output pairs:

    J(\theta) = \sum_{i=1}^m \ell(\hat{y}_i, y_i) = \sum_{i=1}^m (\theta^T \phi(x_i) - y_i)^2

Written formally, our objective is

    \text{minimize}_\theta \; J(\theta)

This is a simple example of an optimization problem; these will dominate our development of algorithms throughout the course.

[Slide 14/25] How do we optimize a function?

Search algorithm: start with an initial guess for $\theta$. Keep changing $\theta$ (by a little bit) to reduce $J(\theta)$.

[Slide 15/25] Gradient descent

    J(\theta) = \sum_{i=1}^m \ell(\hat{y}_i, y_i) = \sum_{i=1}^m (\theta^T \phi(x_i) - y_i)^2

Gradient descent: $\theta_j \leftarrow \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$, for all $j$, where

    \frac{\partial J(\theta)}{\partial \theta_j}
      = \frac{\partial}{\partial \theta_j} \sum_{i=1}^m (\theta^T \phi(x_i) - y_i)^2
      = \sum_{i=1}^m \frac{\partial}{\partial \theta_j} (\theta^T \phi(x_i) - y_i)^2
      = \sum_{i=1}^m 2 (\theta^T \phi(x_i) - y_i) \frac{\partial (\theta^T \phi(x_i) - y_i)}{\partial \theta_j}
      = \sum_{i=1}^m 2 (\theta^T \phi(x_i) - y_i) \, \phi(x_i)_j

[Slide 16/25] Gradient descent

Repeat until convergence:

    \theta_j \leftarrow \theta_j - \alpha \sum_{i=1}^m 2 (\theta^T \phi(x_i) - y_i) \, \phi(x_i)_j, \quad \text{for all } j

Demo: gradient descent.

[Slide 17/25]

Let's write $J(\theta)$ a little more compactly using matrix notation; define

    \Phi = [\phi(x_1)^T; \phi(x_2)^T; \ldots; \phi(x_m)^T] \in \mathbb{R}^{m \times k}, \quad y = [y_1; \ldots; y_m] \in \mathbb{R}^m
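The update rule above translates directly into a vectorized NumPy sketch. The function name, step size, iteration count, and the tiny synthetic dataset are my own choices for illustration, not from the slides:

```python
import numpy as np

def gradient_descent(Phi, y, alpha=0.05, iters=2000):
    """Minimize J(theta) = sum_i (theta^T phi(x_i) - y_i)^2 by gradient descent.

    The per-coordinate gradient sum_i 2 (theta^T phi(x_i) - y_i) phi(x_i)_j
    is computed for all j at once as 2 Phi^T (Phi theta - y).
    """
    theta = np.zeros(Phi.shape[1])
    for _ in range(iters):
        residual = Phi @ theta - y              # shape (m,)
        theta = theta - alpha * 2 * Phi.T @ residual
    return theta

# Tiny synthetic problem: y = 2x + 1 exactly, with features phi(x) = [x, 1]
x = np.array([0.0, 1.0, 2.0])
Phi = np.column_stack([x, np.ones_like(x)])
y = 2 * x + 1
print(gradient_descent(Phi, y))                 # approaches [2, 1]
```

The step size matters: too large and the iterates diverge, too small and convergence is slow; here 0.05 is small enough for this tiny problem.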
Then

    J(\theta) = \sum_{i=1}^m (\theta^T \phi(x_i) - y_i)^2 = \|\Phi\theta - y\|_2^2

($\|z\|_2$ is the $\ell_2$ norm of a vector: $\|z\|_2 = \sqrt{\sum_i z_i^2} = \sqrt{z^T z}$). This is called the least-squares objective function.

[Slide 18/25] How do we optimize a function? The 1-D case ($\theta \in \mathbb{R}$)

[Figure: plots of $J(\theta) = \theta^2 - 2\theta - 1$ and its derivative $dJ/d\theta = 2\theta - 2$]

    J(\theta) = \theta^2 - 2\theta - 1, \qquad \frac{dJ}{d\theta} = 2\theta - 2

$\theta^\star$ is a minimum $\;\Rightarrow\; \frac{dJ}{d\theta}\Big|_{\theta^\star} = 0 \;\Rightarrow\; 2\theta^\star - 2 = 0 \;\Rightarrow\; \theta^\star = 1$

[Slide 19/25]

Multivariate case: $\theta \in \mathbb{R}^k$, $J : \mathbb{R}^k \to \mathbb{R}$. Generalized condition:

    \nabla_\theta J(\theta)\big|_{\theta^\star} = 0

$\nabla_\theta J(\theta)$ denotes the gradient of $J$ with respect to $\theta$:

    \nabla_\theta J(\theta) = \left[ \frac{\partial J}{\partial \theta_1}, \ldots, \frac{\partial J}{\partial \theta_k} \right]^T \in \mathbb{R}^k

Some important rules and common gradients:

    \nabla_\theta (a f(\theta) + b g(\theta)) = a \nabla_\theta f(\theta) + b \nabla_\theta g(\theta), \quad (a, b \in \mathbb{R})
    \nabla_\theta (\theta^T A \theta) = (A + A^T)\theta, \quad (A \in \mathbb{R}^{k \times k})
    \nabla_\theta (b^T \theta) = b, \quad (b \in \mathbb{R}^k)

[Slide 20/25] Optimizing the least-squares objective

    J(\theta) = \|\Phi\theta - y\|_2^2 = (\Phi\theta - y)^T (\Phi\theta - y) = \theta^T \Phi^T \Phi \theta - 2 y^T \Phi \theta + y^T y

Using the previous gradient rules:

    \nabla_\theta J(\theta) = \nabla_\theta (\theta^T \Phi^T \Phi \theta) - 2 \nabla_\theta (y^T \Phi \theta) + \nabla_\theta (y^T y) = 2 \Phi^T \Phi \theta - 2 \Phi^T y

Setting the gradient equal to zero:

    2 \Phi^T \Phi \theta^\star - 2 \Phi^T y = 0
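The closed-form gradient $2\Phi^T\Phi\theta - 2\Phi^T y$ derived from these rules can be sanity-checked against central finite differences on random data. This is a quick verification sketch of my own, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(10, 3))
y = rng.normal(size=10)
theta = rng.normal(size=3)

def J(t):
    """Least-squares objective J(t) = ||Phi t - y||_2^2."""
    r = Phi @ t - y
    return r @ r

# Analytic gradient from the rules on the slide
grad_analytic = 2 * Phi.T @ Phi @ theta - 2 * Phi.T @ y

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                         for e in np.eye(3)])

# Tiny discrepancy: J is quadratic, so central differences are nearly exact
print(np.max(np.abs(grad_analytic - grad_numeric)))
```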
Solving,

    \theta^\star = (\Phi^T \Phi)^{-1} \Phi^T y

known as the normal equations.

[Slide 21/25] MATLAB code

Let's see how this looks in MATLAB code:

    X = load(...);
    y = load(...);
    n = size(X, 2);
    m = size(X, 1);
    Phi = [X ones(m, 1)];
    theta = inv(Phi' * Phi) * (Phi' * y);

The normal equations are so common that MATLAB has a special operation for them:

    % same as inv(Phi' * Phi) * (Phi' * y)
    theta = Phi \ y;

[Slide 22/25] Higher-dimensional inputs

- Input: $x \in \mathbb{R}^2 = [\text{temperature}; \text{hour of day}]$
- Output: $y \in \mathbb{R} = \text{demand}$

[Slide 23/25]

[Figure]

[Slide 24/25]

Features:

    \phi(x) \in \mathbb{R}^3 = [\text{temperature}; \text{hour of day}; 1]

Same matrices as before:

    \Phi = [\phi(x_1)^T; \ldots; \phi(x_m)^T] \in \mathbb{R}^{m \times k}, \quad y \in \mathbb{R}^m

Same solution as before:

    \theta \in \mathbb{R}^3 = (\Phi^T \Phi)^{-1} \Phi^T y

[Slide 25/25] (end)
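The MATLAB snippet above can be mirrored in NumPy. The data here are synthetic stand-ins I invented (the slide loads real files whose names are not shown), and `np.linalg.lstsq` plays the role of MATLAB's backslash; solving the linear system is preferred over forming the inverse explicitly:

```python
import numpy as np

# Synthetic stand-in for the data files loaded in the MATLAB version
rng = np.random.default_rng(1)
X = rng.uniform(60.0, 100.0, size=(50, 1))       # e.g. daily high temperatures
y = 0.05 * X[:, 0] - 1.0 + 0.1 * rng.normal(size=50)

m = X.shape[0]
Phi = np.hstack([X, np.ones((m, 1))])            # feature matrix with rows [x, 1]

# Normal equations: theta = (Phi^T Phi)^{-1} Phi^T y,
# computed by solving the system rather than inverting Phi^T Phi
theta_ne = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Least-squares solver, analogous to MATLAB's  theta = Phi \ y
theta_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(theta_ne)   # close to the generating coefficients [0.05, -1.0]
print(theta_ls)   # same solution, computed more stably
```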