Linear Regression via Maximization of the Likelihood


Elements of Machine Learning, Princeton University

In least squares regression, we introduced the idea of a function \ell(\hat{y}, y) that is bigger when our machine learning model produces an estimate \hat{y} that is far from the true value y. Specifically, we used a squared loss:

\ell(\hat{y}, y) = (\hat{y} - y)^2 .   (1)

In this note we take a different, probabilistic view of the same problem. We start with the simplest possible setting: we observe data \{y_n\}_{n=1}^N where y_n \in \mathbb{R}, and we assume that they are all independently and identically distributed according to a Gaussian distribution with unknown mean \mu and variance \sigma^2:

y_n \mid \mu, \sigma^2 \sim \mathcal{N}(y_n \mid \mu, \sigma^2) .   (2)

The probability density function associated with this conditional distribution is the familiar univariate Gaussian:

\Pr(y_n \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(y_n - \mu)^2 \right\} .   (3)

The data are independent, however, and so we write the conditional distribution for all of them as a product:

\Pr(\{y_n\}_{n=1}^N \mid \mu, \sigma^2) = \prod_{n=1}^N \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(y_n - \mu)^2 \right\} .   (4)

This function, which we are here thinking of as being parameterized by \mu, is the likelihood of the data. In maximizing it, we are asking: what \mu would assign the highest probability to the data we've seen? This inductive criterion of selecting model parameters based on their ability to probabilistically explain the data is what we refer to as maximum likelihood estimation (MLE).
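As a concrete illustration of equations (3) and (4), the following minimal sketch (not part of the original note; the data values, the candidate means, and the known \sigma^2 = 1 are all made up for illustration) evaluates the per-datum Gaussian density and multiplies them to get the likelihood of a small data set under a few candidate values of \mu:

```python
import numpy as np

def gaussian_pdf(y, mu, sigma2):
    """Univariate Gaussian density, as in equation (3)."""
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Hypothetical observations, assumed i.i.d. Gaussian with known sigma^2 = 1.
y = np.array([9.2, 10.1, 9.8, 10.5, 9.6])
sigma2 = 1.0

# The likelihood of equation (4) is the product of the per-datum densities.
for mu in (8.0, 9.0, 10.0):
    likelihood = np.prod(gaussian_pdf(y, mu, sigma2))
    print(f"mu = {mu:4.1f}  ->  Pr(data | mu, sigma^2) = {likelihood:.3e}")
```

Running this prints the largest likelihood for the candidate closest to the sample mean, which is exactly the behavior MLE exploits.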

Maximum likelihood estimation has wonderful properties that are out of scope for this course. At the end of the day, however, we can think of it as giving us a different (negative) loss function:

\mu_{\text{MLE}} = \arg\max_{\mu} \Pr(\{y_n\}_{n=1}^N \mid \mu, \sigma^2) = \arg\max_{\mu} \prod_{n=1}^N \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(y_n - \mu)^2 \right\} .   (5)

In practice, this isn't quite the problem we solve: we actually prefer to maximize the log likelihood, because it turns all of our products into sums, which are easier to manipulate and differentiate. Moreover, when we take the product of many things that may be less than 1, the floating point numbers on our computer may become very close to zero and the maximization may not be numerically stable. Taking the log, our (negative) loss function becomes

L(\mu) = \log \Pr(\{y_n\}_{n=1}^N \mid \mu, \sigma^2) = \sum_{n=1}^N \log\left[ \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(y_n - \mu)^2 \right\} \right]   (6)
= -N \log \sigma - \frac{N}{2} \log 2\pi - \frac{1}{2\sigma^2} \sum_{n=1}^N (y_n - \mu)^2 .   (7)

Figure 1 shows the likelihood function L(\mu) for one such data set; we can find its maximizer by following the same kind of procedure that we used for least squares regression: differentiate, set to zero, and solve for \mu:

\frac{d}{d\mu} L(\mu) = \frac{1}{\sigma^2} \sum_{n=1}^N (y_n - \mu) = 0   (8)
\frac{1}{\sigma^2} \sum_{n=1}^N y_n - \frac{N\mu}{\sigma^2} = 0   (9)
\mu = \frac{1}{N} \sum_{n=1}^N y_n .   (10)

[Figure 1: The black dots are ten (N = 10) data from a Gaussian distribution with \sigma^2 = 1; the curve shows the likelihood as a function of \mu.]

That is, the sample mean is the maximum likelihood estimate in this model (regardless of \sigma^2). We can now move on to regression. Here we assume our data are tuples of the form \{x_n, y_n\}_{n=1}^N, where x_n \in \mathbb{R}^D and y_n \in \mathbb{R}. Rather than starting from a loss function, however, we're now going to say that our data arise from a process like

y = x^T w + \epsilon   (11)

where \epsilon is a noise term, and we can think about different noise models for \epsilon.
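The derivation of equation (10) can be checked numerically. The sketch below (not from the original note; the synthetic data, the seed, and the grid range are illustrative choices) compares the closed-form sample mean against a crude grid search over the log likelihood of equation (7):

```python
import numpy as np

def log_likelihood(mu, y, sigma2):
    """Log likelihood L(mu) of equation (7)."""
    n = len(y)
    return (-n * np.log(np.sqrt(sigma2))
            - 0.5 * n * np.log(2 * np.pi)
            - np.sum((y - mu) ** 2) / (2 * sigma2))

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=10)   # N = 10 synthetic draws
sigma2 = 1.0

# Closed form of equation (10): the sample mean.
mu_mle = np.mean(y)

# Grid search over mu should land in (approximately) the same place.
grid = np.linspace(-5.0, 5.0, 10001)
mu_grid = grid[np.argmax([log_likelihood(m, y, sigma2) for m in grid])]

print(mu_mle, mu_grid)   # the two estimates agree to the grid resolution
```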

Generalizing the previous section, a very natural idea is to say that this noise comes from a zero-mean Gaussian distribution with variance \sigma^2, i.e.,

\epsilon \mid \sigma^2 \sim \mathcal{N}(\epsilon \mid 0, \sigma^2) .   (12)

Adding a constant to a Gaussian just has the effect of shifting its mean, so the resulting conditional probability distribution for our generative probabilistic process is

\Pr(y_n \mid x_n, w, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(y_n - x_n^T w)^2 \right\} .   (13)

This looks just like before, except now rather than conditioning on \mu, we condition on the inputs x_n and the weights w:

\Pr(\{y_n\}_{n=1}^N \mid \{x_n\}_{n=1}^N, w, \sigma^2) = \prod_{n=1}^N \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2}(y_n - x_n^T w)^2 \right\} .   (14)

Again, this is a function of w, and maximizing it is again maximum likelihood estimation, but with a new twist: we can write the whole collection of observations compactly using the multivariate Gaussian density

\mathcal{N}(z \mid \mu, \Sigma) = |\Sigma|^{-1/2} (2\pi)^{-D/2} \exp\left\{ -\frac{1}{2}(z - \mu)^T \Sigma^{-1} (z - \mu) \right\} .   (15)

The covariance matrix \Sigma must be square, symmetric, and positive definite. When \Sigma is diagonal, the dimensions of z are independent and the density factorizes into a product of univariate Gaussians. Stacking the targets into a vector y \in \mathbb{R}^N and the inputs into a matrix X \in \mathbb{R}^{N \times D}, the likelihood of equation (14) can therefore be written as:

\Pr(y \mid X, w, \sigma^2) = \mathcal{N}(y \mid Xw, \sigma^2 I) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{1}{2\sigma^2}(Xw - y)^T (Xw - y) \right\} .   (16)

We can now think about how to maximize this likelihood with respect to w. As before, it is helpful to take the natural log first:

\log \Pr(y \mid X, w, \sigma^2) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(Xw - y)^T (Xw - y) .   (17)

The additive term doesn't depend on w, so we can drop it from the maximization:

w_{\text{MLE}} = \arg\max_{w} \left\{ -\frac{1}{2\sigma^2}(Xw - y)^T (Xw - y) \right\} .   (18)

The \frac{1}{2\sigma^2} does not change the solution to this problem, and of course we can change the sign and make this maximization into a minimization:

w_{\text{MLE}} = \arg\min_{w} (Xw - y)^T (Xw - y) .   (19)
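A small numerical sketch of equation (19), with made-up synthetic data (the dimensions, true weights, noise level, and seed below are illustrative assumptions, not part of the original note): solving the minimization with an off-the-shelf least-squares routine gives the same answer as the normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data from the generative process of equation (11):
# y = X w_true + eps, with eps ~ N(0, sigma^2 I).
N, D = 50, 3
w_true = np.array([1.5, -2.0, 0.5])
sigma2 = 0.25
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=N)

# Equation (19): w_MLE minimizes (Xw - y)^T (Xw - y), i.e. ordinary least squares.
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

# Equivalent closed form via the normal equations, (X^T X)^{-1} X^T y.
w_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)

print(w_mle)
print(w_normal_eq)   # identical up to numerical precision
```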

This is exactly the same optimization problem that we solved for least-squares linear regression! While it seems like the loss function view and the maximum likelihood view are different, this reveals that they are often the same under the hood: least squares can be interpreted as assuming Gaussian noise, and particular choices of likelihood can be interpreted directly as (usually exponentiated) loss functions.

One thing that is different about maximum likelihood, however, is that it gives us more than a point prediction. For example, after finding w_{\text{MLE}}, if we have a query input x_{\text{pred}} for which we don't know the y, we could compute a guess via \hat{y}_{\text{pred}} = x_{\text{pred}}^T w_{\text{MLE}}, or we could actually construct a whole distribution:

\Pr(y_{\text{pred}} \mid x_{\text{pred}}, w_{\text{MLE}}, \sigma^2) = \mathcal{N}(y_{\text{pred}} \mid x_{\text{pred}}^T w_{\text{MLE}}, \sigma^2) .   (20)

This sounds great, but without an estimate of the noise variance \sigma^2, it won't be any good. Fortunately, maximum likelihood estimation tells us how to do that one also, and we can start out by assuming that we have already found w_{\text{MLE}}:

\sigma^2_{\text{MLE}} = \arg\max_{\sigma^2} \left\{ -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(Xw_{\text{MLE}} - y)^T (Xw_{\text{MLE}} - y) \right\} .   (21)

Solving this maximization problem is again just a question of differentiating and setting to zero:

\frac{\partial}{\partial \sigma^2} \left[ -\frac{N}{2} \log \sigma^2 - \frac{1}{2\sigma^2}(Xw_{\text{MLE}} - y)^T (Xw_{\text{MLE}} - y) \right] = 0   (22)
-\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}(Xw_{\text{MLE}} - y)^T (Xw_{\text{MLE}} - y) = 0   (23)
-N + \frac{1}{\sigma^2}(Xw_{\text{MLE}} - y)^T (Xw_{\text{MLE}} - y) = 0   (24)
\sigma^2_{\text{MLE}} = \frac{1}{N}(Xw_{\text{MLE}} - y)^T (Xw_{\text{MLE}} - y) .   (25)
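The following sketch puts equations (19), (20), and (25) together (again with made-up synthetic data; the dimensions, weights, noise scale, seed, and query point x_pred are illustrative assumptions): fit w_{\text{MLE}}, estimate the noise variance as the mean squared residual, and report the predictive mean and variance for a new input.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training data under the same generative assumptions as above.
N, D = 100, 2
w_true = np.array([0.7, -1.2])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=0.5, size=N)

# Weight estimate from equation (19).
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

# Equation (25): the MLE of the noise variance is the mean squared residual.
resid = X @ w_mle - y
sigma2_mle = (resid @ resid) / N

# Equation (20): predictive distribution for a new input x_pred is
# N(y_pred | x_pred^T w_MLE, sigma^2_MLE).
x_pred = np.array([1.0, 2.0])        # made-up query point
mean_pred = x_pred @ w_mle
print(f"predictive mean {mean_pred:.3f}, predictive variance {sigma2_mle:.3f}")
```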

17 September 2018: Initial version.

