Logistic Regression - Pennsylvania State University

Transcription of Logistic Regression - Pennsylvania State University

Logistic Regression
Jia Li
Department of Statistics, The Pennsylvania State University
Email: jiali

Logistic Regression

Preserve the linear classification boundary implied by the Bayes rule: $\hat{G}(x) = \arg\max_k \Pr(G = k \mid X = x)$.

- The decision boundary between classes $k$ and $l$ is determined by the equation
$$\Pr(G = k \mid X = x) = \Pr(G = l \mid X = x).$$
- Divide both sides by $\Pr(G = l \mid X = x)$ and take the log. The above equation is equivalent to
$$\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = 0.$$
- Since we enforce a linear boundary, we can assume
$$\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = a_0^{(k,l)} + \sum_{j=1}^{p} a_j^{(k,l)} x_j.$$
- For logistic regression, there are restrictive relations between the $a^{(k,l)}$ for different pairs $(k,l)$.

Assumptions

$$\log\frac{\Pr(G = 1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{10} + \beta_1^T x$$
$$\log\frac{\Pr(G = 2 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{20} + \beta_2^T x$$
$$\cdots$$
$$\log\frac{\Pr(G = K-1 \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{(K-1)0} + \beta_{K-1}^T x$$

- For any pair $(k, l)$:
$$\log\frac{\Pr(G = k \mid X = x)}{\Pr(G = l \mid X = x)} = \beta_{k0} - \beta_{l0} + (\beta_k - \beta_l)^T x.$$
- Number of parameters: $(K-1)(p+1)$.
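As a quick numeric check of the pairwise relation, a minimal NumPy sketch with made-up coefficients for $K = 3$, $p = 2$ (so $(K-1)(p+1) = 6$ parameters); the values in `beta` and `x` are hypothetical:

```python
import numpy as np

beta = {1: np.array([0.5, 1.0, -2.0]),   # (beta_10, beta_1): hypothetical values
        2: np.array([-0.3, 0.7, 0.4])}   # (beta_20, beta_2): hypothetical values

x = np.array([1.0, 0.2, -1.5])           # input with the constant 1 prepended

# Log-odds of classes 1 and 2 versus the reference class K = 3 are linear in x.
logit_1K = beta[1] @ x
logit_2K = beta[2] @ x

# The log-odds for the pair (1, 2) is their difference, which is again linear
# with intercept beta_10 - beta_20 and slope beta_1 - beta_2.
logit_12 = logit_1K - logit_2K
print(logit_12, (beta[1] - beta[2]) @ x)  # identical up to floating point
```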

- Denote the entire parameter set by
$$\theta = \{\beta_{10}, \beta_1, \beta_{20}, \beta_2, \ldots, \beta_{(K-1)0}, \beta_{K-1}\}.$$
- The log ratios of posterior probabilities are called log-odds or logit transformations.
- Under the assumptions, the posterior probabilities are given by
$$\Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)} \quad \text{for } k = 1, \ldots, K-1,$$
$$\Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1}\exp(\beta_{l0} + \beta_l^T x)}.$$
- For $\Pr(G = k \mid X = x)$ given above, obviously:
  - The probabilities sum up to 1: $\sum_{k=1}^{K}\Pr(G = k \mid X = x) = 1$.
  - A simple calculation shows that the assumptions are satisfied.

Comparison with Linear Regression on Indicators

- Similarities:
  - Both attempt to estimate $\Pr(G = k \mid X = x)$.
  - Both have linear classification boundaries.
- Differences:
  - Linear regression on the indicator matrix approximates $\Pr(G = k \mid X = x)$ by a linear function of $x$, which is not guaranteed to fall between 0 and 1 or to sum up to 1.
  - In logistic regression, $\Pr(G = k \mid X = x)$ is a nonlinear function of $x$; it is guaranteed to range from 0 to 1 and to sum up to 1.
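These formulas translate directly into code; a minimal sketch, assuming class $K$ is the reference (its coefficients fixed at zero give the lone 1 in the denominator) and reusing the hypothetical coefficient values from above:

```python
import numpy as np

def posteriors(betas, x):
    """betas: (K-1, p+1) rows (beta_l0, beta_l); x: (p+1,) with a leading 1."""
    e = np.exp(betas @ x)                    # exp(beta_l0 + beta_l^T x), l = 1..K-1
    denom = 1.0 + e.sum()
    return np.append(e / denom, 1.0 / denom)  # classes 1..K-1, then class K

betas = np.array([[0.5, 1.0, -2.0],
                  [-0.3, 0.7, 0.4]])         # K = 3, p = 2 (hypothetical values)
x = np.array([1.0, 0.2, -1.5])
pr = posteriors(betas, x)
print(pr, pr.sum())                          # the probabilities sum to 1
```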

Fitting Logistic Regression Models

- Criteria: find the parameters that maximize the conditional likelihood of $G$ given $X$ using the training data.
- Denote $p_k(x_i; \theta) = \Pr(G = k \mid X = x_i; \theta)$.
- Given the first input $x_1$, the posterior probability of its class being $g_1$ is $\Pr(G = g_1 \mid X = x_1)$.
- Since the samples in the training data set are independent, the posterior probability for the $N$ samples each having class $g_i$, $i = 1, 2, \ldots, N$, given their inputs $x_1, x_2, \ldots, x_N$, is
$$\prod_{i=1}^{N}\Pr(G = g_i \mid X = x_i).$$
- The conditional log-likelihood of the class labels in the training data set is
$$L(\theta) = \sum_{i=1}^{N}\log\Pr(G = g_i \mid X = x_i) = \sum_{i=1}^{N}\log p_{g_i}(x_i; \theta).$$
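As a sketch, $L(\theta)$ is simply the sum of the log posterior probabilities of the observed labels; this assumes the softmax form above, labels coded $1, \ldots, K$, and a hypothetical helper name:

```python
import numpy as np

def conditional_log_likelihood(betas, X, g):
    """betas: (K-1, p+1); X: (N, p+1) rows with a leading 1; g: labels in 1..K."""
    e = np.exp(X @ betas.T)                          # (N, K-1) numerators
    P = np.hstack([e, np.ones((len(X), 1))])         # class K numerator is 1
    P = P / (1.0 + e.sum(axis=1, keepdims=True))     # p_k(x_i; theta)
    return np.log(P[np.arange(len(g)), np.asarray(g) - 1]).sum()
```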

Binary Classification

- For binary classification, if $g_i = 1$, denote $y_i = 1$; if $g_i = 2$, denote $y_i = 0$.
- Let $p_1(x; \theta) = p(x; \theta)$; then $p_2(x; \theta) = 1 - p_1(x; \theta) = 1 - p(x; \theta)$.
- Since $K = 2$, the parameters are $\theta = \{\beta_{10}, \beta_1\}$. We denote $\beta = (\beta_{10}, \beta_1)^T$.
- If $y_i = 1$, i.e., $g_i = 1$, then
$$\log p_{g_i}(x; \beta) = \log p_1(x; \beta) = 1 \cdot \log p(x; \beta) = y_i \log p(x; \beta).$$
If $y_i = 0$, i.e., $g_i = 2$, then
$$\log p_{g_i}(x; \beta) = \log p_2(x; \beta) = 1 \cdot \log(1 - p(x; \beta)) = (1 - y_i)\log(1 - p(x; \beta)).$$
Since either $y_i = 0$ or $1 - y_i = 0$, we have
$$\log p_{g_i}(x; \beta) = y_i \log p(x; \beta) + (1 - y_i)\log(1 - p(x; \beta)).$$
- The conditional log-likelihood becomes
$$L(\beta) = \sum_{i=1}^{N}\log p_{g_i}(x_i; \beta) = \sum_{i=1}^{N}\left[y_i \log p(x_i; \beta) + (1 - y_i)\log(1 - p(x_i; \beta))\right].$$
- There are $p + 1$ parameters in $\beta = (\beta_{10}, \beta_1)$. Assume a column vector form for $\beta$:
$$\beta = (\beta_{10}, \beta_{11}, \ldots, \beta_{1,p})^T.$$
- Here we add the constant term 1 to $x$ to accommodate the intercept:
$$x = (1, x_1, \ldots, x_p)^T.$$
- By the assumption of the logistic regression model:
$$p(x; \beta) = \Pr(G = 1 \mid X = x) = \frac{\exp(\beta^T x)}{1 + \exp(\beta^T x)},$$
$$1 - p(x; \beta) = \Pr(G = 2 \mid X = x) = \frac{1}{1 + \exp(\beta^T x)}.$$
- Substituting the above into $L(\beta)$:
$$L(\beta) = \sum_{i=1}^{N}\left[y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right)\right].$$
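The simplified form lends itself to a near one-line implementation; a sketch that ignores overflow guards for large $\beta^T x_i$:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """X: (N, p+1) with leading 1s; y: 0/1 labels; beta: (p+1,)."""
    eta = X @ beta                            # beta^T x_i for every sample
    return np.sum(y * eta - np.log1p(np.exp(eta)))
```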

- To maximize $L(\beta)$, we set the first order partial derivatives of $L(\beta)$ to zero:
$$\frac{\partial L(\beta)}{\partial \beta_{1j}} = \sum_{i=1}^{N} y_i x_{ij} - \sum_{i=1}^{N}\frac{x_{ij}\, e^{\beta^T x_i}}{1 + e^{\beta^T x_i}} = \sum_{i=1}^{N} y_i x_{ij} - \sum_{i=1}^{N} p(x_i; \beta)\, x_{ij} = \sum_{i=1}^{N} x_{ij}\left(y_i - p(x_i; \beta)\right)$$
for all $j = 0, 1, \ldots, p$.
- In matrix form, we write
$$\frac{\partial L(\beta)}{\partial \beta} = \sum_{i=1}^{N} x_i \left(y_i - p(x_i; \beta)\right).$$
- To solve the set of $p + 1$ nonlinear equations $\partial L(\beta)/\partial\beta_{1j} = 0$, $j = 0, 1, \ldots, p$, we use the Newton-Raphson algorithm.
- The Newton-Raphson algorithm requires the second derivatives, i.e., the Hessian matrix:
$$\frac{\partial^2 L(\beta)}{\partial\beta\,\partial\beta^T} = -\sum_{i=1}^{N} x_i x_i^T\, p(x_i; \beta)\left(1 - p(x_i; \beta)\right).$$
- The element on the $j$th row and $n$th column (counting from 0) is
$$\frac{\partial^2 L(\beta)}{\partial\beta_{1j}\,\partial\beta_{1n}} = -\sum_{i=1}^{N}\frac{\left(1 + e^{\beta^T x_i}\right) e^{\beta^T x_i} x_{ij} x_{in} - \left(e^{\beta^T x_i}\right)^2 x_{ij} x_{in}}{\left(1 + e^{\beta^T x_i}\right)^2} = -\sum_{i=1}^{N}\left(x_{ij} x_{in}\, p(x_i; \beta) - x_{ij} x_{in}\, p(x_i; \beta)^2\right) = -\sum_{i=1}^{N} x_{ij} x_{in}\, p(x_i; \beta)\left(1 - p(x_i; \beta)\right).$$
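The gradient and Hessian implied by these sums can be computed together; a minimal sketch (names hypothetical):

```python
import numpy as np

def gradient_and_hessian(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))    # p(x_i; beta)
    w = p * (1.0 - p)                        # p_i (1 - p_i)
    grad = X.T @ (y - p)                     # sum_i x_i (y_i - p_i)
    hess = -(X * w[:, None]).T @ X           # - sum_i x_i x_i^T p_i (1 - p_i)
    return grad, hess
```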

- Starting with $\beta^{\text{old}}$, a single Newton-Raphson update is
$$\beta^{\text{new}} = \beta^{\text{old}} - \left(\frac{\partial^2 L(\beta)}{\partial\beta\,\partial\beta^T}\right)^{-1}\frac{\partial L(\beta)}{\partial\beta},$$
where the derivatives are evaluated at $\beta^{\text{old}}$.
- The iteration can be expressed compactly in matrix form:
  - Let $\mathbf{y}$ be the column vector of the $y_i$.
  - Let $\mathbf{X}$ be the $N \times (p+1)$ input matrix.
  - Let $\mathbf{p}$ be the $N$-vector of fitted probabilities with $i$th element $p(x_i; \beta^{\text{old}})$.
  - Let $\mathbf{W}$ be an $N \times N$ diagonal matrix of weights with $i$th diagonal element $p(x_i; \beta^{\text{old}})(1 - p(x_i; \beta^{\text{old}}))$.
- Then
$$\frac{\partial L(\beta)}{\partial\beta} = \mathbf{X}^T(\mathbf{y} - \mathbf{p}), \qquad \frac{\partial^2 L(\beta)}{\partial\beta\,\partial\beta^T} = -\mathbf{X}^T\mathbf{W}\mathbf{X}.$$
- The Newton-Raphson step is
$$\beta^{\text{new}} = \beta^{\text{old}} + (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{y} - \mathbf{p}) = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\left(\mathbf{X}\beta^{\text{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})\right) = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\mathbf{z},$$
where $\mathbf{z} \triangleq \mathbf{X}\beta^{\text{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})$.
- If $\mathbf{z}$ is viewed as a response and $\mathbf{X}$ as the input matrix, $\beta^{\text{new}}$ is the solution to a weighted least squares problem:
$$\beta^{\text{new}} \leftarrow \arg\min_{\beta}\,(\mathbf{z} - \mathbf{X}\beta)^T\mathbf{W}(\mathbf{z} - \mathbf{X}\beta).$$
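One Newton-Raphson update written as this weighted least squares solve; a sketch assuming all weights $p_i(1 - p_i)$ are strictly positive so that $\mathbf{W}^{-1}$ exists:

```python
import numpy as np

def newton_step(beta_old, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ beta_old)))
    w = p * (1.0 - p)
    z = X @ beta_old + (y - p) / w            # adjusted response z
    A = (X * w[:, None]).T @ X                # X^T W X
    b = (X * w[:, None]).T @ z                # X^T W z
    return np.linalg.solve(A, b)              # beta_new
```

Iterating `newton_step` until the estimates stop changing is exactly the IRLS loop described next.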

- Recall that linear regression by least squares solves
$$\arg\min_{\beta}\,(\mathbf{z} - \mathbf{X}\beta)^T(\mathbf{z} - \mathbf{X}\beta).$$
- $\mathbf{z}$ is referred to as the adjusted response.
- The algorithm is referred to as iteratively reweighted least squares (IRLS).

Pseudo Code

1. $0 \to \beta$.
2. Compute $\mathbf{y}$ by setting its elements to
$$y_i = \begin{cases}1 & \text{if } g_i = 1\\ 0 & \text{if } g_i = 2\end{cases}, \qquad i = 1, 2, \ldots, N.$$
3. Compute $\mathbf{p}$ by setting its elements to
$$p(x_i; \beta) = \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}, \qquad i = 1, 2, \ldots, N.$$
4. Compute the diagonal matrix $\mathbf{W}$; its $i$th diagonal element is $p(x_i; \beta)(1 - p(x_i; \beta))$, $i = 1, 2, \ldots, N$.
5. $\mathbf{z} = \mathbf{X}\beta + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})$.
6. $\beta = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}\mathbf{z}$.
7. If the stopping criterion is met, stop; otherwise go back to step 3.
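A direct NumPy transcription of the seven steps; a sketch only, which forms the dense diagonal $\mathbf{W}$ exactly as the steps state (the inefficiency discussed next). Labels `g` are assumed coded as 1 and 2, and `X` carries a leading column of 1s:

```python
import numpy as np

def irls(X, g, max_iter=100, tol=1e-8):
    N, d = X.shape
    beta = np.zeros(d)                              # step 1: 0 -> beta
    y = np.where(np.asarray(g) == 1, 1.0, 0.0)      # step 2: y_i from class labels
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))       # step 3: fitted probabilities
        W = np.diag(p * (1.0 - p))                  # step 4: dense N x N diagonal
        z = X @ beta + np.linalg.solve(W, y - p)    # step 5: adjusted response
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)   # step 6
        if np.max(np.abs(beta_new - beta)) < tol:   # step 7: stopping criterion
            return beta_new
        beta = beta_new
    return beta
```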

Computational Efficiency

- Since $\mathbf{W}$ is an $N \times N$ diagonal matrix, direct matrix operations with it may be very inefficient.
- A modified pseudo code is given below.

Modified Pseudo Code

1. $0 \to \beta$.
2. Compute $\mathbf{y}$ by setting its elements to
$$y_i = \begin{cases}1 & \text{if } g_i = 1\\ 0 & \text{if } g_i = 2\end{cases}, \qquad i = 1, 2, \ldots, N.$$
3. Compute $\mathbf{p}$ by setting its elements to
$$p(x_i; \beta) = \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}}, \qquad i = 1, 2, \ldots, N.$$
4. Compute the $N \times (p+1)$ matrix $\tilde{\mathbf{X}}$ by multiplying the $i$th row of $\mathbf{X}$ by $p(x_i; \beta)(1 - p(x_i; \beta))$, $i = 1, 2, \ldots, N$:
$$\mathbf{X} = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{pmatrix}, \qquad \tilde{\mathbf{X}} = \begin{pmatrix} p(x_1;\beta)(1-p(x_1;\beta))\,x_1^T \\ p(x_2;\beta)(1-p(x_2;\beta))\,x_2^T \\ \vdots \\ p(x_N;\beta)(1-p(x_N;\beta))\,x_N^T \end{pmatrix}.$$
5. $\beta = \beta + (\mathbf{X}^T\tilde{\mathbf{X}})^{-1}\mathbf{X}^T(\mathbf{y} - \mathbf{p})$.
6. If the stopping criterion is met, stop; otherwise go back to step 3.
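The modified steps drop $\mathbf{W}$ entirely in favor of row scaling; a minimal sketch under the same assumptions as before:

```python
import numpy as np

def irls_fast(X, g, max_iter=100, tol=1e-8):
    beta = np.zeros(X.shape[1])
    y = np.where(np.asarray(g) == 1, 1.0, 0.0)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        X_tilde = X * (p * (1.0 - p))[:, None]      # row i of X scaled by p_i(1 - p_i)
        step = np.linalg.solve(X.T @ X_tilde, X.T @ (y - p))
        beta = beta + step                          # step 5 of the modified code
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Since $\mathbf{X}^T\tilde{\mathbf{X}} = \mathbf{X}^T\mathbf{W}\mathbf{X}$, the two versions take identical steps.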

Example: Diabetes Data Set

- The input $X$ is two dimensional: the two principal components of the original 8 variables.
- Class 1: without diabetes; Class 2: with diabetes.
- Applying logistic regression, we obtain the estimate $\hat\beta = (\,\cdot\,,\,\cdot\,,\,\cdot\,)$.
- The posterior probabilities are:
$$\Pr(G = 1 \mid X = x) = \frac{\exp(\hat\beta^T x)}{1 + \exp(\hat\beta^T x)}, \qquad \Pr(G = 2 \mid X = x) = \frac{1}{1 + \exp(\hat\beta^T x)}.$$
- The classification rule is:
$$\hat{G}(x) = \begin{cases} 1 & \hat\beta^T x \ge 0 \\ 2 & \hat\beta^T x < 0. \end{cases}$$

[Figure: the diabetes training data with two decision boundaries. Solid line: decision boundary obtained by logistic regression; dash line: a comparison boundary; within-training-data classification error rates for both.]

Multiclass Case ($K \ge 3$)

- When $K \ge 3$, $\beta$ is a $(K-1)(p+1)$-vector:
$$\beta = \begin{pmatrix}\beta_{10}\\ \beta_1 \\ \beta_{20} \\ \beta_2 \\ \vdots \\ \beta_{(K-1)0} \\ \beta_{K-1}\end{pmatrix} = \begin{pmatrix}\beta_{10}\\ \beta_{11} \\ \vdots \\ \beta_{1p} \\ \vdots \\ \beta_{(K-1)0} \\ \vdots \\ \beta_{(K-1)p}\end{pmatrix}.$$
- Let $\bar\beta_l = \begin{pmatrix}\beta_{l0}\\ \beta_l\end{pmatrix}$.
- The likelihood function becomes
$$L(\beta) = \sum_{i=1}^{N}\log p_{g_i}(x_i;\beta) = \sum_{i=1}^{N}\log\frac{e^{\bar\beta_{g_i}^T x_i}}{1 + \sum_{l=1}^{K-1} e^{\bar\beta_l^T x_i}} = \sum_{i=1}^{N}\left[\bar\beta_{g_i}^T x_i - \log\left(1 + \sum_{l=1}^{K-1} e^{\bar\beta_l^T x_i}\right)\right].$$
- Note: the indicator function $I(\cdot)$ equals 1 when the argument is true and 0 otherwise.
- First order derivatives:
$$\frac{\partial L(\beta)}{\partial\beta_{kj}} = \sum_{i=1}^{N}\left[I(g_i = k)\,x_{ij} - \frac{e^{\bar\beta_k^T x_i}\,x_{ij}}{1 + \sum_{l=1}^{K-1} e^{\bar\beta_l^T x_i}}\right] = \sum_{i=1}^{N} x_{ij}\left(I(g_i = k) - p_k(x_i;\beta)\right).$$
- Second order derivatives:
$$\frac{\partial^2 L(\beta)}{\partial\beta_{kj}\,\partial\beta_{mn}} = \sum_{i=1}^{N} x_{ij}\,\frac{1}{\left(1+\sum_{l=1}^{K-1}e^{\bar\beta_l^T x_i}\right)^2}\left[-e^{\bar\beta_k^T x_i}\,I(k=m)\,x_{in}\left(1+\sum_{l=1}^{K-1}e^{\bar\beta_l^T x_i}\right) + e^{\bar\beta_k^T x_i} e^{\bar\beta_m^T x_i}\,x_{in}\right]$$
$$= \sum_{i=1}^{N} x_{ij}x_{in}\left(-p_k(x_i;\beta)\,I(k=m) + p_k(x_i;\beta)\,p_m(x_i;\beta)\right) = -\sum_{i=1}^{N} x_{ij}x_{in}\,p_k(x_i;\beta)\left[I(k=m) - p_m(x_i;\beta)\right].$$
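The first order derivatives stack naturally into one gradient vector over the $K-1$ coefficient blocks; a sketch, with classes coded $1, \ldots, K$, class $K$ as the reference, and a hypothetical helper name:

```python
import numpy as np

def multiclass_gradient(betas, X, g, K):
    """betas: (K-1, p+1) rows beta_bar_l; X: (N, p+1); g: labels in 1..K."""
    e = np.exp(X @ betas.T)                           # e^{beta_bar_l^T x_i}, (N, K-1)
    P = e / (1.0 + e.sum(axis=1, keepdims=True))      # p_k(x_i; beta)
    Y = np.stack([(np.asarray(g) == k).astype(float)  # indicators I(g_i = k)
                  for k in range(1, K)], axis=1)
    return ((Y - P).T @ X).ravel()                    # blocks k = 1..K-1 stacked
```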

- Matrix formulation: let $\mathbf{y}$ be the concatenated indicator vector of dimension $N(K-1)$:
$$\mathbf{y} = \begin{pmatrix}\mathbf{y}_1\\ \vdots \\ \mathbf{y}_{K-1}\end{pmatrix}, \qquad \mathbf{y}_k = \begin{pmatrix}I(g_1 = k)\\ I(g_2 = k) \\ \vdots \\ I(g_N = k)\end{pmatrix}, \quad 1 \le k \le K-1.$$
- Let $\mathbf{p}$ be the concatenated vector of fitted probabilities of dimension $N(K-1)$:
$$\mathbf{p} = \begin{pmatrix}\mathbf{p}_1\\ \vdots \\ \mathbf{p}_{K-1}\end{pmatrix}, \qquad \mathbf{p}_k = \begin{pmatrix}p_k(x_1;\beta)\\ p_k(x_2;\beta) \\ \vdots \\ p_k(x_N;\beta)\end{pmatrix}, \quad 1 \le k \le K-1.$$
- $\tilde{\mathbf{X}}$ is an $N(K-1) \times (p+1)(K-1)$ block-diagonal matrix:
$$\tilde{\mathbf{X}} = \begin{pmatrix}\mathbf{X} & 0 & \cdots & 0\\ 0 & \mathbf{X} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{X}\end{pmatrix}.$$
- The matrix $\mathbf{W}$ is an $N(K-1) \times N(K-1)$ square matrix:
$$\mathbf{W} = \begin{pmatrix}\mathbf{W}_{11} & \mathbf{W}_{12} & \cdots & \mathbf{W}_{1(K-1)}\\ \mathbf{W}_{21} & \mathbf{W}_{22} & \cdots & \mathbf{W}_{2(K-1)} \\ \vdots & & & \vdots \\ \mathbf{W}_{(K-1),1} & \mathbf{W}_{(K-1),2} & \cdots & \mathbf{W}_{(K-1),(K-1)}\end{pmatrix}.$$
- Each submatrix $\mathbf{W}_{km}$, $1 \le k, m \le K-1$, is an $N \times N$ diagonal matrix.
- When $k = m$, the $i$th diagonal element in $\mathbf{W}_{kk}$ is $p_k(x_i;\beta^{\text{old}})(1 - p_k(x_i;\beta^{\text{old}}))$.
- When $k \ne m$, the $i$th diagonal element in $\mathbf{W}_{km}$ is $-p_k(x_i;\beta^{\text{old}})\,p_m(x_i;\beta^{\text{old}})$.
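Assembling $\mathbf{W}$ from its $N \times N$ diagonal blocks; a small-$N$ sketch only, since storing the dense $N(K-1) \times N(K-1)$ matrix is exactly what one would avoid in practice:

```python
import numpy as np

def build_W(P):
    """P: (N, K-1) fitted probabilities p_k(x_i; beta_old)."""
    N, Km1 = P.shape
    W = np.zeros((N * Km1, N * Km1))
    for k in range(Km1):
        for m in range(Km1):
            if k == m:
                d = P[:, k] * (1.0 - P[:, k])       # p_k (1 - p_k) on the diagonal
            else:
                d = -P[:, k] * P[:, m]              # -p_k p_m off the diagonal
            W[k*N:(k+1)*N, m*N:(m+1)*N] = np.diag(d)
    return W
```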

