Principal Component Analysis - Columbia University

Transcription of Principal Component Analysis - Columbia University

Principal Component Analysis
Frank Wood
December 8, 2009
This lecture borrows and quotes from Jolliffe's Principal Component Analysis book. Go buy it!

Principal Component Analysis
The central idea of Principal Component Analysis (PCA) is to reduce the dimensionality of a data set consisting of a large number of interrelated variables, while retaining as much as possible of the variation present in the data set. This is achieved by transforming to a new set of variables, the principal components (PCs), which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. [Jolliffe, Principal Component Analysis, 2nd edition]

Data distribution (inputs in regression analysis)
Figure: Gaussian PDF

Uncorrelated projections of principal variation
Figure: Gaussian PDF with PC eigenvectors

PCA rotation
Figure: PCA projected Gaussian PDF

PCA in a nutshell
Notation
- x is a vector of p random variables
- α_k is a vector of p constants
- α_k'x = ∑_{j=1}^p α_kj x_j
Procedural description
- Find a linear function of x, α_1'x, with maximum variance.
- Next find another linear function of x, α_2'x, uncorrelated with α_1'x and with maximum variance.
- Iterate.
Goal
It is hoped, in general, that most of the variation in x will be accounted for by m PCs, where m << p.

Derivation of PCA
Assumption and More Notation
- Σ is the known covariance matrix of the random variable x.
- Foreshadowing: Σ will be replaced with S, the sample covariance matrix, when Σ is unknown.

Shortcut to solution
For k = 1, 2, ..., p the kth PC is given by z_k = α_k'x, where α_k is an eigenvector of Σ corresponding to its kth largest eigenvalue λ_k. If α_k is chosen to have unit length (α_k'α_k = 1) then Var(z_k) = λ_k.

Derivation of PCA
First Step
- Find α_k'x that maximizes Var(α_k'x) = α_k'Σα_k.
- Without a constraint we could pick a very big α_k.
- Instead impose a normalization constraint, namely α_k'α_k = 1 (a unit-length vector).

Constrained maximization - method of Lagrange multipliers
- To maximize α_k'Σα_k subject to α_k'α_k = 1 we use the technique of Lagrange multipliers. We maximize the function
    α_k'Σα_k − λ_k(α_k'α_k − 1)
  with respect to α_k by differentiating with respect to α_k.

Derivation of PCA
Constrained maximization - method of Lagrange multipliers
- This results in
    d/dα_k [ α_k'Σα_k − λ_k(α_k'α_k − 1) ] = 0
    Σα_k − λ_k α_k = 0
    Σα_k = λ_k α_k
- This should be recognizable as an eigenvector equation, where α_k is an eigenvector of Σ and λ_k is the associated eigenvalue.
- Which eigenvector should we choose?

Derivation of PCA
Constrained maximization - method of Lagrange multipliers
- If we recognize that the quantity to be maximized is
    α_k'Σα_k = α_k'λ_k α_k = λ_k α_k'α_k = λ_k
  then we should choose λ_k to be as big as possible.
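
To make the Lagrange-multiplier result concrete, here is a minimal numerical sketch (not from the lecture; the covariance matrix and variable names below are made up for illustration). It checks that the leading eigenvector of Σ attains a larger value of α'Σα than randomly drawn unit vectors.

```python
import numpy as np

# A made-up symmetric positive-definite covariance matrix, purely for illustration.
Sigma = np.array([[4.0, 1.5, 0.5],
                  [1.5, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# eigh returns eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(Sigma)
alpha1 = eigvecs[:, -1]          # eigenvector for the largest eigenvalue
lambda1 = eigvals[-1]

# Var(alpha'x) = alpha' Sigma alpha; for the leading eigenvector this equals lambda1.
print(alpha1 @ Sigma @ alpha1, lambda1)

# No random unit-length direction should achieve a larger variance.
rng = np.random.default_rng(0)
for _ in range(5):
    a = rng.normal(size=3)
    a /= np.linalg.norm(a)       # enforce the constraint alpha'alpha = 1
    assert a @ Sigma @ a <= lambda1 + 1e-12
```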

So, calling λ_1 the largest eigenvalue of Σ and α_1 the corresponding eigenvector, the solution to
    Σα_1 = λ_1 α_1
is the 1st principal component of x.
- In general α_k'x will be the kth PC of x and Var(α_k'x) = λ_k.
- We will demonstrate this for k = 2; the case k > 2 is more involved but similar.

Derivation of PCA
Constrained maximization - more constraints
- The second PC, α_2'x, maximizes α_2'Σα_2 subject to being uncorrelated with α_1'x.
- The uncorrelation constraint can be expressed using any of these equations:
    cov(α_1'x, α_2'x) = α_1'Σα_2 = α_2'Σα_1 = α_2'λ_1 α_1 = λ_1 α_2'α_1 = λ_1 α_1'α_2 = 0
- Of these, if we choose the last we can write a Lagrangian to maximize α_2:
    α_2'Σα_2 − λ_2(α_2'α_2 − 1) − φ α_2'α_1

Derivation of PCA
Constrained maximization - more constraints
- Differentiation of this quantity with respect to α_2 (and setting the result equal to zero) yields
    d/dα_2 [ α_2'Σα_2 − λ_2(α_2'α_2 − 1) − φ α_2'α_1 ] = 0
    Σα_2 − λ_2 α_2 − φ α_1 = 0
- If we left-multiply α_1' into this expression,
    α_1'Σα_2 − λ_2 α_1'α_2 − φ α_1'α_1 = 0
    0 − 0 − φ·1 = 0
  then we can see that φ must be zero, and when this is true we are left with
    Σα_2 − λ_2 α_2 = 0

Derivation of PCA
Clearly
    Σα_2 − λ_2 α_2 = 0
is another eigenvalue equation, and the same strategy of choosing α_2 to be the eigenvector associated with the second largest eigenvalue yields the second PC of x, namely α_2'x. This process can be repeated for k = 1, ..., p, yielding up to p different eigenvectors of Σ along with the corresponding eigenvalues λ_1, ..., λ_p.
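
A short check of the full result (again on a made-up Σ; any symmetric positive-definite matrix would do): stacking all p eigenvectors of Σ into A gives components z = A'x that are uncorrelated, with variances equal to the eigenvalues.

```python
import numpy as np

Sigma = np.array([[4.0, 1.5, 0.5],
                  [1.5, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])          # illustrative covariance, not from the lecture

eigvals, A = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]            # sort eigenpairs by decreasing eigenvalue
eigvals, A = eigvals[order], A[:, order]

# Cov(z) for z = A'x is A' Sigma A: it should be diagonal (uncorrelated PCs)
# with the eigenvalues on the diagonal, i.e. Var(z_k) = lambda_k.
Cov_z = A.T @ Sigma @ A
print(np.round(Cov_z, 10))
assert np.allclose(Cov_z, np.diag(eigvals))
```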

Furthermore, the variance of each of the PCs is given by
    Var[α_k'x] = λ_k,   k = 1, 2, ..., p.

Properties of PCA
For any integer q, 1 ≤ q ≤ p, consider the orthonormal linear transformation
    y = B'x
where y is a q-element vector and B' is a q × p matrix, and let Σ_y = B'ΣB be the variance-covariance matrix for y. Then the trace of Σ_y, denoted tr(Σ_y), is maximized by taking B = A_q, where A_q consists of the first q columns of A.

What this means is that if you want to choose a lower-dimensional projection of x, the choice of B described here is probably a good one: it maximizes the (retained) variance of the resulting variables. In fact, since the projections are uncorrelated, the percentage of variance accounted for by retaining the first q PCs is given by
    ( ∑_{k=1}^q λ_k / ∑_{k=1}^p λ_k ) × 100.

PCA using the sample covariance matrix
Recall that the sample covariance matrix (an unbiased estimator for the covariance matrix of x) is given by
    S = (1/(n−1)) X'X
where X is an (n × p) matrix with (i, j)th element (x_ij − x̄_j) (in other words, X is a zero-mean design matrix).
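
As a small sketch of the last two points (the data below is simulated and the names are mine, not the lecture's): center the design matrix, form S = X'X/(n−1), and report the percentage of variance retained by the first q PCs.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 500, 5, 2

# Simulated correlated data, purely for illustration.
raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

X = raw - raw.mean(axis=0)            # zero-mean design matrix
S = X.T @ X / (n - 1)                 # sample covariance matrix

eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]    # eigenvalues, descending
retained = 100 * eigvals[:q].sum() / eigvals.sum()
print(f"variance retained by first {q} PCs: {retained:.1f}%")
```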

We construct the matrix A by combining the p eigenvectors of S (or the eigenvectors of X'X; they are the same); then we can define a matrix of PC scores
    Z = XA.
Of course, if we instead form Z by selecting the q eigenvectors corresponding to the q largest eigenvalues of S when forming A, then we can achieve an optimal (in some senses) q-dimensional projection of x.

Computing the PCA loading matrix
Given the sample covariance matrix
    S = (1/(n−1)) X'X
the most straightforward way of computing the PCA loading matrix is to utilize the singular value decomposition of S,
    S = A Λ A'
where A is a matrix consisting of the eigenvectors of S and Λ is a diagonal matrix whose diagonal elements are the eigenvalues corresponding to each eigenvector. Creating a reduced-dimensionality projection of X is accomplished by selecting the q largest eigenvalues in Λ and retaining the q corresponding eigenvectors from A.
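
A minimal sketch of that computation on simulated data (the data and variable names are mine): eigendecompose S, keep the q leading eigenvectors as A_q, and form the PC scores Z = XA_q.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 300, 4, 2

raw = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # toy data
X = raw - raw.mean(axis=0)                                # zero-mean design matrix

S = X.T @ X / (n - 1)                                     # sample covariance
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, A = eigvals[order], eigvecs[:, order]            # full loading matrix A
A_q = A[:, :q]                                            # keep q leading eigenvectors

Z = X @ A_q                                               # (n x q) matrix of PC scores
print(Z.shape)

# The retained PC scores are uncorrelated, with sample variances equal to
# the q largest eigenvalues of S.
print(np.round(Z.T @ Z / (n - 1), 6))
print(np.round(eigvals[:q], 6))
```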

Sample Covariance Matrix PCA
Figure: Gaussian samples
Figure: Gaussian samples with eigenvectors of the sample covariance matrix
Figure: PC projected samples
Figure: PC dimensionality reduction step
Figure: PC dimensionality reduction step

PCA in linear regression
PCA is useful in linear regression in several ways:
- Identification and elimination of multicolinearities in the data.
- Reduction in the dimension of the input space, leading to fewer parameters and "easier" regression.
- Related to the last point, the variance of the regression coefficient estimator is minimized by the PCA choice of basis.
We will consider the following example:
- x ~ N([2 5], [ ])
- y = x[1 2]' when no colinearities are present (no noise)
- x_i3 = 0.8 x_i1 + 0.5 x_i2, an imposed colinearity

Noiseless Linear Relationship with No Colinearity
Figure: y = x[1 2]' + 5, x ~ N([2 5], [ ])

Noiseless Planar Relationship
Figure: y = x[1 2]' + 5, x ~ N([2 5], [ ])

Projection of colinear data
The figures before showed the data without the third, colinear design matrix column. Plotting such data is not possible, but its colinearity is obvious by design. When PCA is applied to a design matrix of rank q less than p, the number of positive eigenvalues discovered is equal to q, the true rank of the design matrix. If the number of PCs retained is larger than q (and the data is perfectly colinear, etc.) all of the variance of the data is retained in the low-dimensional projection. In this example, when PCA is run on the design matrix of rank 2, the resulting projection back into two dimensions has exactly the same distribution as before.

Projection of colinear data
Figure: Projection of multi-colinear data onto the first two PCs
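
The rank-detection claim is easy to check numerically. The sketch below is my own simulation (the 0.8/0.5 weights mirror the example's imposed colinearity, everything else is made up): a perfectly colinear third column leaves only two positive eigenvalues, revealing the true rank.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Two informative columns plus a perfectly colinear third one,
# mirroring x_i3 = 0.8 x_i1 + 0.5 x_i2 from the example.
x1 = rng.normal(2.0, 1.0, size=n)
x2 = rng.normal(5.0, 1.5, size=n)
x3 = 0.8 * x1 + 0.5 * x2
X = np.column_stack([x1, x2, x3])
X = X - X.mean(axis=0)

S = X.T @ X / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]
print(np.round(eigvals, 8))                  # third eigenvalue is numerically zero
rank = np.sum(eigvals > 1e-10 * eigvals[0])  # number of positive eigenvalues = true rank
print("detected rank:", rank)                # -> 2
```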

Reduction in regression coefficient estimator variance
If we take the standard regression model
    y = Xβ + ε
and consider instead the PCA rotation of X given by
    Z = XA
then we can rewrite the regression model in terms of the PCs:
    y = Zγ + ε.
We can also consider the reduced model
    y = Z_q γ_q + ε_q
where only the first q PCs are retained.

Reduction in regression coefficient estimator variance
If we rewrite the regression relation as
    y = Zγ + ε
then we can, because A is orthogonal, rewrite
    Xβ = XAA'β = Zγ
where γ = A'β. Clearly, using least squares (or ML) to learn β̂ = Aγ̂ is equivalent to learning γ̂ directly. So, as usual,
    γ̂ = (Z'Z)^{-1} Z'y
and therefore
    β̂ = A(Z'Z)^{-1} Z'y.

Reduction in regression coefficient estimator variance
Without derivation we note that the variance-covariance matrix of β̂ is given by
    Var(β̂) = σ² ∑_{k=1}^p l_k^{-1} a_k a_k'
where l_k is the kth largest eigenvalue of X'X, a_k is the kth column of A, and σ² is the observation noise variance, ε ~ N(0, σ²I). This sheds light on how multicolinearities produce large variances for the elements of β̂: if an eigenvalue l_k is small, then the resulting variance of the estimator will be large.

Reduction in regression coefficient estimator variance
One way to avoid this is to ignore those PCs that are associated with small eigenvalues, namely, to use the biased estimator
    β̃ = ∑_{k=1}^m l_k^{-1} a_k a_k' X'y
where l_1, ..., l_m are the large eigenvalues of X'X and l_{m+1}, ..., l_p are the small ones. Then
    Var(β̃) = σ² ∑_{k=1}^m l_k^{-1} a_k a_k'.
This is a biased estimator, but, since the variance of this estimator is smaller, it is possible that this could be an advantage.
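
To illustrate the biased estimator on simulated data (a sketch under my own simulation choices, not the lecture's code or figures): eigendecompose X'X, write the least-squares estimator as a sum over all p eigenpairs, and then drop the term with the small eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 2000, 0.1

# Nearly colinear design: the third column is almost a linear combination
# of the first two, so X'X has one very small eigenvalue.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 + 0.5 * x2 + rng.normal(scale=1e-3, size=n)
X = np.column_stack([x1, x2, x3])
beta_true = np.array([1.0, 2.0, 0.5])
y = X @ beta_true + rng.normal(scale=sigma, size=n)

# Eigendecomposition of X'X: columns of A are the a_k, l holds the l_k.
l, A = np.linalg.eigh(X.T @ X)
order = np.argsort(l)[::-1]
l, A = l[order], A[:, order]

# Ordinary least squares, written with all p eigenpairs:
#   beta_hat = sum_k l_k^{-1} a_k a_k' X'y
beta_ols = sum((1.0 / l[k]) * np.outer(A[:, k], A[:, k]) for k in range(3)) @ X.T @ y

# Biased estimator: keep only the m = 2 terms with large eigenvalues,
# trading some bias for a much smaller variance along the dropped direction.
m = 2
beta_biased = sum((1.0 / l[k]) * np.outer(A[:, k], A[:, k]) for k in range(m)) @ X.T @ y

print(np.round(beta_ols, 3))
print(np.round(beta_biased, 3))
```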

Homework: find the bias of this estimator. Hint: use the spectral decomposition of X'X.

Problems with PCA
PCA is not without its problems and limitations:
- PCA assumes approximate normality of the input space distribution.
- PCA may still be able to produce a good low-dimensional projection of the data even if the data isn't normally distributed.
- PCA may fail if the data lies on a complicated manifold.
- PCA assumes that the input data is real and continuous.
- Extensions to consider:
  - Collins et al., A generalization of principal components analysis to the exponential family.
  - Hyvärinen, A. and Oja, E., Independent Component Analysis: algorithms and applications.
  - ISOMAP, LLE, maximum variance unfolding, etc.

Non-normal data
Figure: 2d Beta(.1,.1) samples with PCs
Figure: PCA projected Beta(.1,.1) samples

