Example: bankruptcy

A TUTORIAL ON PRINCIPAL COMPONENT ANALYSIS …

A TUTORIAL ON PRINCIPAL COMPONENT ANALYSISD erivation, Discussion and Singular Value DecompositionJon March 2003|Version 1 PRINCIPAL COMPONENT ANALYSIS (PCA) is a mainstayof modern data ANALYSIS - a black box that is widelyused but poorly understood. The goal of this paper isto dispel the magic behind this black box. This tutorialfocuses on building a solid intuition for how and whyprincipal COMPONENT ANALYSIS works; furthermore, itcrystallizes this knowledge by deriving from first prin-cipals, the mathematics behind PCA . This tutorialdoes not shy away from explaining the ideas infor-mally, nor does it shy away from the hope is that by addressing both aspects, readersof all levels will be able to gain a better understand-ing of the power of PCA as well as the when, the howand the why of applying this COMPONENT ANALYSIS (PCA) has been calledone of the most valuable results from applied lin-ear used abundantly in all formsof ANALYSIS - from neuroscience to computer graphics- because it is a simple, non-parametric method ofextracting relevant information from confusing datasets.

focuses on building a solid intuition for how and why principal component analysis works; furthermore, it crystallizes this knowledge by deriving from first prin-cipals, the mathematics behind PCA . This tutorial does not shy away from explaining the ideas infor-

Tags:

  Analysis, Principal component analysis, Principal, Component, Intuition

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of A TUTORIAL ON PRINCIPAL COMPONENT ANALYSIS …

1 A TUTORIAL ON PRINCIPAL COMPONENT ANALYSISD erivation, Discussion and Singular Value DecompositionJon March 2003|Version 1 PRINCIPAL COMPONENT ANALYSIS (PCA) is a mainstayof modern data ANALYSIS - a black box that is widelyused but poorly understood. The goal of this paper isto dispel the magic behind this black box. This tutorialfocuses on building a solid intuition for how and whyprincipal COMPONENT ANALYSIS works; furthermore, itcrystallizes this knowledge by deriving from first prin-cipals, the mathematics behind PCA . This tutorialdoes not shy away from explaining the ideas infor-mally, nor does it shy away from the hope is that by addressing both aspects, readersof all levels will be able to gain a better understand-ing of the power of PCA as well as the when, the howand the why of applying this COMPONENT ANALYSIS (PCA) has been calledone of the most valuable results from applied lin-ear used abundantly in all formsof ANALYSIS - from neuroscience to computer graphics- because it is a simple, non-parametric method ofextracting relevant information from confusing datasets.

2 With minimal additional effortPCAprovidesa roadmap for how to reduce a complex data set toa lower dimension to reveal the sometimes hidden,simplified dynamics that often underlie goal of this TUTORIAL is to provide both an intu-itive feel forPCA, and a thorough discussion of thistopic. We will begin with a simple example and pro-vide an intuitive explanation of the goal ofPCA. Wewill continue by adding mathematical rigor to placeit within the framework of linear algebra and explic-itly solve this problem. We will see how and whyPCAis intimately related to the mathematical tech-nique of singular value decomposition (SVD). Thisunderstanding will lead us to a prescription for howto applyPCAin the real world. We will discuss boththe assumptions behind this technique as well as pos-sible extensions to overcome these discussion and explanations in this paper areinformal in the spirit of a TUTORIAL .

3 The goal of thispaper is toeducate. Occasionally, rigorous mathe-matical proofs are necessary although relegated tothe Appendix. Although not as vital to the TUTORIAL ,the proofs are presented for the adventurous readerwho desires a more complete understanding of themath. The only assumption is that the reader has aworking knowledge of linear algebra. Nothing feel free to contact me with any suggestions,corrections or : A Toy ExampleHere is the perspective: we are an experimenter. Weare trying to understand some phenomenon by mea-suring various quantities ( spectra, voltages, ve-locities, etc.) in our system. Unfortunately, we cannot figure out what is happening because the dataappears clouded, unclear and even redundant. Thisis not a trivial problem, but rather a fundamentalobstacle to experimental science.

4 Examples aboundfrom complex systems such as neuroscience, photo-science, meteorology and oceanography - the numberof variables to measure can be unwieldy and at timesevendeceptive, because the underlying dynamics canoften be quite for example a simple toy problem fromphysics diagrammed in Figure 1. Pretend we arestudying the motion of the physicist s ideal system consists of a ball of massmattached toamassless, frictionlessspring. The ball is released asmall distance away from equilibrium ( the springis stretched). Because the spring is ideal, it oscil-lates indefinitely along thex-axis about its equilib-rium at a set is a standard problem in physics in which the1 Figure 1: A diagram of the toy along thexdirection is solved by an explicitfunction of time. In other words, the underlying dy-namics can be expressed as a function of a single , being ignorant experimenters we do notknow any of this.

5 We do not know which, let alonehow many, axes and dimensions are important tomeasure. Thus, we decide to measure the ball s posi-tion in a three-dimensional space (since we live in athree dimensional world). Specifically, we place threemovie cameras around our system of interest. At200 Hz each movie camera records an image indicat-ing a two dimensional position of the ball (a projec-tion). Unfortunately, because of our ignorance, wedo not even know what are thereal x , y and z axes, so we choose three camera axes{~a,~b,~c}at somearbitrary angles with respect to the system. The an-gles between our measurements might not even be90o! Now, we record with the cameras for 2 big question remains:how do we get from thisdata set to a simple equation ofx?We know a-priori that if we were smart experi-menters, we would have just measured the positionalong thex-axis with one camera.

6 But this is notwhat happens in the real world. We often do notknow what measurements best reflect the dynamicsof our system in question. Furthermore, we some-times record more dimensions than we actually need!Also, we have to deal with that pesky, real-worldproblem ofnoise. In the toy example this meansthat we need to deal with air, imperfect cameras oreven friction in a less-than-ideal spring. Noise con-taminates our data set only serving to obfuscate thedynamics toy example is the challengeexperimenters face will refer to thisexample as we delve further into abstract , by the end of this paper we will have agood understanding of how to systematically extractxusing PRINCIPAL COMPONENT : Change of BasisThe Goal: PRINCIPAL COMPONENT ANALYSIS computesthe most meaningfulbasisto re-express a noisy, gar-bled data set.

7 The hope is that this new basis willfilter out the noise and reveal hidden dynamics. Inthe example of the spring, the explicit goal ofPCAisto determine: the dynamics are along thex-axis. In other words, the goal ofPCAis to determine that x- the unit basis vector along thex-axis - is the im-portant dimension. Determining this fact allows anexperimenter to discern which dynamics are impor-tant and which are just Naive BasisWith a more precise definition of our goal, we needa more precise definition of our data as well. Foreach time sample (or experimental trial), an exper-imenter records a set of data consisting of multiplemeasurements ( voltage, position, etc.). Thenumber of measurement types is thedimensionofthe data set. In the case of the spring, this data sethas 12,000 6-dimensional vectors, where each cameracontributes a 2-dimensional projection of the ball general, each data sample is a vector inm-dimensional space, wheremis the number of mea-surement types.

8 Equivalently, every time sample isa vector that lies in anm-dimensionalvector spacespanned by an orthonormal basis. All measurementvectors in this space are a linear combination of thisset of unit length basis vectors. A naive and simple2choice of a basisBis the identity = 1 0 00 1 0 1 =Iwhere eachrowis a basis summarize, at one point in time, cameraArecords a corresponding position (xA(t), yA(t)). Eachtrial can be expressed as a six dimesional column vec-tor~X.~X= xAyAxByBxCyC where each camera contributes two points and theentire vector~Xis the set of coefficients in the of BasisWith this rigor we may now state more preciselywhatPCAasks:Is there another basis, which is alinear combination of the original basis, that best re-expresses our data set?A close reader might have noticed the conspicuousaddition of the wordlinear.

9 Indeed,PCAmakes onestringent but powerful assumption:linearity. Lin-earity vastly simplifies the problem by (1) restrictingthe set of potential bases, and (2) formalizing the im-plicit assumption of continuity in a data subtlepoint it is, but we have already assumed linearity byimplicitly stating that the data set even characterizesthe dynamics of the system!In other words, we arealready relying on the superposition PRINCIPAL of lin-earity to believe that the data characterizes or pro-vides an ability to interpolate between the individualdata be sure, complex systems are almost always nonlinearand often their main qualitative features are a direct result oftheir nonlinearity. However, locally linear approximations usu-ally provide a good approximation because non-linear, higherorder terms vanish at the limit of small this assumptionPCAis now limited to re-expressing the data as alinear combinationof its ba-sis vectors.

10 LetXandYbem nmatrices related bya linear the original recordeddata set andYis a re-representation of that data (1)Also let us define the following quantities2. piare therowsofP xiare thecolumnsofX yiare 1 represents achange of basisand thus canhave many a matrix that Geometrically,Pis a rotation and a stretchwhich again The rows ofP,{p1, .. ,pm}, are a set of newbasis vectors for expressing latter interpretation is not obvious but can beseen by writing out the explicit dot products [x1 xn]Y= p1 x1 p1 x1 pm xn We can note the form of each column p1 xi We recognize that each coefficient ofyiis a dot-product ofxiwith the corresponding row inP. In2In this sectionxiandyiarecolumnvectors, but be fore-warned. In all other words, thejthcoefficient ofyiis a projectionon to thejthrow ofP.


Related search queries