A Tutorial on Principal Component Analysis
Derivation, Discussion and Singular Value Decomposition

Jon … March 2003 | Version 1

Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but poorly understood. The goal of this paper is to dispel the magic behind this black box. This tutorial focuses on building a solid intuition for how and why principal component analysis works; furthermore, it crystallizes this knowledge by deriving, from first principles, the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of the power of PCA as well as the when, the how and the why of applying this technique.

Principal component analysis (PCA) has been called one of the most valuable results from applied linear algebra. PCA is used abundantly in all forms of analysis - from neuroscience to computer graphics - because it is a simple, non-parametric method of extracting relevant information from confusing data sets. With minimal additional effort, PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified dynamics that often underlie it. The goal of this tutorial is to provide both an intuitive feel for PCA and a thorough discussion of this topic.

We will begin with a simple example and provide an intuitive explanation of the goal of PCA. We will continue by adding mathematical rigor to place it within the framework of linear algebra and explicitly solve this problem. We will see how and why PCA is intimately related to the mathematical technique of singular value decomposition (SVD). This understanding will lead us to a prescription for how to apply PCA in the real world. We will discuss both the assumptions behind this technique and possible extensions to overcome these limitations. The discussion and explanations in this paper are informal in the spirit of a tutorial.

The goal of this paper is to educate. Occasionally, rigorous mathematical proofs are necessary, although they are relegated to the Appendix. Although not as vital to the tutorial, the proofs are presented for the adventurous reader who desires a more complete understanding of the math. The only assumption is that the reader has a working knowledge of linear algebra. Nothing more. Please feel free to contact me with any suggestions, corrections or comments.

A Toy Example

Here is the perspective: we are an experimenter. We are trying to understand some phenomenon by measuring various quantities (e.g. spectra, voltages, velocities, etc.) in our system. Unfortunately, we cannot figure out what is happening because the data appears clouded, unclear and even redundant. This is not a trivial problem, but rather a fundamental obstacle to experimental science. Examples abound from complex systems such as neuroscience, photoscience, meteorology and oceanography - the number of variables to measure can be unwieldy and at times even deceptive, because the underlying dynamics can often be quite simple.

Take for example a simple toy problem from physics diagrammed in Figure 1. Pretend we are studying the motion of the physicist's ideal spring. This system consists of a ball of mass m attached to a massless, frictionless spring.

The ball is released a small distance away from equilibrium (i.e. the spring is stretched). Because the spring is ideal, it oscillates indefinitely along the x-axis about its equilibrium at a set frequency.

Figure 1: A diagram of the toy example.

This is a standard problem in physics in which the motion along the x direction is solved by an explicit function of time. In other words, the underlying dynamics can be expressed as a function of a single variable x.

However, being ignorant experimenters we do not know any of this. We do not know which, let alone how many, axes and dimensions are important to measure.
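As a reminder of what "an explicit function of time" looks like for an ideal spring, the textbook solution can be written as below. This is a sketch under our own assumptions: the spring constant k, the initial displacement x_0 and a release from rest are not specified in the text; only the mass m is.

```latex
x(t) = x_0 \cos(\omega t), \qquad \omega = \sqrt{k / m}
```

A single variable, x, is enough to describe the whole motion.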

Thus, we decide to measure the ball's position in a three-dimensional space (since we live in a three-dimensional world). Specifically, we place three movie cameras around our system of interest. At 200 Hz each movie camera records an image indicating a two-dimensional position of the ball (a projection). Unfortunately, because of our ignorance, we do not even know what are the real "x", "y" and "z" axes, so we choose three camera axes $\{\vec{a}, \vec{b}, \vec{c}\}$ at some arbitrary angles with respect to the system. The angles between our measurements might not even be 90°!

Now, we record with the cameras for 2 …

The big question remains: how do we get from this data set to a simple equation of x?

We know a priori that if we were smart experimenters, we would have just measured the position along the x-axis with one camera. But this is not what happens in the real world. We often do not know what measurements best reflect the dynamics of our system in question. Furthermore, we sometimes record more dimensions than we actually need! Also, we have to deal with that pesky, real-world problem of noise. In the toy example this means that we need to deal with air, imperfect cameras or even friction in a less-than-ideal spring.
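To get a feel for the kind of data set this experiment produces, the recording can be mimicked numerically. The following is only a minimal sketch using NumPy, not the experiment itself: the function name `simulate_cameras`, the 2-second duration, the 1.5 Hz oscillation, the unit amplitude, the random camera directions and the Gaussian noise model are all our own assumptions.

```python
import numpy as np

def simulate_cameras(duration_s=2.0, rate_hz=200.0, noise_std=0.05, seed=0):
    """Mimic three cameras filming a ball that oscillates along the true x-axis.

    Returns an array of shape (6, n_samples): one row per measurement type
    (two image coordinates for each of the three cameras).
    """
    rng = np.random.default_rng(seed)
    n_samples = int(round(duration_s * rate_hz))
    t = np.arange(n_samples) / rate_hz

    # True motion: only the x coordinate changes (assumed amplitude and frequency).
    position = np.zeros((3, n_samples))
    position[0] = np.cos(2.0 * np.pi * 1.5 * t)

    rows = []
    for _ in range(3):
        # Each camera projects the 3-D position onto two arbitrary unit
        # directions, which are not forced to be orthogonal to each other.
        axes = rng.normal(size=(2, 3))
        axes /= np.linalg.norm(axes, axis=1, keepdims=True)
        rows.append(axes @ position)
    data = np.vstack(rows)

    # Additive Gaussian noise stands in for air, imperfect cameras and friction.
    return data + rng.normal(scale=noise_std, size=data.shape)

data = simulate_cameras()
print(data.shape)  # (measurement types, time samples)
```

Every row of `data` is a scaled, noisy copy of the same one-dimensional motion, which is exactly the redundancy described above.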

Noise contaminates our data set, only serving to obfuscate the dynamics further. This toy example is the challenge experimenters face every day. We will refer to this example as we delve further into abstract concepts. Hopefully, by the end of this paper we will have a good understanding of how to systematically extract x using principal component analysis.

Change of Basis

The Goal: Principal component analysis computes the most meaningful basis to re-express a noisy, garbled data set. The hope is that this new basis will filter out the noise and reveal hidden dynamics. In the example of the spring, the explicit goal of PCA is to determine: "the dynamics are along the x-axis."
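To make this goal concrete, here is a minimal sketch of finding the most informative direction in a noisy data set. It uses NumPy's singular value decomposition on mean-centered data, which is one common way to compute the principal direction and is not necessarily the procedure derived in this tutorial. The function name `principal_direction`, the 30-degree rotation and the noise level are hypothetical choices for illustration.

```python
import numpy as np

def principal_direction(data):
    """Return the unit vector along which `data` varies the most, together
    with the fraction of the total variance captured along that vector.

    `data` has one row per measurement type and one column per sample.
    """
    centered = data - data.mean(axis=1, keepdims=True)
    # Columns of u are orthonormal directions; s is sorted largest first.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    return u[:, 0], variance[0] / variance.sum()

# Hypothetical data: motion along one axis, seen in a frame rotated by 30 degrees.
rng = np.random.default_rng(0)
t = np.arange(400) / 200.0
true_x = np.cos(2.0 * np.pi * 1.5 * t)
theta = np.deg2rad(30.0)
direction = np.array([np.cos(theta), np.sin(theta)])
data = np.outer(direction, true_x) + rng.normal(scale=0.05, size=(2, t.size))

found, fraction = principal_direction(data)
print(found, fraction)
```

Up to an overall sign, `found` should line up closely with `direction`, and `fraction` should be close to 1: nearly all of the variation lies along a single axis.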

In other words, the goal of PCA is to determine that $\hat{x}$ - the unit basis vector along the x-axis - is the important dimension. Determining this fact allows an experimenter to discern which dynamics are important and which are just redundant.

A Naive Basis

With a more precise definition of our goal, we need a more precise definition of our data as well. For each time sample (or experimental trial), an experimenter records a set of data consisting of multiple measurements (e.g. voltage, position, etc.). The number of measurement types is the dimension of the data set.
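For the toy example, one natural way to write a single time sample is as a column vector. The labels below are our own notation, assuming each of the three cameras (call them A, B and C) reports two image coordinates:

```latex
\vec{X} = \begin{bmatrix} x_A \\ y_A \\ x_B \\ y_B \\ x_C \\ y_C \end{bmatrix}
```

Three cameras times two coordinates gives six measurement types, so under this assumption the data set has dimension 6, even though the underlying motion needs only one variable.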

