
Principal Component Analysis, Second Edition

I. T. Jolliffe, Springer

Preface to the Second Edition

Since the first edition of the book was published, a great deal of new material on principal component analysis (PCA) and related topics has been published, and the time is now ripe for a new edition. Although the size of the book has nearly doubled, there are only two additional chapters. All the chapters in the first edition have been preserved, although two have been renumbered. All have been updated, some extensively. In this updating process I have endeavoured to be as comprehensive as possible. This is reflected in the number of new references, which substantially exceeds those in the first edition. Given the range of areas in which PCA is used, it is certain that I have missed some topics, and my coverage of others will be too brief for the taste of some readers. The choice of which new topics to emphasize is inevitably a personal one, reflecting my own interests and biases. In particular, atmospheric science is a rich source of both applications and methodological developments, but its large contribution to the new material is partly due to my long-standing links with the area, and not because of a lack of interesting developments and examples in other fields. For example, there are large literatures in psychometrics, chemometrics and computer science that are only partially represented.

Due to considerations of space, not everything could be included. The main changes are now described.

Chapters 1 to 4, describing the basic theory and providing a set of examples, are the least changed. It would have been possible to substitute more recent examples for those of Chapter 4, but as the present ones give nice illustrations of the various aspects of PCA, there was no good reason to do so. One of these examples has been moved to Chapter 1. One extra property (A6) has been added to Chapter 2, with Property A6 in Chapter 3 becoming [...].

Chapter 5 has been extended by further discussion of a number of ordination and scaling methods linked to PCA, in particular varieties of the [...].

Chapter 6 has seen a major expansion. There are two parts of Chapter 6 concerned with deciding how many principal components (PCs) to retain and with using PCA to choose a subset of variables. Both of these topics have been the subject of considerable research in recent years, although a regrettably high proportion of this research confuses PCA with factor analysis, the subject of Chapter 7.

Neither Chapter 7 nor 8 have been expanded as much as Chapter 6 or Chapters 9 and 10.

Chapter 9 in the first edition contained three sections describing the use of PCA in conjunction with discriminant analysis, cluster analysis and canonical correlation analysis (CCA). All three sections have been updated, but the greatest expansion is in the third section, where a number of other techniques have been included, which, like CCA, deal with relationships between two groups of variables. As elsewhere in the book, Chapter 9 includes yet other interesting related methods not discussed in detail. In general, the line is drawn between inclusion and exclusion once the link with PCA becomes too [...].

Chapter 10 also included three sections in first edition on outlier detection, influence and robustness. All have been the subject of substantial research interest since the first edition; this is reflected in expanded coverage. A fourth section, on other types of stability and sensitivity, has been added. Some of this material has been moved from Section [...] of the first edition; other material is new.

The next two chapters are also new and reflect my own research interests more closely than other parts of the book.

An important aspect of PCA is interpretation of the components once they have been obtained. This may not be easy, and a number of approaches have been suggested for simplifying PCs to aid interpretation. Chapter 11 discusses these, covering the well-established idea of rotation as well as recently developed techniques. These techniques either replace PCA by alternative procedures that give simpler results, or approximate the PCs once they have been obtained. A small amount of this material comes from Section [...] of the first edition, but the great majority is new. The chapter also includes a section on physical interpretation of [...].

My involvement in the developments described in Chapter 12 is less direct than in Chapter 11, but a substantial part of the chapter describes methodology and applications in atmospheric science and reflects my long-standing interest in that field. In the first edition, Section [...] was concerned with non-independent and time series data. This section has been expanded to a full chapter (Chapter 12).

There have been major developments in this area, including functional PCA for time series, and various techniques appropriate for data involving spatial and temporal variation, such as (multichannel) singular spectrum analysis, complex PCA, principal oscillation pattern analysis, and extended empirical orthogonal functions (EOFs). Many of these techniques were developed by atmospheric scientists and are little known in many other [...].

The last two chapters of the first edition are greatly expanded and become Chapters 13 and 14 in the new edition. There is some transfer of material elsewhere, but also new sections. In Chapter 13 there are three new sections, on size/shape data, on quality control and a final odds-and-ends section, which includes vector, directional and complex data, interval data, species abundance data and large data sets. All other sections have been expanded, that on common principal component analysis and related topics especially so.

The first section of Chapter 14 deals with varieties of non-linear [...]. This section has grown substantially compared to its counterpart (Section [...]) in the first edition.

It includes material on the Gifi system of multivariate analysis, principal curves, and neural networks. Section [...] on weights, metrics and centerings combines, and considerably expands, the material of the first and third sections of the old Chapter 12. The content of the old Section [...] has been transferred to an earlier part in the book (Chapter 10), but the remaining old sections survive and are updated. The section on non-normal data includes independent component analysis (ICA), and the section on three-mode analysis also discusses techniques for three or more groups of variables. The penultimate section is new and contains material on sweep-out components, extended components, subjective components, goodness-of-fit, and further discussion of neural networks.

The appendix on numerical computation of PCs has been retained and updated, but the appendix on PCA in computer packages has been dropped from this edition mainly because such material becomes out-of-date very [...].

The preface to the first edition noted three general texts on multivariate analysis.

Since 1986 a number of excellent multivariate texts have appeared, including Everitt and Dunn (2001), Krzanowski (2000), Krzanowski and Marriott (1994) and Rencher (1995, 1998), to name just a few. Two large specialist texts on principal component analysis have also been published. Jackson (1991) gives a good, comprehensive, coverage of principal component analysis from a somewhat different perspective than the present book, although it, too, is aimed at a general audience of statisticians and users of PCA. The other text, by Preisendorfer and Mobley (1988), concentrates on meteorology and oceanography. Because of this, the notation in Preisendorfer and Mobley differs considerably from that used in mainstream statistical sources. Nevertheless, as we shall see in later chapters, especially Chapter 12, atmospheric science is a field where much development of PCA and related topics has occurred, and Preisendorfer and Mobley's book brings together a great deal of relevant material.

A much shorter book on PCA (Dunteman, 1989), which is targeted at social scientists, has also appeared since 1986.

Like the slim volume by Daultrey (1976), written mainly for geographers, it contains little [...].

The preface to the first edition noted some variations in [...]. [...], the notation used in the literature on PCA varies quite [...]. [...] D of Jackson (1991) provides a useful table of notation for some of the main quantities in PCA collected from 34 references (mainly textbooks on multivariate analysis). Where possible, the current book uses notation adopted by a majority of authors where a consensus exists.

To end this Preface, I include a slightly frivolous, but nevertheless interesting, aside on both the increasing popularity of PCA and on its terminology. It was noted in the preface to the first edition that both terms "principal component analysis" and "principal components analysis" are widely used. I have always preferred the singular form as it is compatible with factor analysis, cluster analysis, canonical correlation analysis and so on, but had no clear idea whether the singular or plural form was more frequently used.

A search for references to the two forms in key words or titles of articles using the Web of Science for the six years 1995–2000, revealed that the number of singular to plural occurrences were, respectively, 1017 to 527 in 1995–1996; 1330 to 620 in 1997–1998; and 1634 to 635 in 1999–2000. Thus, there has been nearly a 50 percent increase in citations of PCA in one form or another in that period, but most of that increase has been in the singular form, which now accounts for 72% of [...]. [...], it is not necessary to change the title of this book.

I. T. Jolliffe
April, 2002
Aberdeen, U.K.

Preface to the First Edition

Principal component analysis is probably the oldest and best known of the techniques of multivariate analysis. It was first introduced by Pearson (1901), and developed independently by Hotelling (1933). Like many multivariate methods, it was not widely used until the advent of electronic computers, but it is now well entrenched in virtually every statistical computer package.

The central idea of principal component analysis is to reduce the dimensionality of a data set in which there are a large number of interrelated variables, while retaining as much as possible of the variation present in the data set.

This reduction is achieved by transforming to a new set of variables, the principal components, which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables. Computation of the principal components reduces to the solution of an eigenvalue-eigenvector problem for a positive-semidefinite symmetric matrix. Thus, the definition and computation of principal components are straightforward but, as will be seen, this apparently simple technique has a wide variety of different applications, as well as a number of different derivations. Any feelings that principal component analysis is a narrow subject should soon be dispelled by the present book; indeed some quite broad topics which are related to principal component analysis receive no more than a brief mention in the final two chapters.

Although the term principal component analysis is in common usage, and is adopted in this book, other terminology may be encountered for the same technique, particularly outside of the statistical literature.
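As noted earlier in this preface, computing the principal components reduces to an eigenvalue-eigenvector problem for a positive-semidefinite symmetric matrix. The following is a minimal sketch of that idea, not the book's own code. It assumes NumPy is available, takes the sample covariance matrix as the symmetric matrix (the preface does not say which matrix is meant), and uses a hypothetical function name `principal_components` and made-up random data purely for illustration.

```python
import numpy as np


def principal_components(X):
    """Illustrative helper (hypothetical name): PCA via eigendecomposition.

    X has one row per observation and one column per variable.
    Returns eigenvalues (variances of the PCs, largest first),
    eigenvectors (one column per PC) and the PC scores.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                # centre each variable
    S = np.cov(Xc, rowvar=False)           # assumption: sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh handles symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so the largest variance comes first
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    scores = Xc @ eigvecs                  # the new, uncorrelated variables
    return eigvals, eigvecs, scores


# Made-up data: 200 observations of 3 strongly interrelated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([latent + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

eigvals, eigvecs, scores = principal_components(X)
explained = eigvals / eigvals.sum()        # share of total variation per component
```

In this sketch the covariance matrix of `scores` is diagonal, which is the "uncorrelated" property, and `explained` shows how much of the total variation the first few components retain. Using the correlation matrix instead of the covariance matrix is a common alternative and would change the results.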

