
Principal Component Analysis, Second Edition


the techniques of multivariate analysis. It was first introduced by Pearson (1901), and developed independently by Hotelling (1933). Like many multivariate methods, it was not widely used until the advent of electronic computers, but it is now well entrenched in virtually every statistical computer package.



Transcription of Principal Component Analysis, Second Edition

Principal Component Analysis, Second Edition
Jolliffe
Springer

Preface to the Second Edition

Since the first edition of the book was published, a great deal of new material on principal component analysis (PCA) and related topics has been published, and the time is now ripe for a new edition. Although the size of the book has nearly doubled, there are only two additional chapters. All the chapters in the first edition have been preserved, although two have been renumbered. All have been updated, some extensively. In this updating process I have endeavoured to be as comprehensive as possible. This is reflected in the number of new references, which substantially exceeds those in the first edition. Given the range of areas in which PCA is used, it is certain that I have missed some topics, and my coverage of others will be too brief for the taste of some readers.

The choice of which new topics to emphasize is inevitably a personal one, reflecting my own interests and biases. In particular, atmospheric science is a rich source of both applications and methodological developments, but its large contribution to the new material is partly due to my long-standing links with the area, and not because of a lack of interesting developments and examples in other fields. For example, there are large literatures in psychometrics, chemometrics and computer science that are only partially represented. Due to considerations of space, not everything could be included. The main changes are now described.

Chapters 1 to 4, describing the basic theory and providing a set of examples, are the least changed. It would have been possible to substitute more recent examples for those of Chapter 4, but as the present ones give nice illustrations of the various aspects of PCA, there was no good reason to do so.

One of these examples has been moved to Chapter 1. One extra property (A6) has been added to Chapter 2, with Property A6 in Chapter 3 becoming A7.

Chapter 5 has been extended by further discussion of a number of ordination and scaling methods linked to PCA, in particular varieties of the biplot.

Chapter 6 has seen a major expansion. There are two parts of Chapter 6 concerned with deciding how many principal components (PCs) to retain and with using PCA to choose a subset of variables. Both of these topics have been the subject of considerable research in recent years, although a regrettably high proportion of this research confuses PCA with factor analysis, the subject of Chapter 7. Neither Chapter 7 nor 8 has been expanded as much as Chapter 6 or Chapters 9 and 10.

Chapter 9 in the first edition contained three sections describing the use of PCA in conjunction with discriminant analysis, cluster analysis and canonical correlation analysis (CCA).

All three sections have been updated, but the greatest expansion is in the third section, where a number of other techniques have been included, which, like CCA, deal with relationships between two groups of variables. As elsewhere in the book, Chapter 9 includes yet other interesting related methods not discussed in detail. In general, the line is drawn between inclusion and exclusion once the link with PCA becomes too tenuous.

Chapter 10 also included three sections in the first edition, on outlier detection, influence and robustness. All have been the subject of substantial research interest since the first edition; this is reflected in expanded coverage. A fourth section, on other types of stability and sensitivity, has been added. Some of this material has been moved from Section [...] of the first edition; other material is new.

The next two chapters are also new and reflect my own research interests more closely than other parts of the book.

An important aspect of PCA is interpretation of the components once they have been obtained. This may not be easy, and a number of approaches have been suggested for simplifying PCs to aid interpretation. Chapter 11 discusses these, covering the well-established idea of rotation as well as recently developed techniques. These techniques either replace PCA by alternative procedures that give simpler results, or approximate the PCs once they have been obtained. A small amount of this material comes from Section [...] of the first edition, but the great majority is new. The chapter also includes a section on physical interpretation of components.

My involvement in the developments described in Chapter 12 is less direct than in Chapter 11, but a substantial part of the chapter describes methodology and applications in atmospheric science and reflects my long-standing interest in that field.

In the first edition, Section [...] was concerned with non-independent and time series data. This section has been expanded to a full chapter (Chapter 12). There have been major developments in this area, including functional PCA for time series, and various techniques appropriate for data involving spatial and temporal variation, such as (multichannel) singular spectrum analysis, complex PCA, principal oscillation pattern analysis, and extended empirical orthogonal functions (EOFs). Many of these techniques were developed by atmospheric scientists and are little known in many other disciplines.

The last two chapters of the first edition are greatly expanded and become Chapters 13 and 14 in the new edition. There is some transfer of material elsewhere, but also new sections. In Chapter 13 there are three new sections, on size/shape data, on quality control and a final odds-and-ends section, which includes vector, directional and complex data, interval data, species abundance data and large data sets.

All other sections have been expanded, that on common principal component analysis and related topics especially so.

The first section of Chapter 14 deals with varieties of non-linear PCA. This section has grown substantially compared to its counterpart (Section [...]) in the first edition. It includes material on the Gifi system of multivariate analysis, principal curves, and neural networks. Section [...] on weights, metrics and centerings combines, and considerably expands, the material of the first and third sections of the old Chapter 12. The content of the old Section [...] has been transferred to an earlier part in the book (Chapter 10), but the remaining old sections survive and are updated. The section on non-normal data includes independent component analysis (ICA), and the section on three-mode analysis also discusses techniques for three or more groups of variables.

The penultimate section is new and contains material on sweep-out components, extended components, subjective components, goodness-of-fit, and further discussion of neural networks.

The appendix on numerical computation of PCs has been retained and updated, but the appendix on PCA in computer packages has been dropped from this edition, mainly because such material becomes out-of-date very rapidly.

The preface to the first edition noted three general texts on multivariate analysis. Since 1986 a number of excellent multivariate texts have appeared, including Everitt and Dunn (2001), Krzanowski (2000), Krzanowski and Marriott (1994) and Rencher (1995, 1998), to name just a few. Two large specialist texts on principal component analysis have also been published. Jackson (1991) gives a good, comprehensive coverage of principal component analysis from a somewhat different perspective than the present book, although it, too, is aimed at a general audience of statisticians and users of PCA.

The other text, by Preisendorfer and Mobley (1988), concentrates on meteorology and oceanography. Because of this, the notation in Preisendorfer and Mobley differs considerably from that used in mainstream statistical sources. Nevertheless, as we shall see in later chapters, especially Chapter 12, atmospheric science is a field where much development of PCA and related topics has occurred, and Preisendorfer and Mobley's book brings together a great deal of relevant material.

A much shorter book on PCA (Dunteman, 1989), which is targeted at social scientists, has also appeared since 1986. Like the slim volume by Daultrey (1976), written mainly for geographers, it contains little technical material.

The preface to the first edition noted some variations in terminology. Likewise, the notation used in the literature on PCA varies quite widely. Appendix D of Jackson (1991) provides a useful table of notation for some of the main quantities in PCA collected from 34 references (mainly textbooks on multivariate analysis).

Where possible, the current book uses notation adopted by a majority of authors where a consensus exists.

To end this Preface, I include a slightly frivolous, but nevertheless interesting, aside on both the increasing popularity of PCA and on its terminology. It was noted in the preface to the first edition that both terms 'principal component analysis' and 'principal components analysis' are widely used. I have always preferred the singular form as it is compatible with 'factor analysis,' 'cluster analysis,' 'canonical correlation analysis' and so on, but had no clear idea whether the singular or plural form was more frequently used. A search for references to the two forms in key words or titles of articles using the Web of Science for the six years 1995–2000 revealed that the number of singular to plural occurrences were, respectively, 1017 to 527 in 1995–1996; 1330 to 620 in 1997–1998; and 1634 to 635 in 1999–2000.
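The trend in these counts is easier to see as proportions. The following is a minimal sketch, not part of the original preface, that takes the six counts quoted above and computes the share of occurrences using the singular form in each two-year period. The names `counts` and `singular_share` are illustrative choices, not anything from the book.

```python
# Counts quoted in the preface: (singular, plural) occurrences per two-year period.
counts = {
    "1995-1996": (1017, 527),
    "1997-1998": (1330, 620),
    "1999-2000": (1634, 635),
}


def singular_share(singular, plural):
    """Return the fraction of occurrences that use the singular form."""
    return singular / (singular + plural)


# Share of the singular form in each period (hypothetical helper names).
shares = {period: singular_share(s, p) for period, (s, p) in counts.items()}

for period, share in shares.items():
    s, p = counts[period]
    print(f"{period}: singular share {share:.1%}, singular-to-plural ratio {s / p:.2f}")
```

On these figures the singular form accounts for roughly 66%, 68% and 72% of occurrences in the three periods, so both the total number of references and the preference for the singular form increase over the six years.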

