Transcription of Independent Component Analysis
Independent Component Analysis
Final version of 7 March 2001
Aapo Hyvärinen, Juha Karhunen, and Erkki Oja
A Wiley-Interscience Publication
JOHN WILEY & SONS, New York / Chichester / Weinheim / Brisbane / Singapore / Toronto

Contents

Preface
1 [...]

Part I MATHEMATICAL PRELIMINARIES
2 Random Vectors and [...]
3 Gradients and Optimization [...]
4 Estimation [...]
5 Information [...]
6 Principal Component Analysis and [...]

Part II BASIC INDEPENDENT COMPONENT ANALYSIS
7 What is Independent Component Analysis?
8 ICA by Maximization of Nongaussianity
9 ICA by Maximum Likelihood [...]
10 ICA by Minimization of Mutual Information
11 ICA by Tensorial [...]
12 ICA by Nonlinear Decorrelation and Nonlinear PCA
13 Practical [...]
14 Overview and Comparison of Basic ICA [...]

Part III EXTENSIONS AND RELATED METHODS
15 Noisy [...]
16 ICA with Overcomplete [...]
17 Nonlinear ICA
18 Methods using Time [...]
19 Convolutive Mixtures and Blind Deconvolution
20 Other [...]

Part IV APPLICATIONS OF ICA
21 Feature Extraction by [...]
22 Brain Imaging [...]
23 [...]
24 Other [...]

References
Index

Preface
Independent Component Analysis (ICA)
is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples. In the model, the data variables are assumed to be linear or nonlinear mixtures of some unknown latent variables, and the mixing system is also unknown. The latent variables are assumed nongaussian and mutually independent, and they are called the independent components of the observed data. These independent components, also called sources or factors, can be found by ICA. ICA can be seen as an extension of principal component analysis and factor analysis.
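As an illustration of the linear case of this generative model, the following sketch mixes two independent nongaussian sources with a matrix that the estimator never sees, then tries to recover them from the mixtures alone. The source distributions, the mixing matrix A, the sample count, and all function names are assumptions chosen for illustration. The fixed-point iteration with a tanh nonlinearity is one common way to estimate the components and is not necessarily the procedure the authors have in mind at this point.

```python
import numpy as np

def whiten(x):
    """Center the rows of x, then decorrelate them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ x

def sym_decorrelate(w):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T @ w

def estimate_components(x, n_iter=200, seed=0):
    """Fixed-point iteration with a tanh nonlinearity on whitened data."""
    z = whiten(x)
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        y = np.tanh(w @ z)
        # Row i: E{z g(w_i^T z)} - E{g'(w_i^T z)} w_i, with g = tanh
        w_new = (y @ z.T) / t - np.diag((1.0 - y**2).mean(axis=1)) @ w
        w = sym_decorrelate(w_new)
    return w @ z

# Hypothetical data: two independent nongaussian sources, 5000 samples each.
rng = np.random.default_rng(0)
n_samples = 5000
s = np.vstack([rng.uniform(-1.0, 1.0, n_samples),   # subgaussian source
               rng.laplace(size=n_samples)])        # supergaussian source
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                          # assumed mixing matrix
x = A @ s                                           # observed mixtures

y = estimate_components(x)

# Absolute correlation between each true source (rows) and each estimate
# (columns). Order and sign of the estimates are arbitrary.
match = np.abs(np.corrcoef(np.vstack([s, y]))[:2, 2:])
print(np.round(match, 3))
```

Each row of `match` should contain one entry close to 1, meaning every source is recovered up to permutation, sign, and scale. Those indeterminacies are inherent when both the sources and the mixing system are unknown.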
ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail. The data analyzed by ICA could originate from many different kinds of application fields, including digital images and document databases, as well as economic indicators and psychometric measurements. In many cases, the measurements are given as a set of parallel signals or time series; the term blind source separation is used to characterize this problem. Typical examples are mixtures of simultaneous speech signals that have been picked up by several microphones, brain waves recorded by multiple sensors, interfering radio signals arriving at a mobile phone, or parallel time series obtained from some industrial process. The technique of ICA is a relatively new invention.
It was first introduced in the early 1980s in the context of neural network modeling. In the mid-1990s, some highly successful new algorithms were introduced by several research groups, together with impressive demonstrations on problems like the cocktail-party effect, where the individual speech waveforms are found from their mixtures. ICA became one of the exciting new topics, both in the field of neural networks, especially unsupervised learning, and more generally in advanced statistics and signal processing. Real-world applications of ICA on biomedical signal processing, audio sig-na