Transcription of Canonical Correlation: a Tutorial
Magnus Borga
January 12, 2001

Contents
1 About this tutorial
2 Introduction
3 Definition
4 Calculating canonical correlations
5 Relating topics
   The difference between CCA and ordinary correlation analysis
   Relation to mutual information
   Relation to other linear subspace methods
   Relation to SNR
      Equal noise energies
      Correlation between a signal and the corrupted signal
Appendix A
   A note on correlation and covariance matrices
   Affine transformations
   Principal component analysis
   Partial least squares
   Multivariate linear regression

1 About this tutorial
This is a printable version of a tutorial in HTML format. The tutorial may be modified at any time, as will this version. The latest version of this tutorial is available at magnus/cca/.
2 Introduction
Canonical correlation analysis (CCA) is a way of measuring the linear relationship between two multidimensional variables. It finds two bases, one for each variable, that are optimal with respect to correlations and, at the same time, it finds the corresponding correlations. In other words, it finds the two bases in which the correlation matrix between the variables is diagonal and the correlations on the diagonal are maximized. The dimensionality of these new bases is equal to or less than the smallest dimensionality of the two variables.

An important property of canonical correlations is that they are invariant with respect to affine transformations of the variables.
This is the most important difference between CCA and ordinary correlation analysis, which depends heavily on the basis in which the variables are described.

CCA was developed by H. Hotelling [10]. Although it is a standard tool in statistical analysis, where canonical correlation has been used for example in economics, medical studies, meteorology and even in the classification of malt whisky, it is surprisingly unknown in the fields of learning and signal processing. Some exceptions are [2, 13, 5, 4, 14]. For further details and applications in signal processing, see my PhD thesis [3] and other publications.

3 Definition
Canonical correlation analysis can be defined as the problem of finding two sets of basis vectors, one for $\mathbf{x}$ and the other for $\mathbf{y}$, such that the correlations between the projections of the variables onto these basis vectors are mutually maximized.

Let us look at the case where only one pair of basis vectors is sought, namely the ones corresponding to the largest canonical correlation. Consider the linear combinations $x = \mathbf{x}^T \hat{\mathbf{w}}_x$ and $y = \mathbf{y}^T \hat{\mathbf{w}}_y$ of the two variables respectively.
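This means that the function to be maximized is

$$\rho = \frac{E[xy]}{\sqrt{E[x^2]\,E[y^2]}} = \frac{E[\hat{\mathbf{w}}_x^T \mathbf{x}\mathbf{y}^T \hat{\mathbf{w}}_y]}{\sqrt{E[\hat{\mathbf{w}}_x^T \mathbf{x}\mathbf{x}^T \hat{\mathbf{w}}_x]\,E[\hat{\mathbf{w}}_y^T \mathbf{y}\mathbf{y}^T \hat{\mathbf{w}}_y]}} = \frac{\mathbf{w}_x^T C_{xy} \mathbf{w}_y}{\sqrt{\mathbf{w}_x^T C_{xx} \mathbf{w}_x \; \mathbf{w}_y^T C_{yy} \mathbf{w}_y}}. \tag{1}$$

The maximum of $\rho$ with respect to $\mathbf{w}_x$ and $\mathbf{w}_y$ is the maximum canonical correlation. The subsequent canonical correlations are uncorrelated for different solutions, i.e.

$$\begin{cases} E[x_i x_j] = E[\mathbf{w}_{xi}^T \mathbf{x}\mathbf{x}^T \mathbf{w}_{xj}] = \mathbf{w}_{xi}^T C_{xx} \mathbf{w}_{xj} = 0 \\ E[y_i y_j] = E[\mathbf{w}_{yi}^T \mathbf{y}\mathbf{y}^T \mathbf{w}_{yj}] = \mathbf{w}_{yi}^T C_{yy} \mathbf{w}_{yj} = 0 \\ E[x_i y_j] = E[\mathbf{w}_{xi}^T \mathbf{x}\mathbf{y}^T \mathbf{w}_{yj}] = \mathbf{w}_{xi}^T C_{xy} \mathbf{w}_{yj} = 0 \end{cases} \quad \text{for } i \neq j. \tag{2}$$

The projections onto $\mathbf{w}_x$ and $\mathbf{w}_y$, i.e. $x$ and $y$, are called canonical variates.

4 Calculating canonical correlations
Consider two random variables $\mathbf{x}$ and $\mathbf{y}$ with zero mean. The total covariance matrix

$$C = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix} = E\left[\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}^T\right] \tag{3}$$

is a block matrix where $C_{xx}$ and $C_{yy}$ are the within-sets covariance matrices of $\mathbf{x}$ and $\mathbf{y}$ respectively and $C_{xy} = C_{yx}^T$ is the between-sets covariance matrix.

The canonical correlations between $\mathbf{x}$ and $\mathbf{y}$ can be found by solving the eigenvalue equations

$$\begin{cases} C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} \hat{\mathbf{w}}_x = \rho^2 \hat{\mathbf{w}}_x \\ C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy} \hat{\mathbf{w}}_y = \rho^2 \hat{\mathbf{w}}_y \end{cases} \tag{4}$$

where the eigenvalues $\rho^2$ are the squared canonical correlations and the eigenvectors $\hat{\mathbf{w}}_x$ and $\hat{\mathbf{w}}_y$ are the normalized canonical correlation basis vectors. The number of non-zero solutions to these equations is limited to the smallest dimensionality of $\mathbf{x}$ and $\mathbf{y}$.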
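A minimal NumPy sketch of equation 4, assuming one sample per row and sample covariance estimates for $C_{xx}$, $C_{yy}$ and $C_{xy}$ (the function name `cca_eig` and the synthetic data are illustrative, not part of the tutorial). It returns the canonical correlations as square roots of the eigenvalues of $C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}$, keeping only as many as the smaller dimensionality allows, together with the normalized basis vectors for $\mathbf{x}$.

```python
import numpy as np


def cca_eig(X, Y):
    """Canonical correlations of X (n x p) and Y (n x q) via equation (4).

    Returns (rho, Wx): the min(p, q) canonical correlations in descending
    order and the matching unit-length basis vectors for x as columns.
    """
    # Subtract sample means so the zero-mean assumption of section 4 holds.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1)
    Cyy = Yc.T @ Yc / (n - 1)
    Cxy = Xc.T @ Yc / (n - 1)
    Cyx = Cxy.T
    # M = Cxx^-1 Cxy Cyy^-1 Cyx, built with linear solves instead of inverses.
    M = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cyx))
    # M is not symmetric in general, so the general eigensolver is used;
    # its eigenvalues are real and non-negative up to rounding.
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(eigvals.real)[::-1]
    k = min(X.shape[1], Y.shape[1])
    rho2 = np.clip(eigvals.real[order][:k], 0.0, 1.0)
    Wx = eigvecs.real[:, order][:, :k]
    Wx = Wx / np.linalg.norm(Wx, axis=0)
    return np.sqrt(rho2), Wx


# Synthetic data (an assumption for illustration): x1 and y2 share a latent z.
rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)
X = np.column_stack([z + 0.5 * rng.standard_normal(n),
                     rng.standard_normal(n),
                     rng.standard_normal(n)])
Y = np.column_stack([rng.standard_normal(n),
                     z + 0.5 * rng.standard_normal(n)])

rho, Wx = cca_eig(X, Y)
print(rho)        # first value near 0.8 = 1 / 1.25, second near 0
print(Wx[:, 0])   # close to +/- [1, 0, 0]
```

Linear solves are used instead of explicit inverses for numerical robustness; with 3- and 2-dimensional variables, at most two canonical correlations are returned.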
For example, if the dimensionalities of $\mathbf{x}$ and $\mathbf{y}$ are 8 and 5 respectively, the maximum number of canonical correlations is 5.

Only one of the eigenvalue equations needs to be solved, since the solutions are related by

$$\begin{cases} C_{xy}\hat{\mathbf{w}}_y = \rho\lambda_x C_{xx}\hat{\mathbf{w}}_x \\ C_{yx}\hat{\mathbf{w}}_x = \rho\lambda_y C_{yy}\hat{\mathbf{w}}_y \end{cases} \tag{5}$$

where

$$\lambda_x = \lambda_y^{-1} = \sqrt{\frac{\hat{\mathbf{w}}_y^T C_{yy}\hat{\mathbf{w}}_y}{\hat{\mathbf{w}}_x^T C_{xx}\hat{\mathbf{w}}_x}}. \tag{6}$$

5 Relating topics

The difference between CCA and ordinary correlation analysis
Ordinary correlation analysis depends on the coordinate system in which the variables are described. This means that even if there is a very strong linear relationship between two multidimensional signals, this relationship may not be visible in an ordinary correlation analysis if one coordinate system is used, while in another coordinate system this linear relationship would give a very high correlation.

CCA finds the coordinate system that is optimal for correlation analysis, and the eigenvectors of equation 4 define this coordinate system.

Example: Consider two normally distributed two-dimensional variables $\mathbf{x}$ and $\mathbf{y}$ with unit variance.
Let $y_1 + y_2 = x_1 + x_2$. It is easy to confirm that the correlation matrix between $\mathbf{x}$ and $\mathbf{y}$ is

$$R_{xy} = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}. \tag{7}$$

This indicates a relatively weak correlation of 0.5 despite the fact that there is a perfect linear relationship (in one dimension) between $\mathbf{x}$ and $\mathbf{y}$.

A CCA on this data shows that the largest (and only) canonical correlation is one, and it also gives the direction $[1\;1]^T$ in which this perfect linear relationship lies. If the variables are described in the bases given by the canonical correlation basis vectors (i.e. the eigenvectors of equation 4), the correlation matrix between the variables is

$$R_{xy} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \tag{8}$$

Relation to mutual information
There is a relation between correlation and mutual information. Since information is additive for statistically independent variables and the canonical variates are uncorrelated, the mutual information between $\mathbf{x}$ and $\mathbf{y}$ is the sum of the mutual information between the variates $x_i$ and $y_i$ if there are no higher-order statistical dependencies than correlation (second-order statistics).
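The two-dimensional example at the start of this section ($y_1 + y_2 = x_1 + x_2$) can be reproduced numerically. The NumPy sketch below assumes one particular construction of $\mathbf{y}$ that satisfies the stated conditions ($y_1 = s/2 + u$, $y_2 = s/2 - u$ with $s = x_1 + x_2$ and $\mathrm{Var}(u) = 1/2$); the tutorial does not say how $\mathbf{y}$ is generated. It computes the ordinary correlation matrix, solves equation 4 for $\hat{\mathbf{w}}_x$ and recovers $\hat{\mathbf{w}}_y$ from equation 5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Construction assumed for this sketch: x1, x2 independent N(0, 1);
# y1 = s/2 + u, y2 = s/2 - u with s = x1 + x2 and Var(u) = 1/2, so each
# y component has unit variance and y1 + y2 = x1 + x2 holds exactly.
X = rng.standard_normal((n, 2))
s = X[:, 0] + X[:, 1]
u = np.sqrt(0.5) * rng.standard_normal(n)
Y = np.column_stack([s / 2 + u, s / 2 - u])

# Ordinary correlation matrix between x and y (cross block of corrcoef).
R_xy = np.corrcoef(X.T, Y.T)[:2, 2:]

# Sample covariance blocks of equation (3).
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
Cxx, Cyy, Cxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n

# Equation (4) for w_x ...
M = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cxy.T))
vals, vecs = np.linalg.eig(M)
order = np.argsort(vals.real)[::-1]
rho = np.sqrt(np.clip(vals.real[order], 0.0, 1.0))
Wx = vecs.real[:, order]
# ... and equation (5) for w_y: C_yx w_x is parallel to C_yy w_y.
Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)
Wy = Wy / np.linalg.norm(Wy, axis=0)

# Correlation matrix between the canonical variates, cf. equation (8).
R_canon = np.corrcoef((Xc @ Wx).T, (Yc @ Wy).T)[:2, 2:]

print(np.round(R_xy, 2))      # all entries near 0.5, cf. equation (7)
print(np.round(rho, 3))       # first value 1, second near 0
print(np.round(Wx[:, 0], 3))  # proportional to [1, 1]
print(np.round(R_canon, 3))   # close to [[1, 0], [0, 0]]
```

The second canonical correlation is only sampling noise here, so the corresponding $\hat{\mathbf{w}}_y$ from equation 5 is poorly determined; the first pair is exact because $y_1 + y_2 = x_1 + x_2$ holds sample by sample.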
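For Gaussian variables this means

$$I(\mathbf{x};\mathbf{y}) = \frac{1}{2}\log\left(\frac{1}{\prod_i (1-\rho_i^2)}\right) = \frac{1}{2}\sum_i \log\left(\frac{1}{1-\rho_i^2}\right). \tag{9}$$

Kay [13] has shown that this relation plus a constant holds for all elliptically symmetrical distributions of the form

$$c\, f\!\left((\mathbf{z}-\bar{\mathbf{z}})^T C^{-1} (\mathbf{z}-\bar{\mathbf{z}})\right). \tag{10}$$

Relation to other linear subspace methods
Instead of the two eigenvalue equations in 4, we can formulate the problem as one single eigenvalue equation:

$$B^{-1}A\hat{\mathbf{w}} = \rho\hat{\mathbf{w}} \tag{11}$$

where

$$A = \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}, \quad B = \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix} \quad \text{and} \quad \hat{\mathbf{w}} = \begin{pmatrix} \mu_x \hat{\mathbf{w}}_x \\ \mu_y \hat{\mathbf{w}}_y \end{pmatrix}. \tag{12}$$

Solving the eigenproblem in equation 11 with slightly different matrices will give solutions to principal component analysis (PCA), partial least squares (PLS) and multivariate linear regression (MLR). The matrices are listed in table 1.

$$
\begin{array}{lcc}
 & A & B \\
\hline
\text{PCA} & C_{xx} & I \\
\text{PLS} & \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix} & \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} \\
\text{CCA} & \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix} & \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix} \\
\text{MLR} & \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix} & \begin{pmatrix} C_{xx} & 0 \\ 0 & I \end{pmatrix}
\end{array}
$$

Table 1: The matrices A and B for PCA, PLS, CCA and MLR.

Relation to SNR
Correlation is strongly related to signal-to-noise ratio (SNR), which is a more commonly used measure in signal processing.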
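Returning to equation 11 and Table 1: the NumPy sketch below (the function name `subspace_eig` and the synthetic data are illustrative assumptions) forms A and B for each method in the table and solves $B^{-1}A\hat{\mathbf{w}} = \rho\hat{\mathbf{w}}$. For CCA the largest eigenvalue should equal the largest canonical correlation from equation 4, and for PLS it should equal the largest singular value of $C_{xy}$.

```python
import numpy as np


def subspace_eig(Cxx, Cxy, Cyy, method):
    """Eigenvalues/vectors of B^-1 A (equation 11) with A, B from Table 1.

    Returns eigenvalues in descending order and eigenvectors as columns.
    """
    p, q = Cxy.shape
    Zpq, Zqp = np.zeros((p, q)), np.zeros((q, p))
    if method == "PCA":
        # PCA acts on x alone: A = Cxx, B = I.
        A, B = Cxx, np.eye(p)
    else:
        A = np.block([[np.zeros((p, p)), Cxy], [Cxy.T, np.zeros((q, q))]])
        if method == "PLS":
            B = np.eye(p + q)
        elif method == "CCA":
            B = np.block([[Cxx, Zpq], [Zqp, Cyy]])
        elif method == "MLR":
            B = np.block([[Cxx, Zpq], [Zqp, np.eye(q)]])
        else:
            raise ValueError(f"unknown method: {method}")
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(vals.real)[::-1]
    return vals.real[order], vecs.real[:, order]


# Covariance blocks from synthetic data (an assumption for illustration).
rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [-0.3, 2.0]]) + rng.standard_normal((1000, 2))
C = np.cov(np.hstack([X, Y]).T)
Cxx, Cxy, Cyy = C[:3, :3], C[:3, 3:], C[3:, 3:]

for method in ("PCA", "PLS", "CCA", "MLR"):
    vals, _ = subspace_eig(Cxx, Cxy, Cyy, method)
    print(method, np.round(vals[:2], 3))
```

A general eigensolver is used because $B^{-1}A$ is not symmetric; since A is symmetric and B is positive definite in every row of the table, the eigenvalues are nevertheless real.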
Consider a signal $x$ and two noise signals $\eta_1$ and $\eta_2$, all having zero mean¹ and all being uncorrelated with each other. Let $S = E[x^2]$ and $N_i = E[\eta_i^2]$ be the energy of the signal and the noise signals respectively. Then the correlation between $a(x+\eta_1)$ and $b(x+\eta_2)$ is

$$\rho = \frac{E[a(x+\eta_1)\,b(x+\eta_2)]}{\sqrt{E[a^2(x+\eta_1)^2]\,E[b^2(x+\eta_2)^2]}} = \frac{E[x^2]}{\sqrt{\left(E[x^2]+E[\eta_1^2]\right)\left(E[x^2]+E[\eta_2^2]\right)}} = \frac{S}{\sqrt{(S+N_1)(S+N_2)}}. \tag{13}$$

Note that the amplification factors $a$ and $b$ do not affect the correlation or the SNR (as long as $ab > 0$; otherwise only the sign of $\rho$ changes).

¹ The assumption of zero mean is for convenience. A non-zero mean does not affect the SNR or the correlation.

Equal noise energies
In the special case where the noise energies are equal, i.e. $N_1 = N_2 = N$, equation 13 can be written as

$$\rho = \frac{S}{S+N}. \tag{14}$$

This means that the SNR can be written as

$$\frac{S}{N} = \frac{\rho}{1-\rho}. \tag{15}$$

Here, it should be noted that the noise affects the signal twice, so this relation between SNR and correlation is perhaps not so intuitive.
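Equations 13 to 15 can be checked by simulation. The sketch below draws Gaussian signal and noise with assumed energies `S`, `N1`, `N2` and assumed positive amplification factors `a`, `b`, then compares sample correlations with the predicted values. Gaussian samples are only a convenient choice; the derivation uses second-order moments only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000


def corr(u, v):
    """Sample (Pearson) correlation between two 1-D arrays."""
    return np.corrcoef(u, v)[0, 1]


# Unequal noise energies: check equation (13). S, N1, N2, a, b are assumed values.
S, N1, N2 = 1.0, 0.5, 2.0
a, b = 3.0, 0.2  # positive, so the sign of rho is unchanged
x = np.sqrt(S) * rng.standard_normal(n)
eta1 = np.sqrt(N1) * rng.standard_normal(n)
eta2 = np.sqrt(N2) * rng.standard_normal(n)
rho_13 = corr(a * (x + eta1), b * (x + eta2))
rho_13_theory = S / np.sqrt((S + N1) * (S + N2))

# Equal noise energies N1 = N2 = N: equations (14) and (15).
N = 0.25
eta1 = np.sqrt(N) * rng.standard_normal(n)
eta2 = np.sqrt(N) * rng.standard_normal(n)
rho_14 = corr(x + eta1, x + eta2)
snr_est = rho_14 / (1.0 - rho_14)  # equation (15)

print(round(rho_13, 3), round(rho_13_theory, 3))  # both about 0.471
print(round(rho_14, 3), round(snr_est, 2))        # about 0.8 and about 4 = S / N
```

Because $d(S/N)/d\rho = 1/(1-\rho)^2$, small errors in the estimated correlation are amplified in the SNR estimate when $\rho$ is close to one.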
The relation in equation 15 is illustrated in figure 1 (top).

Correlation between a signal and the corrupted signal
Another special case is when $N_1 = 0$ and $N_2 = N$. Then, the correlation between a signal and a noise-corrupted version of that signal is

$$\rho = \frac{S}{\sqrt{S(S+N)}}. \tag{16}$$

In this case, the relation between SNR and correlation is

$$\frac{S}{N} = \frac{\rho^2}{1-\rho^2}. \tag{17}$$

This relation between correlation and SNR is illustrated in figure 1 (bottom).

Appendix A
A note on correlation and covariance matrices
In neural network literature, the matrix $C_{xx}$ in equation 3 is often called a correlation matrix. This can be a bit confusing, since $C_{xx}$ does not contain the correlations between the variables in a statistical sense, but rather the expected values of the products between them.
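To make the distinction concrete, the NumPy sketch below (the mean, covariance and sample size are assumed values) computes, for non-zero-mean samples, the matrix of expected products $E[\mathbf{x}\mathbf{x}^T]$, the covariance matrix and the statistical correlation matrix. Only the last is bounded to $[-1, 1]$ with ones on its diagonal; for zero-mean data the first two coincide, which is the setting of equation 3.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
mean = np.array([5.0, -2.0])                    # assumed non-zero mean
cov = np.array([[4.0, 1.2], [1.2, 1.0]])        # assumed covariance
X = rng.multivariate_normal(mean, cov, size=n)  # one sample per row

# E[x x^T]: second-order origin moments (what is often called a
# "correlation matrix" in neural-network texts).
moments = X.T @ X / n
# Covariance matrix: second-order central moments.
covariance = np.cov(X.T)
# Statistical correlation: covariance normalized by the geometric mean
# of the variances, i.e. the Pearson correlation coefficient.
correlation = np.corrcoef(X.T)
rho_01 = covariance[0, 1] / np.sqrt(covariance[0, 0] * covariance[1, 1])

print(np.round(moments, 1))      # near cov + mean mean^T = [[29, -8.8], [-8.8, 5]]
print(np.round(covariance, 1))   # near [[4, 1.2], [1.2, 1]]
print(np.round(correlation, 2))  # off-diagonal near 1.2 / sqrt(4 * 1) = 0.6
```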
The correlation between $x_i$ and $x_j$ is defined as

$$\rho_{ij} = \frac{E[(x_i-\bar{x}_i)(x_j-\bar{x}_j)]}{\sqrt{E[(x_i-\bar{x}_i)^2]\,E[(x_j-\bar{x}_j)^2]}}, \tag{18}$$

see for example [1], i.e. the covariance between $x_i$ and $x_j$ normalized by the geometric mean of the variances of $x_i$ and $x_j$ ($\bar{x} = E[x]$). Hence, the correlation is bounded, $-1 \le \rho_{ij} \le 1$. In this tutorial, correlation matrices are denoted $R$.

The diagonal terms of $C_{xx}$ are the second-order origin moments, $E[x_i^2]$. The diagonal terms in a covariance matrix are the variances, or the second-order central moments, $E[(x_i-\bar{x}_i)^2]$.

The maximum likelihood estimator of $\rho$ is obtained by replacing the expectation operator in equation 18 by a sum over the samples. This estimator is sometimes called the Pearson correlation coefficient after K. Pearson [16].

Affine transformations
An affine transformation is simply a translation of the origin followed by a linear transformation.
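The invariance of canonical correlations under affine transformations, stated in the introduction, can be checked with a short NumPy sketch (the function name `canonical_correlations`, the data and the matrices `Ax`, `Ay`, `tx`, `ty` are illustrative assumptions). It applies an invertible linear map plus a translation to each variable and compares the canonical correlations from equation 4 before and after; the ordinary cross-correlation matrix is computed for contrast.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations (largest first) from equation (4)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    # Unnormalized covariance blocks: the common 1/n factor cancels in M.
    Cxx, Cyy, Cxy = Xc.T @ Xc, Yc.T @ Yc, Xc.T @ Yc
    M = np.linalg.solve(Cxx, Cxy @ np.linalg.solve(Cyy, Cxy.T))
    k = min(X.shape[1], Y.shape[1])
    rho2 = np.sort(np.linalg.eigvals(M).real)[::-1][:k]
    return np.sqrt(np.clip(rho2, 0.0, 1.0))


rng = np.random.default_rng(5)
n = 2000
X = rng.standard_normal((n, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [-0.3, 2.0]]) + rng.standard_normal((n, 2))

# Assumed affine maps: invertible linear parts (det 5 and 3) plus translations.
Ax = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 3.0]])
Ay = np.array([[1.0, 2.0], [-1.0, 1.0]])
tx = np.array([10.0, -3.0, 0.5])
ty = np.array([-7.0, 4.0])
X2 = X @ Ax.T + tx
Y2 = Y @ Ay.T + ty

rho_before = canonical_correlations(X, Y)
rho_after = canonical_correlations(X2, Y2)
# Ordinary cross-correlation matrices, for contrast.
R_before = np.corrcoef(X.T, Y.T)[:3, 3:]
R_after = np.corrcoef(X2.T, Y2.T)[:3, 3:]

print(np.round(rho_before, 4), np.round(rho_after, 4))  # identical up to rounding
print(np.abs(R_before - R_after).max())                 # clearly non-zero
```

Mean removal absorbs the translation, and the linear parts only turn the matrix of equation 4 into a similar matrix, $A_x^{-T} M A_x^{T}$, so its eigenvalues are unchanged; the ordinary correlations between individual components have no such protection.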