Data, Covariance, and Correlation Matrix

Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)

Updated 16-Jan-2017

Copyright © 2017 by Nathaniel E. Helwig

Outline of Notes

1) The Data Matrix
   Definition
   Properties
   R code
2) The Covariance Matrix
   Definition
   Properties
   R code
3) The Correlation Matrix
   Definition
   Properties
   R code
4) Miscellaneous Topics
   Crossproduct calculations
   Vec and Kronecker
   Visualizing data

The Data Matrix

Definition: The Organization of Data

The data matrix refers to the array of numbers

$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ x_{31} & x_{32} & \cdots & x_{3p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$$

where $x_{ij}$ is the $j$-th variable collected from the $i$-th item (e.g., subject).
   items/subjects are rows
   variables are columns

$\mathbf{X}$ is a data matrix of order $n \times p$ (# items by # variables).

Definition: Collection of Column Vectors

We can view a data matrix as a collection of column vectors:

$$\mathbf{X} = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_p \end{pmatrix}$$

where $\mathbf{x}_j$ is the $j$-th column of $\mathbf{X}$ for $j \in \{1, \ldots, p\}$.

The $n \times 1$ vector $\mathbf{x}_j$ gives the $j$-th variable's scores for the $n$ items.

Definition: Collection of Row Vectors

We can view a data matrix as a collection of row vectors:

$$\mathbf{X} = \begin{pmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \end{pmatrix}$$

where $\mathbf{x}_i'$ is the $i$-th row of $\mathbf{X}$ for $i \in \{1, \ldots, n\}$.

The $1 \times p$ vector $\mathbf{x}_i'$ gives the $i$-th item's scores for the $p$ variables.

Properties: Calculating Variable (Column) Means

The sample mean of the $j$-th variable is given by

$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij} = n^{-1} \mathbf{1}_n' \mathbf{x}_j$$

where
   $\mathbf{1}_n$ denotes an $n \times 1$ vector of ones
   $\mathbf{x}_j$ denotes the $j$-th column of $\mathbf{X}$

Properties: Calculating Item (Row) Means

The sample mean of the $i$-th item is given by

$$\bar{x}_i = \frac{1}{p} \sum_{j=1}^{p} x_{ij} = p^{-1} \mathbf{x}_i' \mathbf{1}_p$$

where
   $\mathbf{1}_p$ denotes a $p \times 1$ vector of ones
   $\mathbf{x}_i'$ denotes the $i$-th row of $\mathbf{X}$
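The two mean formulas above can be checked numerically. The following is a minimal sketch, assuming the built-in `mtcars` data as the data matrix; the names `ones_n`, `ones_p`, `col_means`, and `row_means` are illustrative and do not come from the notes.

```r
# Sketch: column and row means written as matrix products.
X <- as.matrix(mtcars)   # built-in data set: 32 items by 11 variables
n <- nrow(X)
p <- ncol(X)

ones_n <- rep(1, n)      # plays the role of the vector 1_n
ones_p <- rep(1, p)      # plays the role of the vector 1_p

# n^{-1} 1_n' X gives all p column means at once
col_means <- drop(crossprod(ones_n, X)) / n

# p^{-1} X 1_p gives all n row means at once
row_means <- drop(X %*% ones_p) / p

# compare with the built-in helpers (names are ignored in the comparison)
all.equal(col_means, colMeans(X), check.attributes = FALSE)
all.equal(row_means, rowMeans(X), check.attributes = FALSE)
```

In practice `colMeans` and `rowMeans` are the more convenient choice; the matrix-product form is shown only because it mirrors the notation.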

R Code: Data Frame and Matrix Classes in R

> data(mtcars)
> class(mtcars)
[1] "data.frame"
> dim(mtcars)
[1] 32 11
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> X <- as.matrix(mtcars)
> class(X)
[1] "matrix"

R Code: Row and Column Means

> # get row means (3 ways)
> rowMeans(X)[1:3]
    Mazda RX4 Mazda RX4 Wag    Datsun 710 
     29.90727      29.98136      23.59818 
> c(mean(X[1,]), mean(X[2,]), mean(X[3,]))
[1] 29.90727 29.98136 23.59818
> apply(X,1,mean)[1:3]
    Mazda RX4 Mazda RX4 Wag    Datsun 710 
     29.90727      29.98136      23.59818 
> # get column means (3 ways)
> colMeans(X)[1:3]
      mpg       cyl      disp 
 20.09062   6.18750 230.72188 
> c(mean(X[,1]), mean(X[,2]), mean(X[,3]))
[1]  20.09062   6.18750 230.72188
> apply(X,2,mean)[1:3]
      mpg       cyl      disp 
 20.09062   6.18750 230.72188 

R Code: Other Row and Column Functions

> # get column medians
> apply(X,2,median)[1:3]
  mpg   cyl  disp 
 19.2   6.0 196.3 
> c(median(X[,1]), median(X[,2]), median(X[,3]))
[1]  19.2   6.0 196.3
> # get column ranges
> apply(X,2,range)[,1:3]
      mpg cyl  disp
[1,] 10.4   4  71.1
[2,] 33.9   8 472.0
> cbind(range(X[,1]), range(X[,2]), range(X[,3]))
     [,1] [,2]  [,3]
[1,] 10.4    4  71.1
[2,] 33.9    8 472.0

The Covariance Matrix

Definition: The Covariation of Data

The covariance matrix refers to the symmetric array of numbers

$$\mathbf{S} = \begin{pmatrix} s_1^2 & s_{12} & s_{13} & \cdots & s_{1p} \\ s_{21} & s_2^2 & s_{23} & \cdots & s_{2p} \\ s_{31} & s_{32} & s_3^2 & \cdots & s_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & s_{p3} & \cdots & s_p^2 \end{pmatrix}$$

where
   $s_j^2 = (1/n) \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ is the variance of the $j$-th variable
   $s_{jk} = (1/n) \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$ is the covariance between the $j$-th and $k$-th variables
   $\bar{x}_j = (1/n) \sum_{i=1}^{n} x_{ij}$ is the mean of the $j$-th variable
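As a small illustration of the definition, the sketch below computes one variance and one covariance directly from the sums, assuming the built-in `mtcars` data and two arbitrarily chosen variables (`mpg` and `wt`). Note that R's `var` and `cov` divide by $n - 1$ rather than $n$, so the built-in results are rescaled before comparing.

```r
# Sketch: a single variance and covariance computed from the definition.
X <- as.matrix(mtcars)
n <- nrow(X)

xj <- X[, "mpg"]   # j-th variable (illustrative choice)
xk <- X[, "wt"]    # k-th variable (illustrative choice)

# definitions with 1/n denominators
s2_j <- sum((xj - mean(xj))^2) / n
s_jk <- sum((xj - mean(xj)) * (xk - mean(xk))) / n

# var() and cov() use n - 1, so multiply by (n - 1) / n to match
all.equal(s2_j, var(xj) * (n - 1) / n)
all.equal(s_jk, cov(xj, xk) * (n - 1) / n)
```

The two conventions differ only by the constant factor $(n - 1)/n$, which matters little for large $n$ and cancels entirely in correlations.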

Definition: Covariance Matrix from Data Matrix

We can calculate the covariance matrix such as

$$\mathbf{S} = \frac{1}{n} \mathbf{X}_c' \mathbf{X}_c$$

where $\mathbf{X}_c = \mathbf{X} - \mathbf{1}_n \bar{\mathbf{x}}' = \mathbf{C}\mathbf{X}$ with
   $\bar{\mathbf{x}}' = (\bar{x}_1, \ldots, \bar{x}_p)$ denoting the vector of variable means
   $\mathbf{C} = \mathbf{I}_n - n^{-1} \mathbf{1}_n \mathbf{1}_n'$ denoting a centering matrix

Note that the centered matrix $\mathbf{X}_c$ has the form

$$\mathbf{X}_c = \begin{pmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\ x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\ x_{31} - \bar{x}_1 & x_{32} - \bar{x}_2 & \cdots & x_{3p} - \bar{x}_p \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p \end{pmatrix}$$

Properties: Variances are Nonnegative

Variances are sums-of-squares, which implies that $s_j^2 \geq 0$ for all $j$.
   $s_j^2 > 0$ as long as there does not exist an $\alpha$ such that $\mathbf{x}_j = \alpha \mathbf{1}_n$

This implies that
   $\mathrm{tr}(\mathbf{S}) \geq 0$ where $\mathrm{tr}(\cdot)$ denotes the matrix trace function
   $\sum_{j=1}^{p} \lambda_j \geq 0$ where $(\lambda_1, \ldots, \lambda_p)$ are the eigenvalues of $\mathbf{S}$

If $n < p$, then $\lambda_j = 0$ for at least one $j \in \{1, \ldots, p\}$.
If $n \geq p$ and the $p$ columns of $\mathbf{X}_c$ are linearly independent, then $\lambda_j > 0$ for all $j \in \{1, \ldots, p\}$.

Properties: The Cauchy-Schwarz Inequality

From the Cauchy-Schwarz inequality we have that

$$s_{jk}^2 \leq s_j^2 s_k^2$$

with the equality holding if and only if $\mathbf{x}_j$ and $\mathbf{x}_k$ are linearly dependent.

We could also write the Cauchy-Schwarz inequality as

$$|s_{jk}| \leq s_j s_k$$

where $s_j$ and $s_k$ denote the standard deviations of the variables.

R Code: Covariance Matrix by Hand (hard way)

> n <- nrow(X)
> C <- diag(n) - matrix(1/n, n, n)
> Xc <- C %*% X
> # divide by n - 1 (not n) so the result matches R's cov() function
> S <- t(Xc) %*% Xc / (n-1)
> S[1:3,1:6]
> # or
> Xc <- scale(X, center=TRUE, scale=FALSE)
> S <- t(Xc) %*% Xc / (n-1)
> S[1:3,1:6]
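The properties of the covariance matrix stated above can be verified numerically. This is a minimal sketch, assuming the built-in `mtcars` data and R's `cov` function (which uses $n - 1$ denominators; the properties hold under either scaling). The small tolerance factor in the last check is an assumption added to absorb floating-point rounding.

```r
# Sketch: numerical checks of the covariance matrix properties.
X <- as.matrix(mtcars)
S <- cov(X)
s <- sqrt(diag(S))   # standard deviations of the 11 variables

# variances are nonnegative
all(diag(S) >= 0)

# the trace equals the sum of the eigenvalues
lambda <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
all.equal(sum(diag(S)), sum(lambda))

# here n = 32 > p = 11 and no centered column is a linear
# combination of the others, so every eigenvalue is positive
all(lambda > 0)

# Cauchy-Schwarz: |s_jk| <= s_j * s_k for every pair (j, k)
all(abs(S) <= outer(s, s) * (1 + 1e-12))
```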

R Code: Covariance Matrix using cov Function (easy way)

> # calculate covariance matrix
> S <- cov(X)
> dim(S)
[1] 11 11
> # check variance
> S[1,1]
[1] 36.3241
> var(X[,1])
[1] 36.3241
> sum((X[,1]-mean(X[,1]))^2) / (n-1)
[1] 36.3241
> # check covariance
> S[1:3,1:6]

The Correlation Matrix

Definition: The Correlation of Data

The correlation matrix refers to the symmetric array of numbers

$$\mathbf{R} = \begin{pmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1p} \\ r_{21} & 1 & r_{23} & \cdots & r_{2p} \\ r_{31} & r_{32} & 1 & \cdots & r_{3p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & r_{p3} & \cdots & 1 \end{pmatrix}$$

where

$$r_{jk} = \frac{s_{jk}}{s_j s_k} = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2} \sqrt{\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}}$$

is the Pearson correlation coefficient between variables $\mathbf{x}_j$ and $\mathbf{x}_k$.
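The Pearson formula can be evaluated directly from the sums of squares and cross-products. The sketch below assumes the built-in `mtcars` data and two arbitrarily chosen variables; `dj` and `dk` are illustrative names for the centered scores. Because the $1/n$ factors cancel between numerator and denominator, the result does not depend on whether $n$ or $n - 1$ is used.

```r
# Sketch: Pearson correlation from centered scores.
X <- as.matrix(mtcars)

xj <- X[, "mpg"]       # j-th variable (illustrative choice)
xk <- X[, "wt"]        # k-th variable (illustrative choice)

dj <- xj - mean(xj)    # centered scores for variable j
dk <- xk - mean(xk)    # centered scores for variable k

# ratio of the cross-product to the product of root sums-of-squares
r_jk <- sum(dj * dk) / (sqrt(sum(dj^2)) * sqrt(sum(dk^2)))

# agrees with the built-in function
all.equal(r_jk, cor(xj, xk))
```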

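Definition: Correlation Matrix from Data Matrix

We can calculate the correlation matrix such as

$$\mathbf{R} = \frac{1}{n} \mathbf{X}_s' \mathbf{X}_s$$

where $\mathbf{X}_s = \mathbf{C}\mathbf{X}\mathbf{D}^{-1}$ with
   $\mathbf{C} = \mathbf{I}_n - n^{-1} \mathbf{1}_n \mathbf{1}_n'$ denoting a centering matrix
   $\mathbf{D} = \mathrm{diag}(s_1, \ldots, s_p)$ denoting a diagonal scaling matrix

Note that the standardized matrix $\mathbf{X}_s$ has the form

$$\mathbf{X}_s = \begin{pmatrix} (x_{11} - \bar{x}_1)/s_1 & (x_{12} - \bar{x}_2)/s_2 & \cdots & (x_{1p} - \bar{x}_p)/s_p \\ (x_{21} - \bar{x}_1)/s_1 & (x_{22} - \bar{x}_2)/s_2 & \cdots & (x_{2p} - \bar{x}_p)/s_p \\ (x_{31} - \bar{x}_1)/s_1 & (x_{32} - \bar{x}_2)/s_2 & \cdots & (x_{3p} - \bar{x}_p)/s_p \\ \vdots & \vdots & \ddots & \vdots \\ (x_{n1} - \bar{x}_1)/s_1 & (x_{n2} - \bar{x}_2)/s_2 & \cdots & (x_{np} - \bar{x}_p)/s_p \end{pmatrix}$$

Properties: Correlation of a Variable with Itself is One

Assuming that $s_j^2 > 0$ for all $j \in \{1, \ldots, p\}$, we have that

$$\mathrm{Cor}(\mathbf{x}_j, \mathbf{x}_k) = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2} \sqrt{\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}} = \begin{cases} 1 & \text{if } j = k \\ r_{jk} & \text{if } j \neq k \end{cases}$$

Because $r_{jk} = 1$ whenever $j = k$, we know that
   $\mathrm{tr}(\mathbf{R}) = p$ where $\mathrm{tr}(\cdot)$ denotes the matrix trace function
   $\sum_{j=1}^{p} \lambda_j = p$ where $(\lambda_1, \ldots, \lambda_p)$ are the eigenvalues of $\mathbf{R}$

We also know that the eigenvalues satisfy
   $\lambda_j = 0$ for at least one $j \in \{1, \ldots, p\}$ if $n < p$
   $\lambda_j > 0$ for all $j$ if the columns of $\mathbf{X}_c$ are linearly independent

Properties: The Cauchy-Schwarz Inequality (revisited)

Reminder: the Cauchy-Schwarz inequality implies that

$$s_{jk}^2 \leq s_j^2 s_k^2$$

with the equality holding if and only if $\mathbf{x}_j$ and $\mathbf{x}_k$ are linearly dependent.

Rearranging the terms, we have that

$$\frac{s_{jk}^2}{s_j^2 s_k^2} \leq 1 \quad \Longleftrightarrow \quad r_{jk}^2 \leq 1$$

which implies that $|r_{jk}| \leq 1$ with equality holding if and only if $\mathbf{x}_j = \alpha \mathbf{1}_n + \beta \mathbf{x}_k$ for some scalars $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R}$.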
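The construction $\mathbf{R} = n^{-1} \mathbf{X}_s' \mathbf{X}_s$ and the properties of $\mathbf{R}$ can be sketched in R as follows. This assumes the built-in `mtcars` data and standard deviations computed with $1/n$ denominators, as in the definitions; the tolerance in the last check is an assumption added for floating-point rounding.

```r
# Sketch: correlation matrix from the standardized data matrix.
X <- as.matrix(mtcars)
n <- nrow(X)
p <- ncol(X)

C  <- diag(n) - matrix(1/n, n, n)   # centering matrix
Xc <- C %*% X                       # centered data
s  <- sqrt(colSums(Xc^2) / n)       # standard deviations (1/n version)
Xs <- Xc %*% diag(1 / s)            # standardized data, C X D^{-1}

R <- crossprod(Xs) / n              # (1/n) Xs' Xs
dimnames(R) <- list(colnames(X), colnames(X))

# agrees with the built-in function
all.equal(R, cor(X), check.attributes = FALSE)

# trace and eigenvalue sum both equal p
all.equal(sum(diag(R)), p)
lambda <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
all.equal(sum(lambda), p)

# every correlation lies in [-1, 1]
all(abs(R) <= 1 + 1e-12)
```

When a covariance matrix is already available, base R's `cov2cor` performs the same rescaling without forming the standardized data.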
