Box-Cox Transformation: An Overview



Pengfei Li, Department of Statistics, University of Connecticut. Apr 11, 2005.

Introduction

Since the seminal paper by Box and Cox (1964), power transformations of the Box-Cox type have generated a great deal of interest, both in theoretical work and in practical applications. In this presentation, I intend to go over the following topics:

What are the Box-Cox power transformations?
The inference on the transformation parameter.
Some cautionary notes on using the Box-Cox transformations.

What are the Box-Cox power transformations?

The original form of the Box-Cox transformation, as it appeared in the 1964 paper, is

$$ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0; \\ \log y, & \text{if } \lambda = 0. \end{cases} $$

In the same paper, they also proposed an extended form which can accommodate negative y's.
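As a small illustration of the original form, here is a minimal Python sketch. The function name `box_cox` and the sample values are illustrative choices, not part of the original slides. It also checks numerically that the $\lambda \neq 0$ branch approaches $\log y$ as $\lambda \to 0$, which is why the two branches fit together continuously.

```python
import math

def box_cox(y, lam):
    """Original Box-Cox transform of a single positive value y."""
    if y <= 0:
        raise ValueError("the original Box-Cox form requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

values = [0.5, 1.0, 2.0, 10.0]

# lambda = 1 only shifts the data: y -> y - 1.
shifted = [box_cox(v, 1.0) for v in values]

# lambda = 0 is the log transform; a tiny lambda gives nearly the same result.
logged = [box_cox(v, 0.0) for v in values]
nearly_logged = [box_cox(v, 1e-6) for v in values]

print(shifted)
print(logged)
print(nearly_logged)
```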

The extended form is

$$ y^{(\lambda)} = \begin{cases} \dfrac{(y + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \text{if } \lambda_1 \neq 0; \\ \log(y + \lambda_2), & \text{if } \lambda_1 = 0. \end{cases} $$

Here $\lambda = (\lambda_1, \lambda_2)'$. In practice, we can choose $\lambda_2$ such that $y + \lambda_2 > 0$ for any y, so only $\lambda_1$ needs to be viewed as a model parameter.

The aim of the Box-Cox transformations is to ensure that the usual assumptions of the linear model hold, that is, after transformation, $y^{(\lambda)} \sim N(X\beta, \sigma^2 I_n)$. Clearly, not all data can be power-transformed to normality. Draper and Cox (1969) studied this problem and concluded that even in cases where no power transformation can bring the distribution to exactly normal, the usual estimates of $\lambda$ lead to a distribution that satisfies certain restrictions on the first 4 moments and thus is usually symmetric.

One example in Draper and Cox (1969) is the following. Suppose that the raw data are from an Exp(1000) distribution. The estimate of $\lambda$ is [missing]. Three values close to this estimate are chosen to perform the transformation: $\lambda_1$ = [missing], $\lambda_2$ = [missing], $\lambda_3$ = [missing]. Such transformations result in three Weibull distributions: Weib(5, 1000), Weib([missing], 1000) and Weib([missing], 1000).
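The link between the exponential and Weibull distributions in this example can be checked numerically. The sketch below relies on a standard fact: if Y is exponential with mean m, then $Y^{\lambda}$ (for $\lambda > 0$) is Weibull with shape $1/\lambda$ and scale $m^{\lambda}$. The Box-Cox transform is an affine function of $Y^{\lambda}$, so the shape carries over unchanged. Reading Exp(1000) as "mean 1000", using the shape/scale parameterization, and taking $\lambda = 0.2$ (the value that gives shape 5) are assumptions made for illustration. The second argument in the slide's Weib(5, 1000) appears to follow a different convention.

```python
import math

def exp_quantile(p, mean):
    """Quantile function of an exponential distribution with the given mean."""
    return -mean * math.log(1.0 - p)

def weibull_quantile(p, shape, scale):
    """Quantile function of a Weibull distribution (shape/scale form)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

mean = 1000.0   # assumed reading of Exp(1000)
lam = 0.2       # assumed power; gives Weibull shape 1 / 0.2 = 5
probs = [0.1, 0.25, 0.5, 0.75, 0.9]

# y -> y**lam is increasing, so quantiles of Y**lam are powers of quantiles of Y.
power_quantiles = [exp_quantile(p, mean) ** lam for p in probs]
weibull_quantiles = [weibull_quantile(p, 1.0 / lam, mean ** lam) for p in probs]

max_gap = max(abs(a - b) for a, b in zip(power_quantiles, weibull_quantiles))
print(max_gap)
```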

The following are normal Q-Q plots for a random sample of size 500 from the Exp(1000) distribution.

[Figure: four normal Q-Q plots, with sample quantiles on the vertical axis. Panels: original data, Exp(1000); and the three transformed samples, Weibull(5, 1000), Weibull([missing], 1000) and Weibull([missing], 1000). The $\lambda$ values in the panel titles are missing.]

Since the work of Box and Cox (1964), many modifications have been proposed. Manly (1971) proposed the following exponential transformation:

$$ y^{(\lambda)} = \begin{cases} \dfrac{e^{\lambda y} - 1}{\lambda}, & \text{if } \lambda \neq 0; \\ y, & \text{if } \lambda = 0. \end{cases} $$

Negative y's are allowed. The transformation was reported to be successful in transforming unimodal skewed distributions into normal distributions, but it is not very useful for bimodal or U-shaped distributions.
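A minimal Python sketch of Manly's exponential transformation follows. The function name `manly` and the test values are illustrative. Using `math.expm1` is an implementation choice for numerical accuracy near $\lambda y = 0$, not something taken from the slides.

```python
import math

def manly(y, lam):
    """Manly (1971) exponential transform; y may be negative."""
    if lam == 0:
        return y
    # expm1(x) computes e**x - 1 accurately for small x.
    return math.expm1(lam * y) / lam

negative_input = manly(-2.0, 0.5)      # (e**-1 - 1) / 0.5
identity_case = manly(-2.0, 0.0)       # lambda = 0 leaves y unchanged
near_identity = manly(-2.0, 1e-9)      # close to -2 for tiny lambda

print(negative_input, identity_case, near_identity)
```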

John and Draper (1980) proposed the following modification, which they called the modulus transformation:

$$ y^{(\lambda)} = \begin{cases} \mathrm{Sign}(y)\, \dfrac{(|y| + 1)^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0; \\ \mathrm{Sign}(y) \log(|y| + 1), & \text{if } \lambda = 0, \end{cases} $$

where

$$ \mathrm{Sign}(y) = \begin{cases} 1, & \text{if } y \geq 0; \\ -1, & \text{if } y < 0. \end{cases} $$

Negative y's are allowed. It works best on distributions that are somewhat symmetric. A power transformation of a symmetric distribution is likely to introduce some degree of skewness.

Bickel and Doksum (1981) gave the following slight modification in their examination of the asymptotic performance of the parameters in the Box-Cox transformation model:

$$ y^{(\lambda)} = \frac{|y|^{\lambda}\, \mathrm{Sign}(y) - 1}{\lambda}, \quad \text{for } \lambda > 0, $$

where Sign(y) is defined as above.

Yeo and Johnson (2000) made a case for the following transformation:

$$ y^{(\lambda)} = \begin{cases} \dfrac{(y + 1)^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0,\ y \geq 0; \\ \log(y + 1), & \text{if } \lambda = 0,\ y \geq 0; \\ -\dfrac{(1 - y)^{2 - \lambda} - 1}{2 - \lambda}, & \text{if } \lambda \neq 2,\ y < 0; \\ -\log(1 - y), & \text{if } \lambda = 2,\ y < 0. \end{cases} $$

When estimating the transformation parameter, they found the value of $\lambda$ that minimizes the Kullback-Leibler distance between the normal distribution and the transformed distribution.

The inference on the transformation parameter

The main objective in the analysis of the Box-Cox transformation model is to make inference on the transformation parameter $\lambda$, and Box and Cox (1964) considered two approaches.

The first approach is to use the maximum likelihood method. This method is commonly used because it is conceptually easy and the profile likelihood function is easy to compute in this case. It is also easy to obtain an approximate CI for $\lambda$ because of the asymptotic properties of the MLE.

We assume that the transformed responses satisfy $y^{(\lambda)} \sim N(X\beta, \sigma^2 I_n)$. We observe the design matrix X and the raw data y, and the model parameters are $(\lambda, \beta, \sigma^2)$.
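For reference, the modulus and Yeo-Johnson families described earlier in this part can be written as short Python functions. This is only a sketch: the function names are illustrative, and the Yeo-Johnson branches for negative y use the standard published form of the transformation.

```python
import math

def sign(y):
    """Sign convention used in the modulus transformation: +1 for y >= 0."""
    return 1.0 if y >= 0 else -1.0

def modulus(y, lam):
    """John and Draper (1980) modulus transformation."""
    a = abs(y) + 1.0
    core = math.log(a) if lam == 0 else (a ** lam - 1.0) / lam
    return sign(y) * core

def yeo_johnson(y, lam):
    """Yeo and Johnson (2000) transformation, defined for every real y."""
    if y >= 0:
        if lam == 0:
            return math.log(y + 1.0)
        return ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(1.0 - y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)

for y in (-3.0, 0.0, 3.0):
    print(y, modulus(y, 0.5), yeo_johnson(y, 0.5))
```

With $\lambda = 1$ both functions return y unchanged, a convenient sanity check: $\lambda = 1$ means "no transformation" in these two families.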

The density for $y^{(\lambda)}$ is

$$ f(y^{(\lambda)}) = \frac{\exp\left(-\frac{1}{2\sigma^2}\,(y^{(\lambda)} - X\beta)'(y^{(\lambda)} - X\beta)\right)}{(2\pi\sigma^2)^{n/2}}. $$

Let $J(\lambda, y)$ be the Jacobian of the transformation from y to $y^{(\lambda)}$. Then the density for y (which is also the likelihood for the whole model) is

$$ L(\lambda, \beta, \sigma^2 \mid y, X) = f(y) = \frac{\exp\left(-\frac{1}{2\sigma^2}\,(y^{(\lambda)} - X\beta)'(y^{(\lambda)} - X\beta)\right)}{(2\pi\sigma^2)^{n/2}}\, J(\lambda, y). $$

To obtain the MLE from this likelihood, we observe that for each fixed $\lambda$, the likelihood is proportional to the likelihood for estimating $(\beta, \sigma^2)$ from the observed $y^{(\lambda)}$. Thus the MLEs of $(\beta, \sigma^2)$ are

$$ \hat\beta(\lambda) = (X'X)^{-} X' y^{(\lambda)}, \qquad \hat\sigma^2(\lambda) = \frac{y^{(\lambda)\prime} (I_n - G)\, y^{(\lambda)}}{n}, $$

where $G = \mathrm{ppo}(X) = X (X'X)^{-} X'$.

Substituting $\hat\beta(\lambda)$ and $\hat\sigma^2(\lambda)$ into the likelihood, and noting that for the original form of the Box-Cox transformation $J(\lambda, y) = \prod_{i=1}^{n} y_i^{\lambda - 1}$, we obtain the profile log-likelihood (i.e., the likelihood function maximized over $(\beta, \sigma^2)$) for $\lambda$ alone.
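The substitution step can be sketched in Python. The example below makes several assumptions for illustration only: a design matrix with an intercept and one covariate (so least squares has a closed form), synthetic data generated so that the log scale is roughly linear, and hypothetical function names. It evaluates the log of the likelihood above at $\hat\beta(\lambda)$ and $\hat\sigma^2(\lambda)$, including the log-Jacobian $(\lambda - 1)\sum_i \log y_i$.

```python
import math

def box_cox(y, lam):
    """Original Box-Cox transform of a positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def rss_simple_regression(x, z):
    """Residual sum of squares from least squares of z on (1, x)."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = zbar - slope * xbar
    return sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))

def profile_loglik(lam, x, y):
    """Log-likelihood maximized over (beta, sigma^2) for a fixed lambda."""
    n = len(y)
    z = [box_cox(v, lam) for v in y]
    sigma2_hat = rss_simple_regression(x, z) / n
    log_jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    # At the MLE, the exponent -RSS / (2 * sigma2_hat) equals -n / 2.
    return -0.5 * n * math.log(2.0 * math.pi * sigma2_hat) - 0.5 * n + log_jacobian

# Synthetic data (an assumption): log(y) is linear in x plus a small wiggle.
x = [float(i) for i in range(20)]
y = [math.exp(0.5 + 0.3 * xi + 0.1 * math.sin(i)) for i, xi in enumerate(x)]

for lam in (-1.0, 0.0, 1.0):
    print(lam, profile_loglik(lam, x, y))
```

For these data the value at $\lambda = 0$ is the largest of the three, as expected when the log scale is the one on which the model is linear.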

The profile log-likelihood is

$$ l_P(\lambda) = l(\lambda \mid y, X, \hat\beta(\lambda), \hat\sigma^2(\lambda)) = C - \frac{n}{2} \log \hat\sigma^2(\lambda) + (\lambda - 1) \sum_{i=1}^{n} \log y_i. $$

Let g be the geometric mean of the response vector (i.e., $g = (\prod_{i=1}^{n} y_i)^{1/n}$), and let $y^{(\lambda, g)} = y^{(\lambda)} / g^{\lambda - 1}$. Then it is easy to see that

$$ l_P(\lambda) = C - \frac{n}{2} \log(s_\lambda^2), $$

where $s_\lambda^2$ is the residual sum of squares divided by n from fitting the linear model $y^{(\lambda, g)} \sim N(X\beta, \sigma^2 I_n)$. So to maximize the profile log-likelihood, we only need to find the $\lambda$ that minimizes

$$ s_\lambda^2 = \frac{y^{(\lambda, g)\prime} (I_n - G)\, y^{(\lambda, g)}}{n}. $$

Using standard likelihood methods, we can immediately construct a likelihood ratio test. For testing $H_0: \lambda = \lambda_0$, the test statistic is $W = 2[l_P(\hat\lambda) - l_P(\lambda_0)]$. Asymptotically, W is distributed as $\chi^2_1$. Note carefully that W is a function of both the data (through $\hat\lambda$) and $\lambda_0$.

A large-sample CI for $\lambda$ is easily obtained by inverting the likelihood ratio test. Let $\hat\lambda$ be the MLE of $\lambda$; then an approximate $(1 - \alpha)100\%$ CI for $\lambda$ is

$$ \left\{ \lambda \;\middle|\; n \log\left( \frac{SSE(\lambda)}{SSE(\hat\lambda)} \right) \leq \chi^2_1(1 - \alpha) \right\}, $$

where $SSE(\lambda) = y^{(\lambda, g)\prime} (I_n - G)\, y^{(\lambda, g)}$. The accuracy of the approximation is given by the following fact:

$$ P(W \leq \chi^2_1(1 - \alpha)) = 1 - \alpha + O(n^{-1/2}). $$

It is also not hard to derive a test using Rao's score statistic. Atkinson (1973) first proposed a score-type statistic for testing $H_0: \lambda = \lambda_0$, although the derivation was not based on likelihood theory. Lawrence (1987) modified the result of Atkinson (1973) by employing standard likelihood theory.

The second approach outlined in Box and Cox (1964) is to use a Bayesian method. In this approach, we first need to ensure that the model is fully identifiable. If X is not of full column rank, then $\beta$ is not estimable (or, more accurately, not identifiable). So we further assume that X is an $n \times p$ matrix with $\mathrm{rank}(X) = r$ ($r \leq p$). Using the full-rank factorization to write $X = AR$ (Nalini and Day, p40, result [missing]), it is easy to reparameterize the model as $y^{(\lambda)} \sim N(A\theta, \sigma^2 I_n)$, where $A: n \times r$ is of full column rank and $\theta = R\beta$ is itself estimable.
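Returning briefly to the first (maximum likelihood) approach, the likelihood-ratio interval can be sketched numerically with a grid search over $\lambda$. Everything specific here is an assumption for illustration: the synthetic data, the intercept-plus-one-covariate design, the grid from -2 to 2 in steps of 0.01, and the 95% level, for which the $\chi^2_1$ quantile is approximately 3.8415.

```python
import math

CHI2_1_95 = 3.8415  # approximate 0.95 quantile of chi-square with 1 df

def box_cox(y, lam):
    """Original Box-Cox transform of a positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def rss_simple_regression(x, z):
    """Residual sum of squares from least squares of z on (1, x)."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = zbar - slope * xbar
    return sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))

def sse(lam, x, y):
    """SSE(lambda) for the geometric-mean-normalized response y^(lambda, g)."""
    log_g = sum(math.log(v) for v in y) / len(y)
    scale = math.exp((lam - 1.0) * log_g)          # g ** (lambda - 1)
    z = [box_cox(v, lam) / scale for v in y]
    return rss_simple_regression(x, z)

# Synthetic data (an assumption): log(y) is linear in x plus a small wiggle.
x = [float(i) for i in range(20)]
y = [math.exp(0.5 + 0.3 * xi + 0.1 * math.sin(i)) for i, xi in enumerate(x)]
n = len(y)

grid = [k / 100.0 for k in range(-200, 201)]
sse_values = [sse(lam, x, y) for lam in grid]
best = min(range(len(grid)), key=lambda i: sse_values[i])
lam_hat = grid[best]                               # grid approximation to the MLE

# Keep every lambda whose likelihood-ratio statistic is below the cutoff.
accepted = [lam for lam, s in zip(grid, sse_values)
            if n * math.log(s / sse_values[best]) <= CHI2_1_95]
ci = (min(accepted), max(accepted))
print(lam_hat, ci)
```

Working with SSE of the normalized response avoids carrying the Jacobian term separately, which is the practical point of introducing the geometric mean.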

We now consider the prior distribution for the parameters $(\lambda, \theta, \sigma^2)$. Box and Cox (1964) proposed the following prior:

$$ \pi_1(\lambda, \theta, \sigma) \propto \pi(\lambda) \times \frac{1}{\sigma\, g^{(\lambda - 1) r}}, $$

where g is the geometric mean of the response vector y, and $\pi(\lambda)$ is some prior distribution for $\lambda$ only. Pericchi (1981) considered another joint prior distribution:

$$ \pi_2(\lambda, \theta, \sigma) \propto \pi(\lambda) \times \frac{1}{\sigma^{r+1}}, $$

where again $\pi(\lambda)$ is some prior distribution for $\lambda$ only. So what is the rationale for choosing such prior distributions?

When $\lambda = 1$ (i.e., no transformation is performed), $\theta$ is a location parameter and $\sigma$ is a scale parameter, so the natural non-informative priors for $\theta$ and $\sigma$ are uniform and $1/\sigma$ respectively. This implies

$$ \pi_1(\lambda = 1, \theta, \sigma) = p(\theta, \sigma \mid \lambda = 1)\, \pi(\lambda = 1) \propto \pi(\lambda = 1) \times \frac{1}{\sigma}. $$

Box and Cox (1964) then assume that the transformation is approximately linear over the range of the observations, that is,

$$ E(y_i^{(\lambda)}) \approx a_\lambda + b_\lambda E(y_i), $$

where $b_\lambda$ is some representative value of the gradient $dy^{(\lambda)}/dy$. This implies that when $\lambda \neq 1$, each element of $\theta$ is multiplied by a scale of $b_\lambda$. So the prior for $\theta$ when $\lambda \neq 1$ should be $1/|b_\lambda|^{r}$.

Box and Cox (1964) chose $b_\lambda = J(\lambda, y)^{1/n} = g^{\lambda - 1}$, and they admitted that this choice was "somewhat arbitrary". This gives the Box-Cox version of the prior distribution. Pericchi (1981) followed exactly the same argument, except that he used Jeffreys' prior for $(\theta, \sigma)$ instead of the invariant non-informative prior. Clearly, the Box and Cox prior is "outcome-dependent", which seems to be an undesirable property.

It is not hard to see that the posterior distribution for the Box and Cox prior is

$$ \pi_1(\lambda, \theta, \sigma \mid y, A) \propto \frac{1}{\sigma^{n+1}} \exp\left( -\frac{S_\lambda + (\theta - \hat\theta)' A'A\, (\theta - \hat\theta)}{2\sigma^2} \right) g^{(\lambda - 1)(n - r)}\, \pi(\lambda), $$

where $S_\lambda = (y^{(\lambda)} - A\hat\theta)'(y^{(\lambda)} - A\hat\theta)$ and $\hat\theta = (A'A)^{-1} A' y^{(\lambda)}$ is the least squares estimate of $\theta$ for the given $\lambda$. The posterior distribution for Pericchi's prior is

$$ \pi_2(\lambda, \theta, \sigma \mid y, A) \propto \frac{1}{\sigma^{n+r+1}} \exp\left( -\frac{S_\lambda + (\theta - \hat\theta)' A'A\, (\theta - \hat\theta)}{2\sigma^2} \right) g^{(\lambda - 1) n}\, \pi(\lambda). $$
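The posterior under the Box and Cox prior is the likelihood multiplied by the prior, and the identity can be checked numerically. The sketch below assumes the simplest possible design, an intercept-only model (A is a column of ones, so r = 1, $A'A = n$ and $\hat\theta$ is the mean of the transformed responses), a flat $\pi(\lambda)$, and made-up data. Both functions return the log posterior up to the same additive constant.

```python
import math

def box_cox(y, lam):
    """Original Box-Cox transform of a positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def log_posterior_formula(lam, theta, sigma, y):
    """Log of the Box-Cox-prior posterior, written with S_lambda and theta-hat."""
    n, r = len(y), 1
    z = [box_cox(v, lam) for v in y]
    theta_hat = sum(z) / n
    s_lam = sum((zi - theta_hat) ** 2 for zi in z)
    quad = s_lam + n * (theta - theta_hat) ** 2    # A'A = n for a column of ones
    log_g = sum(math.log(v) for v in y) / n
    return (-(n + 1) * math.log(sigma) - quad / (2.0 * sigma ** 2)
            + (lam - 1.0) * (n - r) * log_g)

def log_lik_plus_log_prior(lam, theta, sigma, y):
    """Same quantity built directly as log-likelihood plus log-prior."""
    n, r = len(y), 1
    z = [box_cox(v, lam) for v in y]
    sum_log_y = sum(math.log(v) for v in y)
    log_lik = (-n * math.log(sigma)
               - sum((zi - theta) ** 2 for zi in z) / (2.0 * sigma ** 2)
               + (lam - 1.0) * sum_log_y)          # log Jacobian
    log_prior = -math.log(sigma) - (lam - 1.0) * r * (sum_log_y / n)
    return log_lik + log_prior

y = [1.2, 0.8, 2.5, 3.1, 0.9, 1.7]                 # made-up positive responses
a = log_posterior_formula(0.5, 0.3, 0.8, y)
b = log_lik_plus_log_prior(0.5, 0.3, 0.8, y)
print(a, b)
```

The two values agree because $\sum_i (z_i - \theta)^2 = S_\lambda + n(\theta - \hat\theta)^2$ and $\sum_i \log y_i = n \log g$.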