
Chapter 4 Truncated Distributions




This chapter presents a simulation study of several of the confidence intervals first presented in Chapter 2. Theorem 2.2 on p. 50 shows that the $(\alpha, \beta)$ trimmed mean $T_n$ is estimating a parameter $\mu_T$ with an asymptotic variance equal to $\sigma^2_W/(\beta - \alpha)^2$. The first five sections of this chapter provide the theory needed to compare the different confidence intervals. Many of these results will also be useful for comparing multiple linear regression estimators.

Mixture distributions are often used as outlier models. The following two definitions and proposition are useful for finding the mean and variance of a mixture distribution. Parts a) and b) of the proposition below show that the definition of expectation given below is the same as the usual definition for expectation if Y is a discrete or continuous random variable.

Definition. The distribution of a random variable Y is a mixture distribution if the cdf of Y has the form

$$F_Y(y) = \sum_{i=1}^{k} \alpha_i F_{W_i}(y)$$

where $0 < \alpha_i < 1$, $\sum_{i=1}^{k} \alpha_i = 1$, $k \geq 2$, and $F_{W_i}(y)$ is the cdf of a continuous or discrete random variable $W_i$, $i = 1, \dots, k$.

Definition. Let Y be a random variable with cdf F(y). Let h be a function such that the expected value $E[h(Y)]$ exists. Then

$$E[h(Y)] = \int h(y)\,dF(y).$$

Proposition. a) If Y is a discrete random variable that has a pmf f(y) with support $\mathcal{Y}$, then

$$E[h(Y)] = \int h(y)\,dF(y) = \sum_{y \in \mathcal{Y}} h(y) f(y).$$

b) If Y is a continuous random variable that has a pdf f(y), then

$$E[h(Y)] = \int h(y)\,dF(y) = \int_{-\infty}^{\infty} h(y) f(y)\,dy.$$

c) If Y is a random variable that has a mixture distribution with cdf $F_Y(y) = \sum_{i=1}^{k} \alpha_i F_{W_i}(y)$, then

$$E[h(Y)] = \int h(y)\,dF(y) = \sum_{i=1}^{k} \alpha_i E_{W_i}[h(W_i)]$$

where $E_{W_i}[h(W_i)] = \int h(y)\,dF_{W_i}(y)$.

Example. The proposition implies that the pmf or pdf of $W_i$ is used to compute $E_{W_i}[h(W_i)]$. As an example, suppose the cdf of Y is

$$F(y) = (1 - \epsilon)\Phi(y) + \epsilon\,\Phi(y/k)$$

where $0 < \epsilon < 1$ and $\Phi(y)$ is the cdf of $W_1 \sim N(0, 1)$. Then $\Phi(y/k)$ is the cdf of $W_2 \sim N(0, k^2)$. To find E(Y), use h(y) = y. Then

$$E(Y) = (1 - \epsilon)E(W_1) + \epsilon E(W_2) = (1 - \epsilon)\,0 + \epsilon\,0 = 0.$$

To find $E(Y^2)$, use $h(y) = y^2$. Then

$$E(Y^2) = (1 - \epsilon)E(W_1^2) + \epsilon E(W_2^2) = (1 - \epsilon)\,1 + \epsilon\,k^2 = 1 - \epsilon + \epsilon k^2.$$

Thus $\mathrm{VAR}(Y) = E(Y^2) - (E(Y))^2 = 1 - \epsilon + \epsilon k^2$. If $\epsilon = 0.1$ and $k = 10$, then $E(Y) = 0$ and $\mathrm{VAR}(Y) = 10.9$.

Remark. Warning: mixture distributions and linear combinations of random variables are very different quantities. As an example, let

$$W = (1 - \epsilon)W_1 + \epsilon W_2$$

where $\epsilon$, $W_1$ and $W_2$ are as in the previous example, and suppose that $W_1$ and $W_2$ are independent. Then W, a linear combination of $W_1$ and $W_2$, has a normal distribution with mean

$$E(W) = (1 - \epsilon)E(W_1) + \epsilon E(W_2) = 0$$

and variance

$$\mathrm{VAR}(W) = (1 - \epsilon)^2 \mathrm{VAR}(W_1) + \epsilon^2 \mathrm{VAR}(W_2) = (1 - \epsilon)^2 + \epsilon^2 k^2 < \mathrm{VAR}(Y)$$

where Y is given in the example above. Moreover, W has a unimodal normal distribution while Y does not follow a normal distribution.
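The contrast between the mixture and the linear combination can be checked numerically. A minimal sketch in Python, assuming the values $\epsilon = 0.1$ and $k = 10$ from the example (the function name `draw_mixture` and the sample size are illustrative choices):

```python
import random

eps, k = 0.1, 10.0

# Mixture moments via part c) of the proposition: E[h(Y)] = sum_i alpha_i E_Wi[h(Wi)]
var_mix = (1 - eps) * 1.0 + eps * k**2          # E(Y^2) = 1 - eps + eps*k^2 = 10.9 (E(Y) = 0)

# Linear combination W = (1-eps)*W1 + eps*W2 is normal with a much smaller variance
var_lin = (1 - eps) ** 2 * 1.0 + eps**2 * k**2  # (1-eps)^2 + eps^2*k^2 = 1.81

# Monte Carlo draw from the mixture: first pick a component, then sample it
random.seed(1)
def draw_mixture():
    return random.gauss(0, k) if random.random() < eps else random.gauss(0, 1)

n = 200_000
samples = [draw_mixture() for _ in range(n)]
mc_var = sum(y * y for y in samples) / n        # E(Y) = 0, so this estimates VAR(Y)

assert abs(mc_var - var_mix) < 1.0              # sampling agrees with the mixture formula
assert var_lin < var_mix                        # VAR(W) < VAR(Y), as the remark states
```

Note that sampling the mixture means choosing a component at random and drawing from it, not averaging draws from both components; the latter is the linear combination.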

In fact, if $X_1 \sim N(0, 1)$, $X_2 \sim N(10, 1)$, and $X_1$ and $X_2$ are independent, then $(X_1 + X_2)/2 \sim N(5, 1/2)$; however, if Y has a mixture distribution with cdf

$$F_Y(y) = 0.5\,F_{X_1}(y) + 0.5\,F_{X_2}(y) = 0.5\,\Phi(y) + 0.5\,\Phi(y - 10),$$

then the pdf of Y is bimodal.

Truncated distributions can be used to simplify the asymptotic theory of robust estimators of location and regression. The sections below on the truncated exponential, double exponential, normal, and Cauchy distributions will be useful when the underlying distribution is one of those four (see Chapter 3). Later sections examine how the sample median, trimmed means and two stage trimmed means behave at these distributions.

Earlier definitions introduced the truncated random variable $Y_T(a, b)$ and the Winsorized random variable $Y_W(a, b)$. Let Y have cdf F and let the truncated random variable $Y_T(a, b)$ have cdf $F_{T(a,b)}$. The following lemma illustrates the relationship between the means and variances of $Y_T(a, b)$ and $Y_W(a, b)$.
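The bimodality claim is easy to verify numerically by counting local maxima of the mixture pdf. A minimal sketch (the grid range and resolution are arbitrary choices):

```python
import math

def npdf(y, mu, sd):
    """Normal pdf with mean mu and standard deviation sd."""
    return math.exp(-((y - mu) / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))

def mix_pdf(y):
    # pdf of the 50/50 mixture of N(0,1) and N(10,1)
    return 0.5 * npdf(y, 0, 1) + 0.5 * npdf(y, 10, 1)

# Count strict local maxima of the mixture pdf on a grid over [-5, 15]
grid = [-5 + 20 * i / 2000 for i in range(2001)]
vals = [mix_pdf(y) for y in grid]
peaks = sum(1 for i in range(1, 2000) if vals[i] > vals[i - 1] and vals[i] > vals[i + 1])
assert peaks == 2                 # two modes, near 0 and near 10

# The average (X1 + X2)/2, by contrast, is N(5, 1/2): one mode, small variance
var_avg = (1.0 + 1.0) / 4
assert var_avg == 0.5
```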

Note that $Y_W(a, b)$ is a mixture of $Y_T(a, b)$ and two point masses at a and b, where the point mass at a has probability $\alpha = F(a)$ and the point mass at b has probability $1 - \beta = 1 - F(b)$. Let $c = \mu_T(a, b) - a$ and $d = b - \mu_T(a, b)$.

Lemma. Let $a = \mu_T(a, b) - c$ and $b = \mu_T(a, b) + d$. Then

a) $\mu_W(a, b) = \mu_T(a, b) - \alpha c + (1 - \beta)d$, and

b) $\sigma^2_W(a, b) = (\beta - \alpha)\sigma^2_T(a, b) + (\alpha - \alpha^2)c^2 + [(1 - \beta) - (1 - \beta)^2]d^2 + 2\alpha(1 - \beta)cd.$

c) If $\alpha = 1 - \beta$ then

$\sigma^2_W(a, b) = (1 - 2\alpha)\sigma^2_T(a, b) + (\alpha - \alpha^2)(c^2 + d^2) + 2\alpha^2 cd.$

d) If $c = d$ then

$\sigma^2_W(a, b) = (\beta - \alpha)\sigma^2_T(a, b) + [\alpha - \alpha^2 + 1 - \beta - (1 - \beta)^2 + 2\alpha(1 - \beta)]d^2.$

e) If $\alpha = 1 - \beta$ and $c = d$, then $\mu_W(a, b) = \mu_T(a, b)$ and

$\sigma^2_W(a, b) = (1 - 2\alpha)\sigma^2_T(a, b) + 2\alpha d^2.$

Proof. We will prove b) since its proof contains the most algebra. Now

$$\sigma^2_W = \alpha(\mu_T - c)^2 + (\beta - \alpha)(\sigma^2_T + \mu^2_T) + (1 - \beta)(\mu_T + d)^2 - \mu^2_W.$$

Collecting terms shows that

$$\sigma^2_W = (\beta - \alpha)\sigma^2_T + (\alpha + \beta - \alpha + 1 - \beta)\mu^2_T + 2[(1 - \beta)d - \alpha c]\mu_T + \alpha c^2 + (1 - \beta)d^2 - \mu^2_W.$$

From a),

$$\mu^2_W = \mu^2_T + 2[(1 - \beta)d - \alpha c]\mu_T + \alpha^2 c^2 + (1 - \beta)^2 d^2 - 2\alpha(1 - \beta)cd,$$

and we find that

$$\sigma^2_W = (\beta - \alpha)\sigma^2_T + (\alpha - \alpha^2)c^2 + [(1 - \beta) - (1 - \beta)^2]d^2 + 2\alpha(1 - \beta)cd.$$
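Part b) of the lemma can be sanity-checked on a case where every quantity has a closed form: if Y is uniform(0, 1), then $Y_T(a,b)$ is uniform(a, b), $\alpha = a$, and $\beta = b$. A minimal sketch (the truncation points 0.2 and 0.7 are arbitrary):

```python
# Check Lemma a) and b) for Y ~ uniform(0,1) truncated at (a, b).
# Y_T ~ uniform(a, b); Y_W puts mass alpha = F(a) = a at a and 1 - beta = 1 - F(b) at b.
a, b = 0.2, 0.7
alpha, beta = a, b                      # F(y) = y on (0, 1)

mu_T = (a + b) / 2                      # mean of uniform(a, b)
var_T = (b - a) ** 2 / 12               # variance of uniform(a, b)
c, d = mu_T - a, b - mu_T

# Direct moments of the Winsorized variable (two point masses plus Y_T)
mu_W = alpha * a + (beta - alpha) * mu_T + (1 - beta) * b
EW2 = alpha * a**2 + (beta - alpha) * (var_T + mu_T**2) + (1 - beta) * b**2
var_W_direct = EW2 - mu_W**2

# Lemma a) and b)
mu_W_lemma = mu_T - alpha * c + (1 - beta) * d
var_W_lemma = ((beta - alpha) * var_T + (alpha - alpha**2) * c**2
               + ((1 - beta) - (1 - beta)**2) * d**2
               + 2 * alpha * (1 - beta) * c * d)

assert abs(mu_W - mu_W_lemma) < 1e-12
assert abs(var_W_direct - var_W_lemma) < 1e-12
```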

QED.

The Truncated Exponential Distribution. Let Y be a (one sided) truncated exponential $TEXP(\lambda, b)$ random variable. Then the pdf of Y is

$$f_Y(y \mid \lambda, b) = \frac{\frac{1}{\lambda} e^{-y/\lambda}}{1 - \exp(-\frac{b}{\lambda})}$$

for $0 < y \leq b$ where $\lambda > 0$. Let $b = k\lambda$, and let

$$c_k = \int_0^{k\lambda} \frac{1}{\lambda} e^{-y/\lambda}\,dy = 1 - e^{-k}.$$

Next we will find the first two moments of $Y \sim TEXP(\lambda, b = k\lambda)$ for $k > 0$.

Lemma. If Y is $TEXP(\lambda, b = k\lambda)$ for $k > 0$, then

a) $E(Y) = \lambda \left[ \dfrac{1 - (k+1)e^{-k}}{1 - e^{-k}} \right]$, and

b) $E(Y^2) = 2\lambda^2 \left[ \dfrac{1 - \frac{1}{2}(k^2 + 2k + 2)e^{-k}}{1 - e^{-k}} \right]$.

A related result appears in the problems.

Proof. a) Note that

$$c_k E(Y) = \int_0^{k\lambda} \frac{y}{\lambda} e^{-y/\lambda}\,dy = \left[-y e^{-y/\lambda}\right]_0^{k\lambda} + \int_0^{k\lambda} e^{-y/\lambda}\,dy$$

(use integration by parts). So

$$c_k E(Y) = -k\lambda e^{-k} + \lambda(1 - e^{-k}).$$

Hence

$$E(Y) = \lambda \left[ \frac{1 - (k+1)e^{-k}}{1 - e^{-k}} \right].$$

b) Note that

$$c_k E(Y^2) = \int_0^{k\lambda} \frac{y^2}{\lambda} e^{-y/\lambda}\,dy.$$

Since

$$\frac{d}{dy}\left[ -(y^2 + 2\lambda y + 2\lambda^2) e^{-y/\lambda} \right] = \frac{1}{\lambda} e^{-y/\lambda}(y^2 + 2\lambda y + 2\lambda^2) - e^{-y/\lambda}(2y + 2\lambda) = \frac{1}{\lambda}\, y^2 e^{-y/\lambda},$$

we have

$$c_k E(Y^2) = \left[ -(y^2 + 2\lambda y + 2\lambda^2) e^{-y/\lambda} \right]_0^{k\lambda} = -(k^2\lambda^2 + 2\lambda^2 k + 2\lambda^2)e^{-k} + 2\lambda^2.$$

So the result follows. QED.

Since, as $k \to \infty$, $E(Y) \to \lambda$ and $E(Y^2) \to 2\lambda^2$, we have $\mathrm{VAR}(Y) \to \lambda^2$. If $k = 9\log(2)$, then $E(Y) \approx 0.988\lambda$ and $E(Y^2) \approx 0.95(2\lambda^2)$.

The Truncated Double Exponential Distribution. Suppose that X is a double exponential $DE(\theta, \lambda)$ random variable. Chapter 3 states that $\mathrm{MED}(X) = \theta$ and $\mathrm{MAD}(X) = \log(2)\lambda$. Let $c = k\log(2)$, and let the truncation points be $a = \theta - k\,\mathrm{MAD}(X) = \theta - c\lambda$ and $b = \theta + k\,\mathrm{MAD}(X) = \theta + c\lambda$. Let $X_T(a, b) \equiv Y$ be the truncated double exponential $TDE(\theta, \lambda, a, b)$ random variable. Then the pdf of Y is

$$f_Y(y \mid \theta, \lambda, a, b) = \frac{1}{2\lambda(1 - \exp(-c))} \exp(-|y - \theta|/\lambda)$$

for $a \leq y \leq b$.

Lemma. a) $E(Y) = \theta$.

b) $\mathrm{VAR}(Y) = 2\lambda^2 \left[ \dfrac{1 - \frac{1}{2}(c^2 + 2c + 2)e^{-c}}{1 - e^{-c}} \right]$.

Proof. a) follows by symmetry and b) follows from part b) of the previous lemma since $\mathrm{VAR}(Y) = E[(Y - \theta)^2] = E(W_T^2)$ where $W_T$ is $TEXP(\lambda, b = c\lambda)$. QED.
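The truncated exponential moment formulas, and the double exponential variance that follows from them, can be checked against direct numerical integration of the pdf. A minimal sketch ($\lambda = 1$ and the midpoint rule are illustrative choices):

```python
import math

def texp_closed_moments(lam, k):
    """E(Y) and E(Y^2) for Y ~ TEXP(lam, b = k*lam), from the lemma."""
    ek = math.exp(-k)
    m1 = lam * (1 - (k + 1) * ek) / (1 - ek)
    m2 = 2 * lam**2 * (1 - 0.5 * (k * k + 2 * k + 2) * ek) / (1 - ek)
    return m1, m2

def texp_numeric_moments(lam, k, n=100_000):
    """Midpoint-rule integration of y*f(y) and y^2*f(y) over (0, k*lam]."""
    b, ck = k * lam, 1 - math.exp(-k)
    h = b / n
    m1 = m2 = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        f = math.exp(-y / lam) / (lam * ck)   # TEXP pdf
        m1 += y * f * h
        m2 += y * y * f * h
    return m1, m2

lam, k = 1.0, 9 * math.log(2)
m1c, m2c = texp_closed_moments(lam, k)
m1n, m2n = texp_numeric_moments(lam, k)
assert abs(m1c - m1n) < 1e-6 and abs(m2c - m2n) < 1e-6
assert abs(m1c - 0.988 * lam) < 1e-3           # E(Y) ~ 0.988*lam at k = 9*log(2)
assert abs(m2c - 0.95 * 2 * lam**2) < 1e-2     # E(Y^2) ~ 0.95*(2*lam^2)
# By the lemma above, VAR of TDE(theta, lam) truncated at theta +/- c*lam
# equals the TEXP second moment m2 with k replaced by c.
```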

As $c \to \infty$, $\mathrm{VAR}(Y) \to 2\lambda^2$. If $k = 9$, then $c = 9\log(2)$ and $\mathrm{VAR}(Y) \approx 0.95(2\lambda^2)$.

The Truncated Normal Distribution. Now if X is $N(\mu, \sigma^2)$ then let Y be a truncated normal $TN(\mu, \sigma^2, a, b)$ random variable. Then

$$f_Y(y) = \frac{\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right)}{\Phi(\frac{b-\mu}{\sigma}) - \Phi(\frac{a-\mu}{\sigma})}\, I_{[a,b]}(y)$$

where $\Phi$ is the standard normal cdf. The indicator function $I_{[a,b]}(y) = 1$ if $a \leq y \leq b$ and is zero otherwise. Let $\phi$ be the standard normal pdf.

Lemma.

$$E(Y) = \mu + \sigma \left[ \frac{\phi(\frac{a-\mu}{\sigma}) - \phi(\frac{b-\mu}{\sigma})}{\Phi(\frac{b-\mu}{\sigma}) - \Phi(\frac{a-\mu}{\sigma})} \right],$$

and

$$\mathrm{VAR}(Y) = \sigma^2 \left[ 1 + \frac{\frac{a-\mu}{\sigma}\phi(\frac{a-\mu}{\sigma}) - \frac{b-\mu}{\sigma}\phi(\frac{b-\mu}{\sigma})}{\Phi(\frac{b-\mu}{\sigma}) - \Phi(\frac{a-\mu}{\sigma})} \right] - \sigma^2 \left[ \frac{\phi(\frac{a-\mu}{\sigma}) - \phi(\frac{b-\mu}{\sigma})}{\Phi(\frac{b-\mu}{\sigma}) - \Phi(\frac{a-\mu}{\sigma})} \right]^2.$$

(See Johnson and Kotz 1970a, p. 83.)

Proof. Let

$$c = \frac{1}{\Phi(\frac{b-\mu}{\sigma}) - \Phi(\frac{a-\mu}{\sigma})}.$$

Then

$$E(Y) = \int_a^b y f_Y(y)\,dy.$$

Hence

$$\frac{1}{c} E(Y) = \int_a^b \frac{y}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy$$

$$= \int_a^b \sigma\left( \frac{y-\mu}{\sigma} \right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy + \int_a^b \frac{\mu}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy.$$

Note that the integrand of the last integral is $\mu$ times the pdf of a $N(\mu, \sigma^2)$ distribution, so that integral equals $\mu/c$. Let $z = (y - \mu)/\sigma$. Thus $dz = dy/\sigma$, and

$$\frac{1}{c} E(Y) = \int_{\frac{a-\mu}{\sigma}}^{\frac{b-\mu}{\sigma}} \frac{\sigma z}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz + \frac{\mu}{c} = \sigma\left( \frac{-e^{-z^2/2}}{\sqrt{2\pi}} \right) \Bigg|_{\frac{a-\mu}{\sigma}}^{\frac{b-\mu}{\sigma}} + \frac{\mu}{c}.$$

Multiplying both sides by c gives the expectation result.

Next,

$$E(Y^2) = \int_a^b y^2 f_Y(y)\,dy.$$

Hence, writing $y^2 = (y - \mu)^2 + 2\mu y - \mu^2$,

$$\frac{1}{c} E(Y^2) = \int_a^b \frac{y^2}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy$$

$$= \int_a^b \frac{(y-\mu)^2}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy + \int_a^b \frac{2\mu y - \mu^2}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy$$

$$= \int_a^b \sigma\left( \frac{y-\mu}{\sigma} \right)^2 \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right) dy + \frac{2\mu}{c} E(Y) - \frac{\mu^2}{c}.$$

Let $z = (y - \mu)/\sigma$. Then $dz = dy/\sigma$, $dy = \sigma\,dz$, and $y = \sigma z + \mu$. Hence

$$\frac{1}{c} E(Y^2) = \frac{2\mu}{c} E(Y) - \frac{\mu^2}{c} + \int_{\frac{a-\mu}{\sigma}}^{\frac{b-\mu}{\sigma}} \frac{\sigma^2 z^2}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz.$$

Next integrate by parts with $w = z$ and $dv = z e^{-z^2/2}\,dz$. Then

$$\frac{1}{c} E(Y^2) = \frac{2\mu}{c} E(Y) - \frac{\mu^2}{c} + \sigma^2 \left[ \left( \frac{-z e^{-z^2/2}}{\sqrt{2\pi}} \right) \Bigg|_{\frac{a-\mu}{\sigma}}^{\frac{b-\mu}{\sigma}} + \int_{\frac{a-\mu}{\sigma}}^{\frac{b-\mu}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz \right]$$

$$= \frac{2\mu}{c} E(Y) - \frac{\mu^2}{c} + \sigma^2 \left[ \frac{a-\mu}{\sigma}\,\phi\!\left(\frac{a-\mu}{\sigma}\right) - \frac{b-\mu}{\sigma}\,\phi\!\left(\frac{b-\mu}{\sigma}\right) + \frac{1}{c} \right].$$

Using

$$\mathrm{VAR}(Y) = c\,\frac{1}{c} E(Y^2) - (E(Y))^2$$

gives the result.

QED.

Corollary. Let Y be $TN(\mu, \sigma^2, a = \mu - k\sigma, b = \mu + k\sigma)$. Then $E(Y) = \mu$ and

$$\mathrm{VAR}(Y) = \sigma^2 \left[ 1 - \frac{2k\phi(k)}{2\Phi(k) - 1} \right].$$

Proof. Use the symmetry of $\phi$, the fact that $\Phi(-x) = 1 - \Phi(x)$, and the above lemma to get the result. QED.

Table: Variances for Several Truncated Normal Distributions

k    VAR(Y)
2.0  0.774 σ²
2.5  0.911 σ²
3.0  0.973 σ²
3.5  0.994 σ²
4.0  0.999 σ²

Examining VAR(Y) for several values of k shows that the $TN(\mu, \sigma^2, a = \mu - k\sigma, b = \mu + k\sigma)$ distribution does not change much for $k > 3.0$. See the table above.

The Truncated Cauchy Distribution. If X is a Cauchy $C(\mu, \sigma)$ random variable, then $\mathrm{MED}(X) = \mu$ and $\mathrm{MAD}(X) = \sigma$. If Y is a truncated Cauchy $TC(\mu, \sigma, \mu - a\sigma, \mu + b\sigma)$ random variable, then

$$f_Y(y) = \frac{1}{[\tan^{-1}(b) + \tan^{-1}(a)]\;\sigma\,[1 + (\frac{y-\mu}{\sigma})^2]}$$

for $\mu - a\sigma < y < \mu + b\sigma$. Moreover,

$$E(Y) = \mu + \sigma \left[ \frac{\log(1 + b^2) - \log(1 + a^2)}{2[\tan^{-1}(b) + \tan^{-1}(a)]} \right],$$

and

$$\mathrm{VAR}(Y) = \sigma^2 \left[ \frac{b + a - \tan^{-1}(b) - \tan^{-1}(a)}{\tan^{-1}(b) + \tan^{-1}(a)} - \left( \frac{\log(1 + b^2) - \log(1 + a^2)}{2[\tan^{-1}(b) + \tan^{-1}(a)]} \right)^2 \right].$$
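Both the truncated normal corollary and the truncated Cauchy moments can be verified with the standard library (`math.erf` supplies $\Phi$). A minimal sketch; the table entries, the standardized Cauchy check, and the truncation points $a = 1$, $b = 2$ are illustrative:

```python
import math

def phi(x):                      # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):                      # standard normal cdf via erf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def tn_var_factor(k):
    """VAR(Y)/sigma^2 for Y ~ TN(mu, sigma^2, mu - k*sigma, mu + k*sigma)."""
    return 1 - 2 * k * phi(k) / (2 * Phi(k) - 1)

# Reproduces the table: the variance hardly changes once k > 3
for k, expect in [(2.0, 0.774), (2.5, 0.911), (3.0, 0.973), (3.5, 0.994), (4.0, 0.999)]:
    assert abs(tn_var_factor(k) - expect) < 5e-4

def tc_closed_moments(a, b):
    """E(Z) and VAR(Z) for the standardized truncated Cauchy Z = (Y - mu)/sigma on (-a, b)."""
    t = math.atan(b) + math.atan(a)
    m = (math.log(1 + b * b) - math.log(1 + a * a)) / (2 * t)
    v = (b + a - t) / t - m * m
    return m, v

def tc_numeric_moments(a, b, n=200_000):
    """Midpoint-rule integration of the standardized truncated Cauchy density on (-a, b)."""
    t = math.atan(b) + math.atan(a)
    h = (b + a) / n
    m1 = m2 = 0.0
    for i in range(n):
        z = -a + (i + 0.5) * h
        f = 1 / (t * (1 + z * z))
        m1 += z * f * h
        m2 += z * z * f * h
    return m1, m2 - m1 * m1

mc, vc = tc_closed_moments(1.0, 2.0)
mn, vn = tc_numeric_moments(1.0, 2.0)
assert abs(mc - mn) < 1e-6 and abs(vc - vn) < 1e-6
```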

