7. Convergence in Probability - Pennsylvania State University

7. Convergence in Probability (Lehmann §2.1; Ferguson §1)

Here, we consider sequences $X_1, X_2, \ldots$ of random variables instead of real numbers. As with real numbers, we'd like to have an idea of what it means to converge. In general, convergence will be to some limiting random variable. However, this random variable might be a constant, so it also makes sense to talk about convergence to a real number. There are several different modes of convergence. We begin with convergence in probability.

Definition. The sequence $\{X_n\}$ converges in probability to $X$, written $X_n \xrightarrow{P} X$, if for every $\epsilon > 0$,

$$P(|X_n - X| < \epsilon) \to 1 \qquad (6)$$

as $n \to \infty$. If the random variable $X$ in expression (6) is a constant, say $P(X = c) = 1$, then we write $X_n \xrightarrow{P} c$ if for every $\epsilon > 0$,

$$P(|X_n - c| < \epsilon) \to 1 \qquad (7)$$

as $n \to \infty$.

Note that the form in expression (6) requires that the joint distribution of $X_n$ and $X$ be known!
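
To make Definition (7) concrete, here is a minimal Monte Carlo sketch in Python; the particular sequence $X_n = c + Z_n/\sqrt{n}$ with $Z_n$ standard normal, and the values of $c$, $\epsilon$, and the sample sizes, are arbitrary illustrative choices. It estimates $P(|X_n - c| < \epsilon)$ for increasing $n$.

```python
import numpy as np

# Monte Carlo illustration of Definition (7): take Xn = c + Zn / sqrt(n) with
# Zn standard normal and c = 2 (arbitrary choices), so Xn converges in
# probability to the constant c.  Estimate P(|Xn - c| < eps) for growing n.
rng = np.random.default_rng(0)
c, eps, reps = 2.0, 0.1, 100_000

for n in [1, 10, 100, 1000]:
    xn = c + rng.standard_normal(reps) / np.sqrt(n)   # reps independent copies of Xn
    prob = np.mean(np.abs(xn - c) < eps)              # estimate of P(|Xn - c| < eps)
    print(f"n = {n:5d}   estimated P(|Xn - c| < {eps}) = {prob:.3f}")
# The estimated probability increases toward 1 as n grows, which is exactly (7).
```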

This is not the case for expression (7), and in fact (6) is almost never used in Lehmann's book.

Definition. If the sequence of estimators $\{\hat{\theta}_n\}$ satisfies $\hat{\theta}_n \xrightarrow{P} g(\theta)$, we say that $\hat{\theta}_n$ is consistent for $g(\theta)$.

Theorem (Chebyshev's inequality). For any $a > 0$ and $r > 0$,

$$P(|X - Y| \ge a) \le \frac{E|X - Y|^r}{a^r}. \qquad (8)$$

Note that if we take $r = 2$ and $Y$ to be the constant $E(X)$, then we obtain

$$P\{|X - E(X)| \ge a\} \le \frac{\operatorname{Var}(X)}{a^2}, \qquad (9)$$

which is what many textbooks refer to as Chebyshev's inequality.

Definition. For $r > 0$, $X_n$ converges in the $r$th mean to $X$ if $E|X_n - X|^r \to 0$ as $n \to \infty$. We write $X_n \xrightarrow{r} X$. As a special case, we say that $X_n$ converges in quadratic mean to $X$, written $X_n \xrightarrow{qm} X$, if $E(X_n - X)^2 \to 0$.

Theorem. If $X_n \xrightarrow{qm} X$, then $X_n \xrightarrow{P} X$.
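
As a quick numerical sanity check of inequality (9), the following sketch compares the estimated tail probability $P(|X - E(X)| \ge a)$ with the bound $\operatorname{Var}(X)/a^2$ when $X$ is standard normal; the distribution and the values of $a$ are arbitrary choices for illustration.

```python
import numpy as np

# Numerical check of Chebyshev's inequality (9):
#   P(|X - E(X)| >= a) <= Var(X) / a^2.
# Here X is standard normal, so E(X) = 0 and Var(X) = 1; the bound is 1/a^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

for a in [0.5, 1.0, 2.0, 3.0]:
    tail = np.mean(np.abs(x) >= a)   # Monte Carlo estimate of P(|X| >= a)
    bound = 1.0 / a**2               # Chebyshev bound Var(X)/a^2
    print(f"a = {a:3.1f}   P(|X| >= a) ~= {tail:.4f}   Chebyshev bound = {bound:.4f}")
# Every estimated probability sits below its bound; for small a the bound
# exceeds 1 and is uninformative, which is still consistent with (9).
```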

Theorem (Weak Law of Large Numbers). If $X_1, X_2, \ldots$ are independent and identically distributed (iid) with mean $\mu$, and if $\bar{X}_n$ denotes the sample mean of $X_1, \ldots, X_n$, then $\bar{X}_n \xrightarrow{P} \mu$.

Example. These are special cases of the WLLN:

1. If $X_1, X_2, \ldots$ are iid with $E|X_1|^k < \infty$, then
$$\frac{1}{n} \sum_{i=1}^n X_i^k \xrightarrow{P} E(X_1^k).$$
That is, sample moments are consistent.

2. If $X \sim \mathrm{binomial}(n, p)$, then $X/n \xrightarrow{P} p$.

Convergent-in-probability sequences may be combined in much the same way as their real-number counterparts:

Theorem. If $X_n \xrightarrow{P} X$ and $Y_n \xrightarrow{P} Y$ and $f$ is continuous, then $f(X_n, Y_n) \xrightarrow{P} f(X, Y)$. If $X = a$ and $Y = b$ are constant random variables, then $f$ only needs to be continuous at $(a, b)$.

Thus, the sum of the limits equals the limit of the sums, the product of the limits equals the limit of the products, etc.

Theorem. For a constant $c$, $X_n \xrightarrow{qm} c$ if and only if $E(X_n) \to c$ and $\operatorname{Var}(X_n) \to 0$.

Beware, however! $X_n \xrightarrow{P} c$ does not imply $E(X_n) \to c$ or $X_n \xrightarrow{qm} c$ or $\operatorname{Var}(X_n) \to 0$. Consider the following sequence of random variables:

$$X_n = \begin{cases} n & \text{with probability } 1/n \\ 0 & \text{with probability } 1 - 1/n. \end{cases}$$

Then $X_n \xrightarrow{P} 0$ (do you see why?). But $E(X_n) = 1$ and $\operatorname{Var}(X_n) = n - 1$.
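
A short simulation makes the warning concrete for the sequence just defined: realizations of $X_n$ are almost always exactly 0, so $P(|X_n| < \epsilon) \to 1$, yet the empirical mean stays near 1 and the empirical variance grows like $n - 1$. The value of $\epsilon$ and the number of replications below are arbitrary.

```python
import numpy as np

# The counterexample above: Xn = n with probability 1/n, and 0 otherwise.
# Xn -> 0 in probability, but E(Xn) = 1 and Var(Xn) = n - 1 for every n.
rng = np.random.default_rng(0)
reps = 100_000
eps = 0.5

for n in [10, 100, 1000]:
    draws = np.where(rng.random(reps) < 1.0 / n, n, 0)   # reps copies of Xn
    p_small = np.mean(np.abs(draws) < eps)               # estimate of P(|Xn - 0| < eps)
    print(f"n = {n:5d}   P(|Xn| < {eps}) ~= {p_small:.4f}   "
          f"sample mean = {draws.mean():.3f}   sample var = {draws.var():.1f}")
# P(|Xn| < eps) = 1 - 1/n -> 1, yet the sample mean stays near 1 and the
# sample variance near n - 1, so Xn does not converge to 0 in quadratic mean.
```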

There are analogues of the $o$, $\asymp$, and $O$ notations that may be applied to random variables.

Definition. Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random variables.

We write $X_n = o_P(Y_n)$ if $X_n / Y_n \xrightarrow{P} 0$.

We write $X_n \asymp_P Y_n$ if for every $\epsilon > 0$, there exist $m$, $M$, and $N$ such that $0 < m < M < \infty$ and
$$P\left( m < \left| \frac{X_n}{Y_n} \right| < M \right) > 1 - \epsilon \quad \text{for all } n > N.$$

We write $X_n = O_P(Y_n)$ if $X_n = o_P(Y_n)$ or $X_n \asymp_P Y_n$.

Definition. We say that $X_n$ is bounded in probability if $X_n = O_P(1)$.

The concept of bounded-in-probability sequences will come up a bit later (see the corresponding definition and the following discussion on pages 64-65 in Lehmann).
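
The following sketch gives an informal feel for the $O_P$ notation: for iid Exponential(1) data (an arbitrary choice, with $\mu = 1$), the scaled error $\sqrt{n}\,(\bar{X}_n - \mu)$ appears to stay bounded in probability while the raw error shrinks, which is the behavior summarized by $\bar{X}_n - \mu = O_P(n^{-1/2})$. This is only an empirical illustration, not a proof.

```python
import numpy as np

# Empirical illustration of boundedness in probability: for iid Exponential(1)
# data with mean mu = 1, the scaled error sqrt(n) * (Xbar_n - mu) stays bounded
# in probability as n grows, while the unscaled error Xbar_n - mu shrinks to 0.
rng = np.random.default_rng(0)
reps = 2_000

for n in [100, 1000, 10000]:
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    err = xbar - 1.0
    ratio = err * np.sqrt(n)
    # Quantiles of |ratio| stabilize with n, while quantiles of |err| shrink.
    print(f"n = {n:6d}   95% quantile of |Xbar - mu| = {np.quantile(np.abs(err), 0.95):.4f}   "
          f"95% quantile of sqrt(n)*|Xbar - mu| = {np.quantile(np.abs(ratio), 0.95):.3f}")
```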

Problems

Problem. (a) Prove Chebyshev's inequality (Theorem above), using only the expectation operator (no integrals or sums). Hint: define $Z = a \, I\{|Y - X| \ge a\}$, and consider what can be said about the expectation of $Z^r$.
(b) Prove that inequality (9) is tight by giving an example of a random variable $X$ and a positive constant $a$ for which equality holds.

Problem. Prove Theorem ___.

Problem. Prove the weak law of large numbers (Theorem ___) for the case in which $\operatorname{Var} X_i < \infty$.

Problem. Prove Theorem ___.

Problem. Prove or disprove this statement: if there exists $M$ such that $P(|X_n| < M) = 1$ for all $n$, then $X_n \xrightarrow{P} c$ implies $X_n \xrightarrow{qm} c$.

Problem. These are three small results used to prove other theorems.
(a) Do Problem ___ on p. 119.
(b) Prove that if $0 < r < s$ and $E|X|^s < \infty$, then $E|X|^r < 1 + E|X|^s$.
(c) Prove that if $a_n \to c$, where $c$ may be finite or infinite, then $\sum_{i=1}^n a_i / n \to c$.

8. Regression estimators, sample mean for non-iid sequences (Lehmann)

Example. Suppose that $X_1, X_2, \ldots$, instead of being iid, are independent but $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma_i^2$. We try to determine when $\bar{X}_n$ is consistent for $\mu$. Since $E(\bar{X}_n) = \mu$ for all $n$, the theorems of the preceding section imply that $\bar{X}_n$ is consistent whenever $\operatorname{Var}(\bar{X}_n) \to 0$ (note, however, that the converse is not true; see the problems at the end of this section). The variance of $\bar{X}_n$ is easy to write down, and thus we conclude that $\bar{X}_n \xrightarrow{P} \mu$ if $\sum_{i=1}^n \sigma_i^2 = o(n^2)$.

On the other hand, if instead of $\bar{X}_n$ we consider the BLUE (best linear unbiased estimator)

$$\delta_n = \frac{\sum_{i=1}^n X_i / \sigma_i^2}{\sum_{j=1}^n 1 / \sigma_j^2}, \qquad (10)$$

then as before $E(\delta_n) = \mu$, but

$$\operatorname{Var}(\delta_n) = \frac{1}{\sum_{j=1}^n 1 / \sigma_j^2}.$$
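
As a numerical illustration of (10), the sketch below takes independent normal observations with $\mu = 5$ and $\sigma_i^2 = i$ (arbitrary choices) and compares the exact and simulated variances of $\bar{X}_n$ and $\delta_n$.

```python
import numpy as np

# Numerical illustration of the BLUE in (10) for independent, non-iid data:
# Xi ~ Normal(mu, sigma_i^2) with mu = 5 and (as an arbitrary choice) sigma_i^2 = i.
# Both the ordinary mean Xbar_n and the weighted mean delta_n are unbiased,
# but delta_n has variance 1 / sum(1/sigma_j^2), never larger than Var(Xbar_n).
rng = np.random.default_rng(0)
mu, n, reps = 5.0, 200, 20_000
sigma2 = np.arange(1, n + 1, dtype=float)        # sigma_i^2 = i

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
xbar = x.mean(axis=1)                            # ordinary sample mean
weights = (1.0 / sigma2) / np.sum(1.0 / sigma2)  # BLUE weights from (10)
delta = x @ weights                              # weighted mean delta_n

print("Var(Xbar_n):  formula", sigma2.sum() / n**2, "  simulated", xbar.var())
print("Var(delta_n): formula", 1.0 / np.sum(1.0 / sigma2), "  simulated", delta.var())
```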

Example. Consider the case of simple linear regression: $Y_i = \beta_0 + \beta_1 z_i + \epsilon_i$, where the $z_i$ are known covariates and the $\epsilon_i$ are independent with mean 0 and variance $\sigma^2$. If we define

$$w_i = \frac{z_i - \bar{z}}{\sum_{j=1}^n (z_j - \bar{z})^2} \quad \text{and} \quad v_i = \frac{1}{n} - \bar{z} w_i,$$

then the least squares estimators of $\beta_0$ and $\beta_1$ are

$$\hat{\beta}_{0n} = \sum_{i=1}^n v_i Y_i \quad \text{and} \quad \hat{\beta}_{1n} = \sum_{i=1}^n w_i Y_i,$$

respectively. Since $E(Y_i) = \beta_0 + \beta_1 z_i$, we have

$$E(\hat{\beta}_{0n}) = \beta_0 + \beta_1 \bar{z} - \beta_0 \bar{z} \sum_{i=1}^n w_i - \beta_1 \bar{z} \sum_{i=1}^n w_i z_i$$

and

$$E(\hat{\beta}_{1n}) = \beta_0 \sum_{i=1}^n w_i + \beta_1 \sum_{i=1}^n w_i z_i.$$

Note that $\sum_{i=1}^n w_i = 0$ and $\sum_{i=1}^n w_i z_i = 1$. Therefore, $E(\hat{\beta}_{0n}) = \beta_0$ and $E(\hat{\beta}_{1n}) = \beta_1$, which is to say that $\hat{\beta}_{0n}$ and $\hat{\beta}_{1n}$ are unbiased. Therefore, by the theorems of the preceding section, a sufficient condition for the consistency of $\hat{\beta}_{0n}$ and $\hat{\beta}_{1n}$ is that their variances tend to zero as $n \to \infty$. It is easy to write down $\operatorname{Var}(\hat{\beta}_{0n})$ and $\operatorname{Var}(\hat{\beta}_{1n})$ explicitly, since $\operatorname{Var}(Y_i) = \sigma^2$:

$$\operatorname{Var}(\hat{\beta}_{0n}) = \sigma^2 \sum_{i=1}^n v_i^2 \quad \text{and} \quad \operatorname{Var}(\hat{\beta}_{1n}) = \sigma^2 \sum_{i=1}^n w_i^2.$$

With $\sum_{i=1}^n w_i^2 = \left\{ \sum_{j=1}^n (z_j - \bar{z})^2 \right\}^{-1}$, these expressions simplify to

$$\operatorname{Var}(\hat{\beta}_{0n}) = \frac{\sigma^2}{n} + \frac{\sigma^2 \bar{z}^2}{\sum_{j=1}^n (z_j - \bar{z})^2} \quad \text{and} \quad \operatorname{Var}(\hat{\beta}_{1n}) = \frac{\sigma^2}{\sum_{j=1}^n (z_j - \bar{z})^2}.$$

Therefore, $\hat{\beta}_{0n}$ and $\hat{\beta}_{1n}$ are consistent if

$$\frac{\bar{z}^2}{\sum_{j=1}^n (z_j - \bar{z})^2} \quad \text{and} \quad \frac{1}{\sum_{j=1}^n (z_j - \bar{z})^2},$$

respectively, tend to zero.
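
The sketch below checks these formulas on simulated data: with arbitrary choices of $\beta_0$, $\beta_1$, $\sigma^2$, and covariates $z_i$, it verifies that $\hat{\beta}_{0n}$ and $\hat{\beta}_{1n}$ computed from the weights $v_i$ and $w_i$ are unbiased and have variances matching the expressions above.

```python
import numpy as np

# Check of the least-squares formulas above on simulated data:
# Yi = beta0 + beta1 * zi + eps_i with independent errors of variance sigma^2.
rng = np.random.default_rng(0)
beta0, beta1, sigma2 = 2.0, -1.5, 4.0
n = 50
z = np.linspace(0.0, 10.0, n)                    # known covariates (arbitrary choice)

zbar = z.mean()
w = (z - zbar) / np.sum((z - zbar) ** 2)         # w_i as defined above
v = 1.0 / n - zbar * w                           # v_i = 1/n - zbar * w_i

reps = 50_000
y = beta0 + beta1 * z + rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
b0 = y @ v                                       # beta0-hat = sum_i v_i Y_i
b1 = y @ w                                       # beta1-hat = sum_i w_i Y_i

var_b0 = sigma2 / n + sigma2 * zbar**2 / np.sum((z - zbar) ** 2)
var_b1 = sigma2 / np.sum((z - zbar) ** 2)
print("mean of beta0-hat:", b0.mean(), "  Var formula:", var_b0, "  simulated:", b0.var())
print("mean of beta1-hat:", b1.mean(), "  Var formula:", var_b1, "  simulated:", b1.var())
```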

Suppose that $X_1, X_2, \ldots$ have $E(X_i) = \mu$ but they are not independent. Then clearly $E(\bar{X}_n) = \mu$, and so $\bar{X}_n$ is consistent for $\mu$ if $\operatorname{Var}(\bar{X}_n) \to 0$. In this case,

$$\operatorname{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \operatorname{Cov}(X_i, X_j). \qquad (11)$$

Definition. The sequence $X_1, X_2, \ldots$ is stationary if the joint distribution of $(X_i, \ldots, X_{i+k})$ does not depend on $i$ for any $i > 0$ and $k \ge 0$.

For a stationary sequence, $\operatorname{Cov}(X_i, X_j)$ depends only on the gap $j - i$; for example, $\operatorname{Cov}(X_1, X_4) = \operatorname{Cov}(X_2, X_5) = \operatorname{Cov}(X_5, X_8) = \cdots$. Therefore, the variance expression of equation (11) becomes

$$\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} + \frac{2}{n^2} \sum_{k=1}^{n-1} (n - k) \operatorname{Cov}(X_1, X_{1+k}). \qquad (12)$$

A sufficient condition for expression (12) to go to zero is that $\sigma^2 < \infty$ and $\operatorname{Cov}(X_1, X_{1+k}) \to 0$ as $k \to \infty$.
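
As an illustration, the following sketch checks expression (12) for an MA(1) sequence $X_i = \mu + e_i + \theta e_{i-1}$ with iid standard normal $e_i$ (an arbitrary stationary example); here only the lag-one covariance is nonzero, so (12) has a simple closed form.

```python
import numpy as np

# Check of variance formula (12) for a stationary (in fact 1-dependent) sequence:
# Xi = mu + e_i + theta * e_{i-1}, an MA(1) process with iid standard normal e_i.
# Here sigma^2 = Var(Xi) = 1 + theta^2, Cov(X1, X2) = theta, and all longer-lag
# covariances are zero, so (12) reduces to (1 + theta^2)/n + 2*(n-1)*theta/n^2.
rng = np.random.default_rng(0)
mu, theta = 3.0, 0.7
n, reps = 500, 20_000

e = rng.standard_normal(size=(reps, n + 1))
x = mu + e[:, 1:] + theta * e[:, :-1]            # reps independent paths of length n
xbar = x.mean(axis=1)

var_formula = (1 + theta**2) / n + 2 * (n - 1) * theta / n**2   # expression (12)
print("Var(Xbar_n) from (12):", var_formula, "   simulated:", xbar.var())
# Both quantities shrink toward zero as n grows, so Xbar_n is consistent for mu.
```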

We now prove this fact. It is clear that $\sigma^2/n \to 0$ if $\sigma^2 < \infty$. Assuming that $\operatorname{Cov}(X_1, X_{1+k}) \to 0$, select $\epsilon > 0$ and choose $N$ so that $|\operatorname{Cov}(X_1, X_{1+k})| < \epsilon/2$ for all $k > N$. Then

$$\frac{2}{n^2} \sum_{k=1}^{n-1} (n - k) \operatorname{Cov}(X_1, X_{1+k}) \le \frac{2}{n} \sum_{k=1}^{N} |\operatorname{Cov}(X_1, X_{1+k})| + \frac{2}{n^2} \sum_{k=N+1}^{n-1} (n - k) |\operatorname{Cov}(X_1, X_{1+k})|.$$

Note that the second term on the right is strictly less than $\epsilon/2$, since each summand $|\operatorname{Cov}(X_1, X_{1+k})|$ is less than $\epsilon/2$ and $\sum_{k=N+1}^{n-1} (n - k) < n^2/2$. The first term is a constant divided by $n$, which may be made smaller than $\epsilon/2$ by choosing $n$ large enough. This proves that $\sigma^2 < \infty$ and $\operatorname{Cov}(X_1, X_{1+k}) \to 0$ together imply that expression (12) goes to zero.

Definition. The sequence $X_1, X_2, \ldots$ is $m$-dependent for some nonnegative integer $m$ if $(X_1, \ldots, X_i)$ and $(X_j, X_{j+1}, \ldots)$ are independent whenever $j - i > m$.

Note that any iid sequence is a stationary, 0-dependent sequence. Also note that any stationary $m$-dependent sequence obviously satisfies $\operatorname{Cov}(X_1, X_{1+k}) \to 0$ as $k \to \infty$, so $\bar{X}_n$ is consistent for any stationary $m$-dependent sequence.
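
For a concrete $m$-dependent example, the sketch below uses the moving average $X_i = e_i + e_{i+1} + e_{i+2}$ with iid standard normal $e_i$, which is stationary and 2-dependent; the sample autocovariances beyond lag 2 come out essentially zero and the sample mean settles near $E(X_i) = 0$. The particular moving average and path length are arbitrary illustrative choices.

```python
import numpy as np

# Illustration of m-dependence: the moving average Xi = e_i + e_{i+1} + e_{i+2}
# (iid standard normal e_i) is stationary and 2-dependent, so Cov(X1, X_{1+k}) = 0
# for every k > 2, and the sample mean is consistent for E(Xi) = 0.
rng = np.random.default_rng(0)
n = 200_000
e = rng.standard_normal(n + 2)
x = e[:-2] + e[1:-1] + e[2:]                     # a single long 2-dependent path

xc = x - x.mean()
for k in range(6):
    cov_k = np.mean(xc[:n - k] * xc[k:])         # sample autocovariance at lag k
    print(f"lag {k}: sample Cov(X1, X1+{k}) ~= {cov_k:.3f}")
print("sample mean of the path:", x.mean())
# The lag-0, 1, 2 covariances are near 3, 2, 1 respectively; longer lags are near 0.
```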

Problems

Problem. Suppose $X_1, X_2, \ldots$ are independent with $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma_i^2$.
(a) Give an example in which $\bar{X}_n \xrightarrow{P} \mu$ but $\operatorname{Var}(\bar{X}_n)$ does not converge to 0.
(b) Prove that, for $\delta_n$ defined as in equation (10), $\operatorname{Var}(\delta_n) \le \operatorname{Var}(\bar{X}_n)$, and give an example (i.e., specify the values of $\sigma_i^2$) in which $\operatorname{Var}(\delta_n) \to 0$ but $\operatorname{Var}(\bar{X}_n) \to \infty$.

Problem. Suppose $X_1, X_2, \ldots$ are iid with $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2 < \infty$. Let $Y_i = \bar{X}_i = \left( \sum_{j=1}^i X_j \right) / i$.
(a) Prove that $\bar{Y}_n = \left( \sum_{i=1}^n Y_i \right) / n$ is a consistent estimator of $\mu$.
(b) Compute the relative efficiency $e_{\bar{Y}_n, \bar{X}_n}$ of $\bar{Y}_n$ to $\bar{X}_n$ for $n \in \{5, 10, 20, 50, 100, \infty\}$ and report the results in a table similar to the one on p. 58. Note that $n = \infty$ in the table does not actually mean $n = \infty$, since there is no such real number; instead, $n = \infty$ is shorthand for the limit (of the efficiency) as $n \to \infty$.

Problem. Let $Y_1, Y_2, \ldots$ be iid with mean $\mu$ and variance $\sigma^2 < \infty$.

