Transcription of Basic tail and concentration bounds
1 C H A P T E R21 Basic tail and concentration bounds2In a variety of settings, it is of interest to obtain bounds onthe tails of a random3variable, or two-sided inequalities that guarantee that a random variable is close to its4mean or median. In this chapter, we explore a number of elementary techniques for5obtaining both deviation and concentration is an entrypoint to more6advanced literature on large deviation bounds and concentration of Classical bounds8 One way in which to control a tail probabilityP[X t] is by controlling the moments of9the random variableX. Gaining control of higher-order moments leads to correspond-10ingly sharper bounds on tail probabilities, ranging from Markov s inequality (which11requires only existence of the first moment) to the Chernoff bound (which requires12existence of the moment generating function).13 From Markov to Chernoff14 The most elementary tail bound isMarkov s inequality: given a non-negative randomvariableXwith finite mean, we haveP[X t] E[X]tfor allt >0.
2 ( )For a random variableXthat also has a finite variance, we haveChebyshev s inequality:P |X | t var(X)t2for allt >0.( )Note that this is a simple form of concentration inequality,guaranteeing thatXis15close to its mean whenever its variance is small. Chebyshev s inequality follows by16applying Markov s inequality to the non-negative random variableY= (X E[X]) Markov s and Chebyshev s inequality are sharp, meaning that they cannot be18improved in general (see Exercise ).19 There are various extensions of Markov s inequality applicable to random variablesJanuary 22, 20150age12 CHAPTER 2. Basic TAIL AND concentration bounds with higher-order moments. For instance, wheneverXhas a central moment of orderk, an application of Markov s inequality to the random variable|X |kyields thatP[|X | t] E[|X |k]tkfor allt >0.( )Of course, the same procedure can be applied to functions other than polynomials|X |k.
3 For instance, suppose that the random variableXhas a moment generatingfunction in a neighborhood of zero, meaning that that there is some constantb >0such that the function ( ) =E[e (X )] exists for all |b|. In this case, for any [0;b], we may apply Markov s inequality to the random variableY=e (X ),thereby obtaining the upper boundP[(X ) t] =P[e (X ) e t] E[e (X )]e t:( )Optimizing our choice of so as to obtain the tightest result yields theChernoff bound namely, the inequalitylogP[(X ) t] sup [0;b] t logE[e (X )] :( )As we explore in Exercise , the moment bound ( ) with theoptimal choice ofkis1never worse than the bound ( ) based on the moment-generating function. Nonethe-2less, the Chernoff bound is most widely used in practice, possibly due to the ease of3manipulating moment generating functions. Indeed, a variety of important tail bounds4can be obtained as particular cases of inequality ( ), as we discuss in examples Sub-Gaussian variables and Hoeffding bounds7 The form of tail bound obtained via the the Chernoff approach depends on the growth8rate of the moment generating function.
4 Accordingly, in thestudy of tail bounds , it9is natural to classify random variables in terms of their moment generating reasons to become clear in the sequel, the simplest type ofbehavior is known as11sub-Gaussian. In order to motivate this notion, let us illustrate the use of the Chernoff12bound ( ) in deriving tail bounds for a Gaussian (Gaussian tail bounds ).LetX N( ; 2) be a Gaussian random vari-able with mean and variance 2. By a straightforward calculation, we find thatXhas the moment generating functionE[e X] =e + 2 2=2;valid for all R.( )ROUGH DRAFT: DO NOT DISTRIBUTE!! M. Wainwright January 22,2015 SECTION CLASSICAL BOUNDS13 Substituting this expression into the optimization problem defining the optimized Cher-noff bound ( ), we obtainsup R t logE[e (X )] = sup R t 2 22 =t22 2;where we have taken derivatives in order to find the optimum ofthis quadratic to the Chernoff bound ( ), we conclude that anyN( ; 2) random variablesatisfies theupper deviation inequalityP[X +t] e t22 2for allt 0.
5 ( )In fact, this bound is sharp up to polynomial-factor corrections, as shown by the result1of Exercise 2 Motivated by the structure of this example, we are led to introduce the following random variableXwith mean =E[X] issub-Gaussianif thereis a positive number such thatE[e (X )] e 2 2=2for all R.( )67 The constant is referred to as thesub-Gaussian parameter; for instance, we say8thatXis sub-Gaussian with parameter when the condition ( ) holds. Naturally,9any Gaussian variable with variance 2is sub-Gaussian with parameter , as should10be clear from the calculation described in Example In addition, as we will see in11the examples and exercises to follow, a large number of non-Gaussian random variables12also satisfy the condition ( ).1314 The condition ( ), when combined with the Chernoff bound asin Example ,shows that ifXis sub-Gaussian with parameter , then it satisfies theupper deviationinequality( ).
6 Moreover, by the symmetry of the definition, the variable Xis sub-Gaussian if and only ifXis sub-Gaussian, so that we also have thelower deviationinequalityP[X t] e t22 2, valid for allt 0. Combining the pieces, we concludethat any sub-Gaussian variable satisfies theconcentration inequalityP[|X | t] 2e t22 2for allt R.( )15 Let us consider some examples of sub-Gaussian variables thatare DRAFT: DO NOT DISTRIBUTE! M. Wainwright January 22, 201514 CHAPTER 2. Basic TAIL AND concentration BOUNDSE xample (Rademacher variables).A Rademacher random variable takes thevalues{ 1;+1}equiprobably. We claim that it is sub-Gaussian with parameter = taking expectations and using the power series expansionfor the exponential, weobtainE[e ] =12 e +e =12 Xk=0( )kk!+ Xk=0( )kk! = Xk=0 2k(2k)! 1 + Xk=1 2k2kk!=e 2=2;which shows that is sub-Gaussian with parameter = 1 as claimed.
7 1We now generalize the preceding example to show that any bounded random variable2is also (Bounded random variables).LetXbe zero-mean, and supported onsome interval [a;b]. LettingX be an independent copy, for any R, we haveEX[e X] =EX e (X EX [X ]) EX;X e (X X ) ;where the inequality follows the convexity of the exponential, and Jensen s be an independent Rademacher variable, note that the distribution of (X X )is the same as that of (X X ), so that we haveEX;X e (X X ) =EX;X hE e (X X ) i(i) EX;X e 2(X X )22 ;where step (i) follows from the result of Example , applied conditionally with (X;X )held fixed. Since|X X | b a, we are guaranteed thatEX;X e 2(X X )22 e 2(b a)22:Putting together the pieces, we have shown thatXis sub-Gaussian with parameter at4most =b a. This result is useful but can be sharpened. In Exercise ,we work5through a more involved argument to show thatXis sub-Gaussian with parameter at6 ROUGH DRAFT: DO NOT DISTRIBUTE!
8 ! M. Wainwright January 22,2015 SECTION CLASSICAL BOUNDS15most =b :The technique used in Example is a simple example of asymmetrization2argument, in which we first introduce an independent copyX , and then symmetrize3the problem with a Rademacher variable. Such symmetrization arguments are useful4in a variety of contexts, as will be seen in later as the property of Gaussianity is preserved by linear operations so is the prop-7erty of sub-Gaussianity. For instance, ifX1;X2are independent sub-Gaussian variables8with parameters 21and 22, thenX1+X2is sub-Gaussian with parameter 21+ 22. See9 Exercise for verification of this fact, and related properties. As a consequence of10this fact and the Basic sub-Gaussian tail bound ( ), we obtain an important result,11applicable to sums of independent sub-Gaussian random variables, and known as the12 Hoeffding bound:1314 Proposition (Hoeffding bound).
9 Suppose that the variablesXi,i= 1;:::;nareindependent, andXihas mean iand sub-Gaussian parameter i. Then for allt 0,we haveP nXi=1(Xi i) t exp( t22 Pni=1 2i):( )1516 The Hoeffding bound is often stated only for the special case of bounded random vari-ables. In particular, ifXi [a;b] for alli= 1;2;:::;n, then from the result of Exer-cise , it is sub-Gaussian with parameter =b a2, so that we obtain the boundP nXi=1(Xi i) t e 2t2n(b a)2:Although the Hoeffding bound is often stated in this form, thebasic idea applies some-17what more generally to sub-Gaussian variables, as we have given conclude our discussion of sub-Gaussianity with a resultthat provides three dif-20ferent characterizations of sub-Gaussian variables. First, the most direct way in which21to establish sub-Gaussianity is by computing or bounding the moment generating func-22tion, as we have done in Example A second intuition is that any sub-Gaussian23variable is dominated in a certain sense by a Gaussian variable.
10 Third, sub-Gaussianity24also follows by having suitably tight control on the momentsof the random following result shows that all three notions are equivalent in a precise DRAFT: DO NOT DISTRIBUTE! M. Wainwright January 22, 201516 CHAPTER 2. Basic TAIL AND concentration BOUNDST heorem (Equivalent characterizations of sub-Gaussian variables).Given anyzero-mean random variableX, the following properties are equivalent:(I) There is a constant such thatE[e X] e 2 22for all R.(II) There is a constantc 1 and Gaussian random variableZ N(0; 2) such thatP[|X| s] cP[|Z| s] for alls 0.( )(III) There exists a number 0 such thatE[X2k] (2k)!2kk! 2kfor allk= 1;2;::::( )(IV) We haveE[e X22 2] 1 1 for all [0;1).( )12 See Appendix A for the proof of these Sub-exponential variables and Bernstein bounds5 The notion of sub-Gaussianity is fairly restrictive, so that it is natural to consider vari-6ous relaxations of it.]