Basic tail and concentration bounds

C H A P T E R21 Basic tail and concentration bounds2In a variety of settings, it is of interest to obtain bounds onthe tails of a random3variable, or two-sided inequalities that guarantee that a random variable is close to its4mean or median. In this chapter, we explore a number of elementary techniques for5obtaining both deviation and concentration is an entrypoint to more6advanced literature on large deviation bounds and concentration of Classical bounds8 One way in which to control a tail probabilityP[X t] is by controlling the moments of9the random variableX. Gaining control of higher-order moments leads to correspond-10ingly sharper bounds on tail probabilities, ranging from Markov s inequality (which11requires only existence of the first moment) to the Chernoff bound (which requires12existence of the moment generating function).13 From Markov to Chernoff14 The most elementary tail bound isMarkov s inequality: given a non-negative randomvariableXwith finite mean, we haveP[X t] E[X]tfor allt >0.

( )For a random variableXthat also has a finite variance, we haveChebyshev s inequality:P |X | t var(X)t2for allt >0.( )Note that this is a simple form of concentration inequality,guaranteeing thatXis15close to its mean whenever its variance is small. Chebyshev s inequality follows by16applying Markov s inequality to the non-negative random variableY= (X E[X]) Markov s and Chebyshev s inequality are sharp, meaning that they cannot be18improved in general (see Exercise ).19 There are various extensions of Markov s inequality applicable to random variablesJanuary 22, 20150age12 CHAPTER 2. Basic TAIL AND concentration bounds with higher-order moments. For instance, wheneverXhas a central moment of orderk, an application of Markov s inequality to the random variable|X |kyields thatP[|X | t] E[|X |k]tkfor allt >0.( )Of course, the same procedure can be applied to functions other than polynomials|X |k.

For instance, suppose that the random variableXhas a moment generatingfunction in a neighborhood of zero, meaning that that there is some constantb >0such that the function ( ) =E[e (X )] exists for all |b|. In this case, for any [0;b], we may apply Markov s inequality to the random variableY=e (X ),thereby obtaining the upper boundP[(X ) t] =P[e (X ) e t] E[e (X )]e t:( )Optimizing our choice of so as to obtain the tightest result yields theChernoff bound namely, the inequalitylogP[(X ) t] sup [0;b] t logE[e (X )] :( )As we explore in Exercise , the moment bound ( ) with theoptimal choice ofkis1never worse than the bound ( ) based on the moment-generating function. Nonethe-2less, the Chernoff bound is most widely used in practice, possibly due to the ease of3manipulating moment generating functions. Indeed, a variety of important tail bounds4can be obtained as particular cases of inequality ( ), as we discuss in examples Sub-Gaussian variables and Hoeffding bounds7 The form of tail bound obtained via the the Chernoff approach depends on the growth8rate of the moment generating function.

Accordingly, in thestudy of tail bounds , it9is natural to classify random variables in terms of their moment generating reasons to become clear in the sequel, the simplest type ofbehavior is known as11sub-Gaussian. In order to motivate this notion, let us illustrate the use of the Chernoff12bound ( ) in deriving tail bounds for a Gaussian (Gaussian tail bounds ).LetX N( ; 2) be a Gaussian random vari-able with mean and variance 2. By a straightforward calculation, we find thatXhas the moment generating functionE[e X] =e + 2 2=2;valid for all R.( )ROUGH DRAFT: DO NOT DISTRIBUTE!! M. Wainwright January 22,2015 SECTION CLASSICAL BOUNDS13 Substituting this expression into the optimization problem defining the optimized Cher-noff bound ( ), we obtainsup R t logE[e (X )] = sup R t 2 22 =t22 2;where we have taken derivatives in order to find the optimum ofthis quadratic to the Chernoff bound ( ), we conclude that anyN( ; 2) random variablesatisfies theupper deviation inequalityP[X +t] e t22 2for allt 0.

( )In fact, this bound is sharp up to polynomial-factor corrections, as shown by the result1of Exercise 2 Motivated by the structure of this example, we are led to introduce the following random variableXwith mean =E[X] issub-Gaussianif thereis a positive number such thatE[e (X )] e 2 2=2for all R.( )67 The constant is referred to as thesub-Gaussian parameter; for instance, we say8thatXis sub-Gaussian with parameter when the condition ( ) holds. Naturally,9any Gaussian variable with variance 2is sub-Gaussian with parameter , as should10be clear from the calculation described in Example In addition, as we will see in11the examples and exercises to follow, a large number of non-Gaussian random variables12also satisfy the condition ( ).1314 The condition ( ), when combined with the Chernoff bound asin Example ,shows that ifXis sub-Gaussian with parameter , then it satisfies theupper deviationinequality( ).

Moreover, by the symmetry of the definition, the variable Xis sub-Gaussian if and only ifXis sub-Gaussian, so that we also have thelower deviationinequalityP[X t] e t22 2, valid for allt 0. Combining the pieces, we concludethat any sub-Gaussian variable satisfies theconcentration inequalityP[|X | t] 2e t22 2for allt R.( )15 Let us consider some examples of sub-Gaussian variables thatare DRAFT: DO NOT DISTRIBUTE! M. Wainwright January 22, 201514 CHAPTER 2. Basic TAIL AND concentration BOUNDSE xample (Rademacher variables).A Rademacher random variable takes thevalues{ 1;+1}equiprobably. We claim that it is sub-Gaussian with parameter = taking expectations and using the power series expansionfor the exponential, weobtainE[e ] =12 e +e =12 Xk=0( )kk!+ Xk=0( )kk! = Xk=0 2k(2k)! 1 + Xk=1 2k2kk!=e 2=2;which shows that is sub-Gaussian with parameter = 1 as claimed.

1We now generalize the preceding example to show that any bounded random variable2is also (Bounded random variables).LetXbe zero-mean, and supported onsome interval [a;b]. LettingX be an independent copy, for any R, we haveEX[e X] =EX e (X EX [X ]) EX;X e (X X ) ;where the inequality follows the convexity of the exponential, and Jensen s be an independent Rademacher variable, note that the distribution of (X X )is the same as that of (X X ), so that we haveEX;X e (X X ) =EX;X hE e (X X ) i(i) EX;X e 2(X X )22 ;where step (i) follows from the result of Example , applied conditionally with (X;X )held fixed. Since|X X | b a, we are guaranteed thatEX;X e 2(X X )22 e 2(b a)22:Putting together the pieces, we have shown thatXis sub-Gaussian with parameter at4most =b a. This result is useful but can be sharpened. In Exercise ,we work5through a more involved argument to show thatXis sub-Gaussian with parameter at6 ROUGH DRAFT: DO NOT DISTRIBUTE!

! M. Wainwright January 22,2015 SECTION CLASSICAL BOUNDS15most =b :The technique used in Example is a simple example of asymmetrization2argument, in which we first introduce an independent copyX , and then symmetrize3the problem with a Rademacher variable. Such symmetrization arguments are useful4in a variety of contexts, as will be seen in later as the property of Gaussianity is preserved by linear operations so is the prop-7erty of sub-Gaussianity. For instance, ifX1;X2are independent sub-Gaussian variables8with parameters 21and 22, thenX1+X2is sub-Gaussian with parameter 21+ 22. See9 Exercise for verification of this fact, and related properties. As a consequence of10this fact and the Basic sub-Gaussian tail bound ( ), we obtain an important result,11applicable to sums of independent sub-Gaussian random variables, and known as the12 Hoeffding bound:1314 Proposition (Hoeffding bound).

Suppose that the variablesXi,i= 1;:::;nareindependent, andXihas mean iand sub-Gaussian parameter i. Then for allt 0,we haveP nXi=1(Xi i) t exp( t22 Pni=1 2i):( )1516 The Hoeffding bound is often stated only for the special case of bounded random vari-ables. In particular, ifXi [a;b] for alli= 1;2;:::;n, then from the result of Exer-cise , it is sub-Gaussian with parameter =b a2, so that we obtain the boundP nXi=1(Xi i) t e 2t2n(b a)2:Although the Hoeffding bound is often stated in this form, thebasic idea applies some-17what more generally to sub-Gaussian variables, as we have given conclude our discussion of sub-Gaussianity with a resultthat provides three dif-20ferent characterizations of sub-Gaussian variables. First, the most direct way in which21to establish sub-Gaussianity is by computing or bounding the moment generating func-22tion, as we have done in Example A second intuition is that any sub-Gaussian23variable is dominated in a certain sense by a Gaussian variable.

Third, sub-Gaussianity24also follows by having suitably tight control on the momentsof the random following result shows that all three notions are equivalent in a precise DRAFT: DO NOT DISTRIBUTE! M. Wainwright January 22, 201516 CHAPTER 2. Basic TAIL AND concentration BOUNDST heorem (Equivalent characterizations of sub-Gaussian variables).Given anyzero-mean random variableX, the following properties are equivalent:(I) There is a constant such thatE[e X] e 2 22for all R.(II) There is a constantc 1 and Gaussian random variableZ N(0; 2) such thatP[|X| s] cP[|Z| s] for alls 0.( )(III) There exists a number 0 such thatE[X2k] (2k)!2kk! 2kfor allk= 1;2;::::( )(IV) We haveE[e X22 2] 1 1 for all [0;1).( )12 See Appendix A for the proof of these Sub-exponential variables and Bernstein bounds5 The notion of sub-Gaussianity is fairly restrictive, so that it is natural to consider vari-6ous relaxations of it.]

Basic tail and concentration bounds

Information

Advertisement

Transcription of Basic tail and concentration bounds

Related search queries

Basic tail and concentration bounds

Information

Advertisement

Documents from same domain

Related documents

Related search queries