
Chapter 3 Total variation distance between measures




1. Why bother with different distances?

When we work with a family of probability measures $\{P_\theta : \theta \in \Theta\}$, indexed by a metric space $\Theta$, there would seem to be an obvious way to calculate the distance between measures: use the metric on $\Theta$. For many problems of estimation, the obvious is what we want. We ask how close (in the $\Theta$ metric) we can come to guessing $\theta_0$, based on an observation from $P_{\theta_0}$; we compare estimators based on rates of convergence, or based on expected values of loss functions involving the distance from $\theta_0$. If the parametrization is reasonable (whatever that means), distances measured by the $\Theta$ metric are reasonable.

(What else could I say?) However, it is not hard to concoct examples where the metric is misleading.

<1> Example. Let $P_{n,\theta}$ denote the joint distribution for $n$ independent observations from the $N(\theta,1)$ distribution, with $\theta \in \mathbb{R}$. Under $P_{n,\theta_0}$, the sample average $\bar{X}_n$ converges to $\theta_0$ at an $n^{-1/2}$ rate. The parametrization is reasonable. What happens if we reparametrize, replacing the $N(\theta,1)$ by a $N(\theta^3,1)$? We are still fitting the same model, the same probability measures; only the labelling has changed. The maximum likelihood estimator, $\bar{X}_n^{1/3}$, still converges at an $n^{-1/2}$ rate if $\theta_0 \neq 0$, but for $\theta_0 = 0$ we get an $n^{-1/6}$ rate, as an artifact of the reparametrization.
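The rate artifact is easy to check numerically. The following Monte Carlo sketch is my own illustration, not part of the original text (the function name, seed, and sample sizes are arbitrary choices): under the $N(\theta^3,1)$ labelling the maximum likelihood estimator of $\theta$ is the cube root of the sample mean, and its average error shrinks much more slowly at $\theta_0 = 0$ than at $\theta_0 = 1$.

```python
# Hypothetical illustration: MLE rates under the theta -> theta^3 relabelling.
import numpy as np

rng = np.random.default_rng(0)

def mle_error(theta0, n, reps=1000):
    """Monte Carlo estimate of E|theta_hat - theta0| for the cube-root MLE."""
    eta0 = theta0 ** 3                                   # mean of each observation
    xbar = eta0 + rng.standard_normal((reps, n)).mean(axis=1)
    theta_hat = np.sign(xbar) * np.abs(xbar) ** (1 / 3)  # MLE of theta
    return np.mean(np.abs(theta_hat - theta0))

for n in [100, 10_000]:
    print(f"n={n:6d}: theta0=1 -> {mle_error(1.0, n):.4f}, "
          f"theta0=0 -> {mle_error(0.0, n):.4f}")
# Multiplying n by 100 shrinks the error by about 10 (= 100^(1/2)) at theta0 = 1,
# but only by about 2.15 (= 100^(1/6)) at theta0 = 0.
```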

More imaginative reparametrizations can produce even stranger behaviour for the maximum likelihood estimator. For example, define the one-to-one reparametrization
$$\psi(\theta) = \begin{cases} \theta & \text{if } \theta \text{ is rational} \\ \theta + 1 & \text{if } \theta \text{ is irrational.} \end{cases}$$
Now let $P_{n,\theta}$ denote the joint distribution for $n$ independent observations from the $N(\psi(\theta),1)$ distribution. If $\theta_0$ is rational, the maximum likelihood estimator, $\psi^{-1}(\bar{X}_n)$, gets very confused: it concentrates around $\theta_0 - 1$ as $n$ gets larger.

You would be right to scoff at the second reparametrization in the Example, yet it does make the point that distances measured in the $\Theta$ metric, for some parametrization picked out of the air, might not be particularly informative about the behaviour of estimators. Less ridiculous examples arise routinely in nonparametric problems, that is, in problems where infinite-dimensional parameters enter, making the choice of metric less obvious. Fortunately, there are intrinsic ways to measure distances between probability measures, distances that don't depend on the parametrizations.

The rest of this Chapter will set forth a few of the basic definitions and facts. The total variation distance has properties that will be familiar to students of the Neyman-Pearson approach to hypothesis testing. The Hellinger distance is closely related to the total variation distance (for example, both distances define the same topology on the space of probability measures), but it has several technical advantages derived from properties of inner products. (Hilbert spaces have nicer properties than general Banach spaces.)

For example, Hellinger distances are very well suited for the study of product measures (Section 5). Also, Hellinger distance is closely related to the concept called Hellinger differentiability (Chapter 6), an elegant alternative to the traditional assumptions of pointwise differentiability in some asymptotic problems. Kullback-Leibler distance, which is also known as relative entropy, emerges naturally from the study of maximum likelihood estimation. The relative entropy is not a metric, but it is closely related to the other two distances, and it too is well suited for use with product measures.

See Section 5 and Chapter ??. The intrinsic measures of distance are the key to understanding minimax rates of convergence, as you will learn in Chapter ??. For reasonable parametrizations, in classical finite-dimensional settings, the intrinsic measures usually tell the same story as the $\Theta$ metric, as explained in Chapter ??.

2. Total variation and lattice operations

In classical analysis, the total variation of a function $f$ over an interval $[a,b]$ is defined as
$$v(f,[a,b]) := \sup_g \sum_{i=1}^k |f(t_i) - f(t_{i-1})|,$$
where the supremum runs over all finite grids $g: a = t_0 < t_1 < \cdots < t_k = b$ on $[a,b]$. The total variation of a signed measure $\mu$, on a sigma-field $\mathcal{A}$ of subsets of some $\mathcal{X}$, is defined analogously (Dunford & Schwartz 1958):
$$v(\mu) := \sup_g \sum_{i=1}^k |\mu A_i|,$$
where now the supremum runs over all finite partitions $g: \mathcal{X} = \cup_{i=1}^k A_i$ of $\mathcal{X}$ into $\mathcal{A}$-measurable sets.
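As a quick numerical aside, my own illustration rather than anything from the text: the supremum defining $v(f,[a,b])$ can be approached by computing the grid sums for finer and finer grids. For $f = \cos$ on $[0,2\pi]$ the variation is exactly 4 (the function falls by 2 and then rises by 2).

```python
# Hypothetical illustration: grid sums approaching v(cos, [0, 2*pi]) = 4.
import numpy as np

def variation_on_grid(f, a, b, k):
    """Sum of |f(t_i) - f(t_{i-1})| over the uniform grid with k cells."""
    t = np.linspace(a, b, k + 1)
    return np.sum(np.abs(np.diff(f(t))))

for k in [3, 7, 1000]:
    print(k, variation_on_grid(np.cos, 0.0, 2 * np.pi, k))
# The sums approach the supremum, 4, as the grid is refined.
```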

The simplest way to create a signed measure is by taking a difference $\mu_1 - \mu_2$ of two nonnegative measures. In fact, one of the key properties of signed measures with finite total variation is that they can always be written as such a difference. Given that fact, there is no need to consider partitions into more than two sets: for if $A = \cup_i \{A_i : \mu A_i \ge 0\}$ then
$$\sum_i |\mu A_i| = \mu A - \mu A^c = |\mu A| + |\mu A^c|.$$
That is,
$$v(\mu) = \sup_{A \in \mathcal{A}} \left( |\mu A| + |\mu A^c| \right).$$
If $\mu$ has a density $m$ with respect to a countably additive, nonnegative measure $\lambda$ then the supremum is achieved by the choice $A = \{m \ge 0\}$:
$$v(\mu) = \sup_{A \in \mathcal{A}} \left( |\lambda(Am)| + |\lambda(A^c m)| \right) = |\lambda(\{m \ge 0\} m)| + |\lambda(\{m < 0\} m)| = \lambda|m|.$$
That is, $v(\mu)$ equals the $L^1(\lambda)$ norm of the density $d\mu/d\lambda$, for every choice of dominating measure $\lambda$. This fact suggests the notation $\|\mu\|_1$ for the total variation of a signed measure $\mu$.
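Here is a small brute-force check, a hypothetical example of my own with $\lambda$ taken to be counting measure on five points: the two-set supremum $\sup_A (|\mu A| + |\mu A^c|)$ equals $\lambda|m|$, and the set $A = \{m \ge 0\}$ achieves it.

```python
# Hypothetical example: v(mu) = lambda|m|, with the sup achieved at {m >= 0}.
import numpy as np
from itertools import combinations

m = np.array([0.5, -0.2, 0.1, -0.7, 0.3])   # density d(mu)/d(lambda)
points = set(range(len(m)))

def mu(A):
    """mu(A) = integral of m over A; lambda is counting measure."""
    return m[list(A)].sum()

# Brute-force supremum of |mu(A)| + |mu(A^c)| over all 2^5 subsets A.
v = max(abs(mu(A)) + abs(mu(points - set(A)))
        for r in range(len(m) + 1) for A in combinations(points, r))

A_pos = {i for i in points if m[i] >= 0}    # the maximizing set {m >= 0}
print(v, np.abs(m).sum(), abs(mu(A_pos)) + abs(mu(points - A_pos)))
# All three agree (1.8, up to floating-point rounding).
```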

The total variation $v(\mu)$ is also equal to $\sup_{|f| \le 1} |\mu f|$, the supremum running over all $\mathcal{A}$-measurable functions $f$ bounded in absolute value by 1, because
$$|\mu f| = |\lambda(mf)| \le \lambda|m| \quad \text{if } |f| \le 1,$$
with equality when $f = \{m \ge 0\} - \{m < 0\}$, the difference of indicator functions.

When $\mu(\mathcal{X}) = 0$, there are some slight simplifications in the formulae for $v(\mu)$. In that case $0 = \lambda m = \lambda m^+ - \lambda m^-$ and hence
$$v(\mu) = \|\mu\|_1 = 2\lambda m^+ = 2\lambda m^- = 2\mu\{m \ge 0\} = 2\sup_{A \in \mathcal{A}} \mu A.$$
As a special case, for probability measures $P_1$ and $P_2$, with densities $p_1$ and $p_2$ with respect to $\lambda$,
$$v(P_1 - P_2) = \|P_1 - P_2\|_1 = 2\lambda(p_1 - p_2)^+ = 2\lambda(p_2 - p_1)^+ = 2\sup_A (P_1 A - P_2 A) = 2\sup_A (P_2 A - P_1 A) = 2\sup_A |P_1 A - P_2 A|.$$

<2> Remark. Many authors, no doubt with the special case foremost in their minds, define the total variation as $\sup_A |P_1 A - P_2 A|$.

An unexpected extra factor of 2 can cause confusion. To avoid this confusion I will abandon the notation $v(\mu)$ altogether and write $\|\mu\|_{TV}$ for the modified definition.

In summary, for a finite signed measure $\mu$ on a sigma-field $\mathcal{A}$, with density $m$ with respect to a nonnegative measure $\lambda$,
$$\|\mu\|_1 := \lambda|m| = \sup_{|f| \le 1} |\mu f| = \sup_{A \in \mathcal{A}} \left( |\mu A| + |\mu A^c| \right).$$
If $\mu\mathcal{X} = 0$ then
$$\tfrac{1}{2}\|\mu\|_1 = \|\mu\|_{TV} := \sup_A |\mu A| = \sup_{A \in \mathcal{A}} \mu A = -\inf_{A \in \mathcal{A}} \mu A.$$
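The factor of 2 is easy to see for discrete distributions. In this short sketch, my own example rather than the author's, the $L^1$ distance between two probability vectors is exactly twice $\sup_A (P_1 A - P_2 A)$, the supremum being attained at $A = \{p_1 \ge p_2\}$.

```python
# Hypothetical example: ||P1 - P2||_1 = 2 * sup_A |P1 A - P2 A| for discrete P's.
import numpy as np

p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.4, 0.4, 0.2])

l1 = np.abs(p1 - p2).sum()              # ||P1 - P2||_1
tv = np.clip(p1 - p2, 0, None).sum()    # sup_A (P1 A - P2 A), at A = {p1 >= p2}
print(l1, tv, np.isclose(l1, 2 * tv))   # ~0.4, ~0.2, True
```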

The use of an arbitrarily chosen dominating measure $\lambda$ also lets us perform lattice operations on finite signed measures. For example, if $d\mu/d\lambda = m$ and $d\nu/d\lambda = n$, with $m, n \in L^1(\lambda)$, then the measure $\mu \vee \nu$ defined by $d(\mu \vee \nu)/d\lambda := m \vee n$ has the property that

<3> $(\mu \vee \nu) A = \lambda\big( (m \vee n) A \big) \ge \max(\mu A, \nu A)$ for all $A \in \mathcal{A}$.

In fact, $\mu \vee \nu$ is the smallest measure with this property. For suppose $\mu_0$ is another signed measure with $\mu_0 A \ge \max(\mu A, \nu A)$ for all $A$. We may assume, with no loss of generality, that $\mu_0$ is also dominated by $\lambda$, with density $g_0$. The inequality
$$\lambda\big( m A \{m \ge n\} \big) = \mu\big( A\{m \ge n\} \big) \le \mu_0\big( A\{m \ge n\} \big) = \lambda\big( g_0 A \{m \ge n\} \big),$$
for all $A \in \mathcal{A}$, implies $g_0 \ge m$ a.e. $[\lambda]$ on the set $\{m \ge n\}$. Similarly $g_0 \ge n$ a.e. $[\lambda]$ on the set $\{m < n\}$. Thus $g_0 \ge m \vee n$ a.e. $[\lambda]$ and $\mu_0 \ge \mu \vee \nu$. Even though $\mu \vee \nu$ was defined via a particular choice of dominating measure, the setwise properties show that the resulting measure is the same for every such $\lambda$.

<4> Theorem. For each pair of finite, signed measures $\mu$ and $\nu$ on $\mathcal{A}$, there is a smallest signed measure $\mu \vee \nu$ for which $(\mu \vee \nu)(A) \ge \max(\mu A, \nu A)$ for all $A \in \mathcal{A}$, and a largest signed measure $\mu \wedge \nu$ for which $(\mu \wedge \nu)(A) \le \min(\mu A, \nu A)$ for all $A \in \mathcal{A}$. If $\lambda$ is a dominating (nonnegative) measure for which $d\mu/d\lambda = m$ and $d\nu/d\lambda = n$ then $d(\mu \vee \nu)/d\lambda = \max(m,n)$ and $d(\mu \wedge \nu)/d\lambda = \min(m,n)$ a.e. $[\lambda]$. In particular, the nonnegative measures defined by $d\mu^+/d\lambda := m^+$ and $d\mu^-/d\lambda := m^-$ are the smallest measures for which $\mu^+ A \ge \mu A \ge -\mu^- A$ for all $A \in \mathcal{A}$.
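To make inequality <3> and the Theorem concrete, here is a short brute-force check, my own example with $\lambda$ equal to counting measure on four points: the measure with density $\max(m,n)$ dominates both $\mu$ and $\nu$ on every set $A$.

```python
# Hypothetical example: d(mu v nu)/d(lambda) = max(m, n) dominates mu, nu setwise.
import numpy as np
from itertools import combinations

m = np.array([0.5, -0.2, 0.1, -0.7])   # d(mu)/d(lambda)
n = np.array([0.1,  0.3, 0.1, -0.9])   # d(nu)/d(lambda)
sup_density = np.maximum(m, n)         # density of mu v nu

points = range(len(m))
for r in range(len(m) + 1):
    for A in combinations(points, r):
        A = list(A)
        # (mu v nu)(A) >= max(mu A, nu A), as in <3>
        assert sup_density[A].sum() >= max(m[A].sum(), n[A].sum())
print("(mu v nu)A >= max(mu A, nu A) holds on all", 2 ** len(m), "subsets A")
```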

