Lecture 6: Discrete Random Variables - CMU Statistics

19 September 2005

1 Expectation

The expectation of a random variable is its average value, with weights in the average given by the probability distribution:

E[X] = \sum_x x Pr(X = x)

If c is a constant, E[c] = c. If a and b are constants, E[aX + b] = aE[X] + b. If X ≥ Y, then E[X] ≥ E[Y].

Now let's think about E[X + Y].

E[X + Y] = \sum_{x,y} (x + y) Pr(X = x, Y = y)
         = \sum_{x,y} x Pr(X = x, Y = y) + \sum_{x,y} y Pr(X = x, Y = y)
         = \sum_x x \sum_y Pr(X = x, Y = y) + \sum_y y \sum_x Pr(X = x, Y = y)

By total probability, \sum_y Pr(X = x, Y = y) = Pr(X = x), and likewise \sum_x Pr(X = x, Y = y) = Pr(Y = y). So

E[X + Y] = \sum_x x Pr(X = x) + \sum_y y Pr(Y = y) = E[X] + E[Y]

Notice that E[X] works just like a mean; in fact we can think of it as being the population mean (as opposed to the sample mean).
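
As a quick numerical illustration (a minimal Python sketch with arbitrary example distributions, not anything from the lecture), we can estimate E[X + Y], E[X], and E[Y] by simulation; Y is deliberately built from X, so the two are dependent, which emphasizes that E[X + Y] = E[X] + E[Y] needs no independence assumption.

import random

# Simulation check of E[X+Y] = E[X] + E[Y].
# The distributions below are arbitrary examples; Y is constructed from X
# on purpose, so the two variables are dependent.
random.seed(0)
n = 100_000
xs = [random.choice([0, 1, 2, 3]) for _ in range(n)]   # X uniform on {0, 1, 2, 3}
ys = [x + random.choice([0, 1]) for x in xs]           # Y depends on X

def mean(v):
    return sum(v) / len(v)

print(mean([x + y for x, y in zip(xs, ys)]))   # estimate of E[X+Y]
print(mean(xs) + mean(ys))                     # E[X] + E[Y]; the two agree closely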

The variance is the expectation of (X - E[X])^2:

Var(X) = \sum_x p(x) (x - E[X])^2

which we can show is E[X^2] - (E[X])^2:

Var(X) = E[(X - E[X])^2]
       = E[X^2 - 2X E[X] + (E[X])^2]
       = E[X^2] - E[2X E[X]] + E[(E[X])^2]

Now E[X] is just another constant, so E[(E[X])^2] = (E[X])^2, and E[2X E[X]] = 2 E[X] E[X] = 2(E[X])^2. So

Var(X) = E[X^2] - 2(E[X])^2 + (E[X])^2 = E[X^2] - (E[X])^2

The main rule for variance is this:

Var(aX + b) = a^2 Var(X)

It's not generally true that Var(X + Y) = Var(X) + Var(Y); we'll see below when it is true.
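
A few lines of Python can confirm both the shortcut formula Var(X) = E[X^2] - (E[X])^2 and the rule Var(aX + b) = a^2 Var(X) on a small probability table (the table and the constants a, b below are arbitrary example values):

# Check the two variance identities on an arbitrary example pmf.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def E(g, pmf):
    # Expectation of g(X) under the given pmf.
    return sum(q * g(x) for x, q in pmf.items())

EX = E(lambda x: x, pmf)
EX2 = E(lambda x: x * x, pmf)
var = E(lambda x: (x - EX) ** 2, pmf)
print(var, EX2 - EX ** 2)            # the definition and the shortcut agree

a, b = 3, 7
pmf_ab = {a * x + b: q for x, q in pmf.items()}   # distribution of aX + b
EY = E(lambda y: y, pmf_ab)
var_ab = E(lambda y: (y - EY) ** 2, pmf_ab)
print(var_ab, a ** 2 * var)          # Var(aX+b) = a^2 Var(X)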

2 Some useful results

A basic result about expectations is the Markov inequality: if X is a non-negative random variable, and a is a positive constant, then

Pr(X \geq a) \leq \frac{E[X]}{a}

Proof: Let A = {X ≥ a}, with indicator variable 1_A. Then X ≥ a 1_A: either 1_A = 0, in which case X ≥ 0, or else 1_A = 1, but then X ≥ a. So E[X] ≥ E[a 1_A] = a E[1_A] = a Pr(X ≥ a).

The Chebyshev inequality is a special case of the Markov inequality, but a very useful one. It's plain that (X - E[X])^2 ≥ 0, so applying the Markov inequality gives

Pr((X - E[X])^2 \geq a^2) \leq \frac{Var(X)}{a^2}

Taking the square root of the term inside the left-hand side,

Pr(|X - E[X]| \geq a) \leq \frac{Var(X)}{a^2}

The Chebyshev inequality helps give meaning to the variance: it tells us about how unlikely it is for the random variable to depart very far from its expected value.
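
The Chebyshev bound is usually quite loose, which a short simulation can show; the example distribution and the values of a below are arbitrary choices, not from the lecture.

import random

# Compare the empirical tail probability Pr(|X - E[X]| >= a)
# with the Chebyshev bound Var(X)/a^2 for an arbitrary example pmf.
random.seed(0)
values, probs = [0, 1, 2, 10], [0.4, 0.3, 0.2, 0.1]
EX = sum(v * q for v, q in zip(values, probs))
var = sum(q * (v - EX) ** 2 for v, q in zip(values, probs))

samples = random.choices(values, weights=probs, k=200_000)
for a in (4, 6, 8):
    tail = sum(abs(x - EX) >= a for x in samples) / len(samples)
    print(a, tail, var / a ** 2)   # empirical tail is never above the bound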

We'll say that random variables are independent if their probability distributions factor: Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).

If the variables are independent, then E[XY] = E[X] E[Y]:

E[XY] = \sum_{x,y} x y Pr(X = x, Y = y)
      = \sum_x \sum_y x y Pr(X = x, Y = y)
      = \sum_x \sum_y x y Pr(X = x) Pr(Y = y)
      = \sum_x x Pr(X = x) \sum_y y Pr(Y = y)
      = \sum_x x Pr(X = x) E[Y]
      = E[Y] \sum_x x Pr(X = x)
      = E[Y] E[X]

This isn't the only time that E[XY] = E[X] E[Y]. Here's where independence gets important: what's the variance of X + Y?

Var(X + Y) = E[(X + Y)^2] - (E[X + Y])^2
           = E[X^2 + 2XY + Y^2] - (E[X] + E[Y])^2
           = E[X^2] + 2E[XY] + E[Y^2] - [(E[X])^2 + 2E[X]E[Y] + (E[Y])^2]
           = E[X^2] - (E[X])^2 + E[Y^2] - (E[Y])^2 + 2E[XY] - 2E[X]E[Y]
           = Var(X) + Var(Y) + 2E[XY] - 2E[X]E[Y]

But we've just seen that E[XY] = E[X]E[Y] if X and Y are independent, so then

Var(X + Y) = Var(X) + Var(Y)
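
Here is a minimal simulation sketch of both facts for a pair of independent variables (the example distributions are arbitrary): the mean of the products matches the product of the means, and the variances add.

import random
import statistics

# For independent X and Y, check E[XY] = E[X]E[Y] and
# Var(X+Y) = Var(X) + Var(Y) by simulation.
random.seed(0)
n = 200_000
X = [random.choice([0, 1, 2]) for _ in range(n)]   # independent draws
Y = [random.choice([0, 5]) for _ in range(n)]

XY = [x * y for x, y in zip(X, Y)]
print(statistics.fmean(XY), statistics.fmean(X) * statistics.fmean(Y))
print(statistics.pvariance([x + y for x, y in zip(X, Y)]),
      statistics.pvariance(X) + statistics.pvariance(Y))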

3 Binomial random variables

Recall that the distribution of the binomial is

Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}

and that it's the sum of n independent Bernoulli variables, each with success probability p. How do we know this is a valid probability distribution? It's clearly ≥ 0 for all x, but how do I know it sums to 1? Because of the binomial theorem from algebra (which is where the name comes from):

(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}

(p + (1 - p))^n = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k}

1^n = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k}

1 = \sum_{k=0}^{n} \binom{n}{k} p^k (1 - p)^{n-k}

To find the mean and variance, we could either do the appropriate sums explicitly, which means using ugly tricks about the binomial formula; or we could use the fact that X is a sum of n independent Bernoulli variables. Because the Bernoulli variables have expectation p, E[X] = np. Because they have variance p(1 - p), Var(X) = np(1 - p).
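
A short check in Python, using an arbitrary n and p, confirms that this pmf sums to one and has mean np and variance np(1 - p):

from math import comb

# Check normalization, mean, and variance of the Binomial(n, p) pmf.
# n and p are arbitrary example values.
n, p = 12, 0.3
pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

print(sum(pmf.values()))                        # 1.0
mean = sum(k * q for k, q in pmf.items())
var = sum(q * (k - mean) ** 2 for k, q in pmf.items())
print(mean, n * p)                              # both equal np
print(var, n * p * (1 - p))                     # both equal np(1-p)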

4 Geometric random variables

Suppose we keep trying independent Bernoulli variables until we have a success; each has probability of success p. Then the probability that the number of failures is k is (1 - p)^k p. (Be careful: some people use p as the probability of failure here; they reverse p and 1 - p.)

First, check that this weird thing is a valid probability distribution: does it sum to one? Yes:

\sum_{k=0}^{\infty} (1 - p)^k p = p \sum_{k=0}^{\infty} (1 - p)^k = p \frac{1}{1 - (1 - p)} = p \frac{1}{p} = 1

This uses the geometric series, the fact that \sum_k p^k = 1/(1 - p) if p is between 0 and 1.

Now let's think about the expectation:

E[X] = \sum_{k=0}^{\infty} k (1 - p)^k p = p \sum_{k=1}^{\infty} k (1 - p)^k = p \frac{1 - p}{p^2} = \frac{1 - p}{p}

Similarly (but more involvedly) the variance is (1 - p)/p^2.
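
A simulation sketch (with an arbitrary example p) that counts failures before the first success agrees with these formulas:

import random
import statistics

# Simulate the number of failures before the first success and compare the
# sample mean and variance with (1-p)/p and (1-p)/p^2.  p is an example value.
random.seed(0)
p = 0.3

def failures_before_success(p):
    k = 0
    while random.random() >= p:   # each trial fails with probability 1-p
        k += 1
    return k

draws = [failures_before_success(p) for _ in range(100_000)]
print(statistics.fmean(draws), (1 - p) / p)            # mean vs (1-p)/p
print(statistics.pvariance(draws), (1 - p) / p ** 2)   # variance vs (1-p)/p^2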

Negative binomial random variables

Instead of just getting one success, we might keep going until we get r of them. The probability distribution then is just

Pr(X = k) = \binom{k-1}{r-1} p^r (1 - p)^{k-r}, \quad k \geq r

If we think of W_1 as the number of trials we have to make to get the first success, and then W_2 as the number of further trials to the second success, and so on, we can see that X = W_1 + W_2 + ... + W_r, and that the W_i are independent and geometric random variables. (Each W_i counts trials rather than failures, so E[W_i] = 1/p, while Var(W_i) is still (1 - p)/p^2.) So E[X] = r/p, and Var(X) = r(1 - p)/p^2.
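
The same kind of simulation works for the negative binomial; the sketch below (r and p are arbitrary example values) counts trials until the r-th success and compares the sample mean and variance with r/p and r(1 - p)/p^2.

import random
import statistics

# Simulate the number of trials needed for r successes.
random.seed(0)
r, p = 4, 0.3

def trials_until_r_successes(r, p):
    trials = 0
    successes = 0
    while successes < r:
        trials += 1
        if random.random() < p:
            successes += 1
    return trials

draws = [trials_until_r_successes(r, p) for _ in range(100_000)]
print(statistics.fmean(draws), r / p)                      # mean vs r/p
print(statistics.pvariance(draws), r * (1 - p) / p ** 2)   # variance vs r(1-p)/p^2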

5 Poisson random variables

Think about a very large number of Bernoulli trials, where n → ∞, but the expected number of successes stays constant, say λ. For instance, suppose we're looking at the number of particles emitted by a chunk of radioactive substance over one-second intervals of time. Every atom has a certain probability to decay over a given unit of time; as we make the time intervals smaller, we make those probabilities smaller, but the average total should still come to the same thing.

Suppose we have only a finite n, but n is very large, so p = λ/n is small.

Pr(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} = \frac{n!}{k!(n-k)!} p^k (1 - p)^{-k} (1 - p)^n

Since n is large, we can use Stirling's approximation on n! and (n - k)!, so n! ≈ n^n and (n - k)! ≈ (n - k)^{n-k} ≈ n^{n-k}, which gives n!/(n - k)! ≈ n^k. Then

Pr(X = k) \approx \frac{n^k}{k!} p^k (1 - p)^{-k} \left(1 - \frac{\lambda}{n}\right)^n \approx \frac{\lambda^k}{k!} e^{-\lambda}

because n^k p^k = λ^k, (1 - p)^{-k} → 1 as p → 0, and lim_{n → ∞} (1 + x/n)^n = e^x.

We can check that the probability adds up to one, because \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}.
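
A quick numerical comparison (λ and n below are arbitrary example values) shows how close the Binomial(n, λ/n) pmf already is to the Poisson limit for a moderately large n:

from math import comb, exp, factorial

# Compare the Binomial(n, lambda/n) pmf with the Poisson(lambda) pmf.
lam, n = 3.0, 1000
p = lam / n
for k in range(8):
    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
    poisson = lam ** k * exp(-lam) / factorial(k)
    print(k, round(binom, 5), round(poisson, 5))   # the two columns nearly match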

We can also get the mean of the Poisson distribution:

E[X] = \sum_{k=0}^{\infty} k \frac{\lambda^k}{k!} e^{-\lambda}
     = \sum_{k=1}^{\infty} k \frac{\lambda^k}{k!} e^{-\lambda}
     = \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} e^{-\lambda}
     = \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} e^{-\lambda}
     = \lambda \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda}
     = \lambda

The easiest way to get the variance is to first calculate E[X(X - 1)], because this will let us use the same sort of trick about factorials and the exponential function:

E[X(X - 1)] = \sum_{k=0}^{\infty} k(k - 1) \frac{\lambda^k}{k!} e^{-\lambda}
E[X^2 - X] = \sum_{k=2}^{\infty} \frac{\lambda^k}{(k-2)!} e^{-\lambda}
E[X^2] - E[X] = \lambda^2 \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} e^{-\lambda} = \lambda^2 \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} = \lambda^2

So E[X^2] = E[X] + λ^2, and

Var(X) = E[X^2] - (E[X])^2 = E[X] + \lambda^2 - \lambda^2 = E[X] = \lambda
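
A small sketch that sums a truncated version of the Poisson pmf (λ and the cutoff are arbitrary; the truncation error is negligible for this λ) confirms that the mean and variance are both λ:

from math import exp, factorial

# Check E[X] = Var(X) = lambda for the Poisson pmf by truncated summation.
lam = 3.0
pmf = [lam ** k * exp(-lam) / factorial(k) for k in range(60)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(q * (k - mean) ** 2 for k, q in enumerate(pmf))
print(sum(pmf))    # ~ 1, so the truncation is harmless
print(mean, var)   # both ~ lambda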

6 Adding Many Independent Random Variables

Remember the Chebyshev inequality:

Pr(|X - E[X]| \geq a) \leq \frac{Var(X)}{a^2}

Let's look at the sum of a whole bunch of independent random variables with the same distribution, S_n = \sum_{i=1}^{n} X_i.

We know that E[S_n] = E[\sum_{i=1}^{n} X_i] = \sum_{i=1}^{n} E[X_i] = n E[X_1], because they all have the same expectation. Because they're independent, and all have the same variance, Var(S_n) = n Var(X_1). So

Pr(|S_n - n E[X_1]| \geq a) \leq \frac{n Var(X_1)}{a^2}

Now, notice that S_n/n is just the sample mean, if we draw a sample of size n. So we can use the Chebyshev inequality to estimate the chance that the sample mean is far from the true, population mean, which is E[X_1]:

Pr(|S_n/n - E[X_1]| \geq \epsilon) = Pr(|S_n - n E[X_1]| \geq n\epsilon) \leq \frac{n Var(X_1)}{n^2 \epsilon^2} = \frac{Var(X_1)}{n \epsilon^2}

Observe that whatever ε is, the probability must go to zero like 1/n (or faster). So the probability that the sample mean differs from the population mean by as much as ε can be made arbitrarily small, by taking a large enough sample. This is the law of large numbers.
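
To watch the law of large numbers at work, the sketch below (the distribution, ε, and the sample sizes are arbitrary example choices) estimates the deviation probability by simulation and compares it with the Chebyshev bound, which shrinks like 1/n:

import random
import statistics

# For increasing n, estimate Pr(|S_n/n - E[X_1]| >= eps) by simulation
# and compare it with the Chebyshev bound Var(X_1)/(n eps^2).
random.seed(0)
values, probs = [0, 1, 4], [0.5, 0.3, 0.2]
EX = sum(v * q for v, q in zip(values, probs))
varX = sum(q * (v - EX) ** 2 for v, q in zip(values, probs))
eps = 0.25
reps = 2000

for n in (10, 100, 1000):
    deviations = 0
    for _ in range(reps):
        sample_mean = statistics.fmean(random.choices(values, weights=probs, k=n))
        deviations += abs(sample_mean - EX) >= eps
    print(n, deviations / reps, varX / (n * eps ** 2))   # empirical vs bound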

