1 Review of Probability - Columbia University

Copyright © 2007 by Karl Sigman

Random variables are denoted by $X, Y, Z$, etc. The cumulative distribution function (cdf) of a random variable $X$ is denoted by $F(x) = P(X \le x)$, $-\infty < x < \infty$, and if the random variable is continuous then its probability density function is denoted by $f(x)$, which is related to $F(x)$ via

    f(x) = F'(x) = \frac{d}{dx}F(x), \qquad F(x) = \int_{-\infty}^{x} f(y)\,dy.

The probability mass function (pmf) of a discrete random variable is given by $p(k) = P(X = k)$, $-\infty < k < \infty$, for integer values $k$. $\bar{F}(x) = P(X > x)$ is called the tail of $X$, and $\bar{F}(x) = 1 - F(x)$. Whereas $F(x)$ increases to 1 as $x \to \infty$ and decreases to 0 as $x \to -\infty$, the tail $\bar{F}(x)$ decreases to 0 as $x \to \infty$ and increases to 1 as $x \to -\infty$. If a random variable $X$ has a certain distribution with cdf $F(x) = P(X \le x)$, then we write, for simplicity of expression,

    X \sim F.                                                        (1)
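As a quick numerical sanity check (not part of the original notes), the sketch below verifies the relations $f = F'$ and $\bar{F} = 1 - F$ for an exponential distribution; the rate value 2.0 and the helper names F and f are illustrative choices, not anything fixed by the text.

    import math

    lam = 2.0                                # illustrative rate parameter
    F = lambda x: 1 - math.exp(-lam * x)     # cdf of an exponential: F(x) = 1 - e^{-lam x}
    f = lambda x: lam * math.exp(-lam * x)   # its density f(x)

    x, h = 0.7, 1e-6
    deriv = (F(x + h) - F(x - h)) / (2 * h)  # central difference approximating F'(x)
    print(abs(deriv - f(x)) < 1e-4)          # True: f(x) = F'(x)
    print(abs((1 - F(x)) - math.exp(-lam * x)) < 1e-12)  # tail: 1 - F(x) = e^{-lam x}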

Moments and variance

The expected value of $X$ is denoted by $E(X)$ and defined by

    E(X) = \sum_{k=-\infty}^{\infty} k\,p(k)            (discrete case),
    E(X) = \int_{-\infty}^{\infty} x f(x)\,dx           (continuous case).

$E(X)$ is also referred to as the first moment or mean of $X$ (or of its distribution). Higher moments $E(X^n)$, $n \ge 1$, can be computed via

    E(X^n) = \sum_{k=-\infty}^{\infty} k^n p(k)         (discrete case),
    E(X^n) = \int_{-\infty}^{\infty} x^n f(x)\,dx       (continuous case),

and more generally $E(g(X))$ for a function $g = g(x)$ can be computed via

    E(g(X)) = \sum_{k=-\infty}^{\infty} g(k) p(k)       (discrete case),
    E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx     (continuous case).

(Letting $g(x) = x^n$ yields moments, for example.) Finally, the variance of $X$ is denoted by $Var(X)$, defined by $E\{|X - E(X)|^2\}$, and can be computed via

    Var(X) = E(X^2) - E^2(X),                                        (2)

the second moment minus the square of the first moment. We usually denote the variance by $\sigma^2 = Var(X)$ and, when necessary (to avoid confusion), include $X$ as a subscript, $\sigma^2_X = Var(X)$.
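To make the computational recipe concrete, here is a minimal Python sketch (my own illustration, with a made-up pmf on {1, 2, 3}) that computes the first two moments and confirms that formula (2) agrees with the defining expression $E\{|X - E(X)|^2\}$.

    # Hypothetical discrete distribution: p(1) = 0.2, p(2) = 0.5, p(3) = 0.3
    pmf = {1: 0.2, 2: 0.5, 3: 0.3}

    EX  = sum(k * q    for k, q in pmf.items())   # first moment E(X)
    EX2 = sum(k**2 * q for k, q in pmf.items())   # second moment E(X^2)
    var_formula = EX2 - EX**2                     # Var(X) via equation (2)
    var_direct  = sum((k - EX)**2 * q for k, q in pmf.items())  # E|X - E(X)|^2

    print(EX, var_formula)
    print(abs(var_formula - var_direct) < 1e-12)  # True: the two computations agree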

$\sigma = \sqrt{Var(X)}$ is called the standard deviation. For any $X$ and any number $a$,

    E(aX) = a E(X), \qquad Var(aX) = a^2\,Var(X).                    (3)

For any two random variables,

    E(X + Y) = E(X) + E(Y).                                          (4)

If $X$ and $Y$ are independent, then

    Var(X + Y) = Var(X) + Var(Y).                                    (5)

The above properties generalize in the obvious fashion to any finite number of random variables. For general random variables (independent or not),

    Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y),

where

    Cov(X, Y) \stackrel{def}{=} E(XY) - E(X)E(Y)

is called the covariance between $X$ and $Y$, and is usually denoted by $\sigma_{X,Y} = Cov(X, Y)$. When $Cov(X, Y) > 0$, $X$ and $Y$ are said to be positively correlated, whereas when $Cov(X, Y) < 0$, $X$ and $Y$ are said to be negatively correlated. When $Cov(X, Y) = 0$, $X$ and $Y$ are said to be uncorrelated, and in general this is weaker than independence of $X$ and $Y$: there are examples of uncorrelated random variables that are not independent, as the sketch below illustrates.
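The following sketch works out the standard textbook example of uncorrelated but dependent random variables; the specific choice ($X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$) is my own illustration, not one given in the notes.

    from fractions import Fraction

    # X uniform on {-1, 0, 1}, and Y = X^2.
    support = [-1, 0, 1]
    p = Fraction(1, 3)                       # P(X = k) = 1/3 for each k

    EX  = sum(p * k        for k in support) # E(X)  = 0
    EY  = sum(p * k**2     for k in support) # E(Y)  = E(X^2) = 2/3
    EXY = sum(p * k * k**2 for k in support) # E(XY) = E(X^3) = 0
    print(EXY - EX * EY)                     # Cov(X, Y) = 0: uncorrelated

    # But Y is a function of X, so they are clearly not independent:
    # P(Y = 0 | X = 0) = 1, whereas P(Y = 0) = 1/3.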

Note in passing that $Cov(X, X) = Var(X)$. The correlation coefficient of $X, Y$ is defined by

    \rho = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y},

and it always holds that $-1 \le \rho \le 1$. When $\rho = 1$, $X$ and $Y$ are said to be perfectly (positively) correlated.

Moment generating functions

The moment generating function (mgf) of $X$ (or its distribution) is defined for all $s \in (-\infty, \infty)$ by

    M(s) \stackrel{def}{=} E(e^{sX}) = \int_{-\infty}^{\infty} e^{sx} f(x)\,dx     (6)

(with $M(s) = \sum_k e^{sk} p(k)$ in the discrete case). It is so called because it generates the moments of $X$ by differentiation at $s = 0$:

    M'(0) = E(X),                                                    (7)

and more generally

    M^{(n)}(0) = E(X^n), \quad n \ge 1.                              (8)

The mgf uniquely determines a distribution in that no two distributions can have the same mgf; so knowing a mgf characterizes the distribution in question. If $X$ and $Y$ are independent, then $E(e^{s(X+Y)}) = E(e^{sX} e^{sY}) = E(e^{sX})E(e^{sY})$, and we conclude that the mgf of an independent sum is the product of the individual mgfs. When we wish to stress the particular random variable $X$, we write $M_X(s)$.
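As an illustration of (7) and (8) (my own sketch, using sympy for the differentiation), one can recover the mean and variance of a Bernoulli($p$) random variable from its mgf $M(s) = p e^s + 1 - p$, which is derived in the examples below.

    import sympy as sp

    s, p = sp.symbols('s p', positive=True)
    M = p * sp.exp(s) + 1 - p            # mgf of a Bernoulli(p) r.v.

    m1 = sp.diff(M, s).subs(s, 0)        # M'(0)  = E(X)   = p
    m2 = sp.diff(M, s, 2).subs(s, 0)     # M''(0) = E(X^2) = p
    print(m1, sp.simplify(m2 - m1**2))   # variance p - p^2 = p(1 - p)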

The independence property above can then be concisely expressed as

    M_{X+Y}(s) = M_X(s) M_Y(s),

when $X$ and $Y$ are independent. For a given distribution, $M(s) = \infty$ is possible for some values of $s$, but there is a large useful class of distributions for which $M(s) < \infty$ for all $s$ in a neighborhood of the origin, that is, for $s \in (-\epsilon, \epsilon)$ with $\epsilon > 0$ sufficiently small. Such distributions are referred to as light-tailed because their tails can be shown to tend to zero quickly. There also exist distributions for which no such neighborhood exists, and this can be so even if the distribution has finite moments of all orders (see the lognormal distribution, for example). A large class of such distributions are referred to as heavy-tailed because their tails tend to zero slowly.

Remark: For non-negative r.v.s $X$, it is sometimes more common to use the Laplace transform,

    L(s) = E(e^{-sX}), \quad s \ge 0,

which is always finite, and then $(-1)^n L^{(n)}(0) = E(X^n)$, $n \ge 1$. For discrete r.v.s $X$, it is sometimes more common to use

    M(z) = E(z^X) = \sum_{k=-\infty}^{\infty} z^k p(k), \quad |z| \le 1,

for the mgf, in which case (factorial) moments can be generated via

    M'(1) = E(X), \quad M''(1) = E(X(X-1)), \quad M^{(n)}(1) = E(X(X-1)\cdots(X-(n-1))), \quad n \ge 1.
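Here is a small sketch of the discrete transform $M(z) = E(z^X)$ (often called the probability generating function). For a Poisson($\lambda$) random variable, $E(z^X) = e^{\lambda(z-1)}$, a standard fact, and differentiating at $z = 1$ produces the factorial moments described above; the use of sympy and the variable names are my own choices.

    import sympy as sp

    z, lam = sp.symbols('z lam', positive=True)
    G = sp.exp(lam * (z - 1))            # E(z^X) for X ~ Poisson(lam)

    print(sp.diff(G, z).subs(z, 1))      # M'(1)  = E(X)        = lam
    print(sp.diff(G, z, 2).subs(z, 1))   # M''(1) = E(X(X - 1)) = lam**2
    # hence Var(X) = E(X(X-1)) + E(X) - E(X)^2 = lam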

Examples of well-known distributions

Discrete distributions

Bernoulli distribution with success probability p: With $0 < p < 1$ a constant, $X$ has pmf $p(k) = P(X = k)$ given by

    p(1) = p, \qquad p(0) = 1 - p, \qquad p(k) = 0 \text{ otherwise};

$X$ takes on the values 1 (success) or 0 (failure). A simple computation yields

    E(X) = p, \qquad Var(X) = p(1-p), \qquad M(s) = p e^s + 1 - p.

Bernoulli random variables arise naturally as the indicator function, $X = I\{A\}$, of an event $A$, where

    I\{A\} \stackrel{def}{=} \begin{cases} 1, & \text{if the event } A \text{ occurs;} \\ 0, & \text{otherwise.} \end{cases}

$P(X = 1) = P(A)$ is the probability that the event $A$ occurs. For example, if you flip a coin once and let $A = \{$coin lands heads$\}$, then for $X = I\{A\}$, $X = 1$ if the coin lands heads, and $X = 0$ if it lands tails. Because of this elementary and intuitive coin-flipping example, a Bernoulli random variable is sometimes referred to as a coin flip, where $p$ is the probability of landing heads. Observing the outcome of a Bernoulli is sometimes called performing a Bernoulli trial, and in the spirit of (1) we denote a Bernoulli $p$ random variable by $X \sim Bern(p)$.
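A quick Monte Carlo check (my illustration; the value p = 0.3, the sample size, and the use of numpy are arbitrary choices) that the sample mean and variance of simulated coin flips match $E(X) = p$ and $Var(X) = p(1-p)$:

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.3
    flips = rng.random(100_000) < p   # Boolean indicators I{A}, i.e. Bern(p) samples

    print(flips.mean())               # ~ 0.3  = p        = E(X)
    print(flips.var())                # ~ 0.21 = p(1 - p) = Var(X)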

Binomial distribution with success probability p and n trials: If we consecutively perform $n$ independent Bernoulli $p$ trials, $X_1, \ldots, X_n$, then the total number of successes $X = X_1 + \cdots + X_n$ yields the binomial distribution with pmf

    p(k) = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k}, & \text{if } 0 \le k \le n; \\ 0, & \text{otherwise.} \end{cases}

In our coin-flipping context, when consecutively flipping the coin exactly $n$ times, $p(k)$ denotes the probability that exactly $k$ of the $n$ flips land heads (and hence exactly $n - k$ land tails). A simple computation (utilizing $X = X_1 + \cdots + X_n$ and independence) yields

    E(X) = np, \qquad Var(X) = np(1-p), \qquad M(s) = (p e^s + 1 - p)^n.

Keeping in the spirit of (1) we denote a binomial $n, p$ random variable by $X \sim bin(n, p)$.
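To see the pmf formula in action, the sketch below (my own, with the arbitrary choices n = 10 and p = 0.3) tabulates the binomial pmf and confirms that the probabilities sum to 1 and that the mean is $np$:

    from math import comb

    n, p = 10, 0.3
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

    print(sum(pmf))                               # ~ 1.0: probabilities sum to one
    print(sum(k * q for k, q in enumerate(pmf)))  # ~ 3.0 = n*p, matching E(X) = np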

Geometric distribution with success probability p: The number of independent Bernoulli $p$ trials required until the first success yields the geometric distribution with pmf

    p(k) = \begin{cases} p(1-p)^{k-1}, & \text{if } k \ge 1; \\ 0, & \text{otherwise.} \end{cases}

In our coin-flipping context, when consecutively flipping the coin, $p(k)$ denotes the probability that the $k$th flip is the first flip to land heads (all previous $k - 1$ flips land tails). The tail of $X$ has the nice form

    \bar{F}(k) = P(X > k) = (1-p)^k, \quad k \ge 0.

It can be shown that

    E(X) = \frac{1}{p}, \qquad Var(X) = \frac{1-p}{p^2}, \qquad M(s) = \frac{p e^s}{1 - (1-p)e^s}.

(In fact, computing $M(s)$ is straightforward and can be used to generate the mean and variance.) Keeping in the spirit of (1) we denote a geometric $p$ random variable by $X \sim geom(p)$.

As a variation on the geometric, if we change $X$ to denote the number of failures before the first success, and denote this by $Y$, then (since the first flip might be a success, yielding no failures at all) the pmf becomes

    p(k) = \begin{cases} p(1-p)^k, & \text{if } k \ge 0; \\ 0, & \text{otherwise,} \end{cases}

and $p(0) = p$. Then $E(Y) = (1-p)p^{-1}$ and $Var(Y) = (1-p)p^{-2}$. Both of the above are called the geometric distribution, and are related by $Y = X - 1$.
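A numerical check of the tail formula (my illustration; p = 0.25, k = 6, and the truncation point 200 are arbitrary): summing the pmf over $j > k$ reproduces $(1-p)^k$ up to the truncation error of the infinite sum.

    p, k = 0.25, 6
    # P(X > k) by brute force: sum the pmf p(1-p)^{j-1} over j = k+1, ..., 199
    # (the terms beyond that are negligibly small).
    tail = sum(p * (1 - p)**(j - 1) for j in range(k + 1, 200))
    print(tail)          # ~ 0.177978...
    print((1 - p)**k)    # (1-p)^k = 0.177978..., the closed-form tail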

Poisson distribution with mean (and variance) \lambda: With $\lambda > 0$ a constant, $X$ has pmf

    p(k) = \begin{cases} \dfrac{e^{-\lambda}\lambda^k}{k!}, & \text{if } k \ge 0; \\ 0, & \text{otherwise.} \end{cases}

The Poisson distribution has the interesting property that both its mean and variance are identical: $E(X) = Var(X) = \lambda$. Its mgf is given by

    M(s) = e^{\lambda(e^s - 1)}.

The Poisson distribution arises as an approximation to the binomial $(n, p)$ distribution when $n$ is large and $p$ is small: letting $\lambda = np$,

    \binom{n}{k} p^k (1-p)^{n-k} \approx \frac{e^{-\lambda}\lambda^k}{k!}, \quad 0 \le k \le n.

Keeping in the spirit of (1) we denote a Poisson $\lambda$ random variable by $X \sim Poiss(\lambda)$.
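The binomial-to-Poisson approximation is easy to see numerically. In the sketch below (my own; the values n = 1000 and p = 0.003, so $\lambda = 3$, are arbitrary), the two pmfs agree to several decimal places:

    from math import comb, exp, factorial

    n, p = 1000, 0.003                # large n, small p, so lam = n*p = 3
    lam = n * p
    for k in range(5):
        binom = comb(n, k) * p**k * (1 - p)**(n - k)
        poiss = exp(-lam) * lam**k / factorial(k)
        print(k, round(binom, 6), round(poiss, 6))  # the two columns nearly agree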

Continuous distributions

Uniform distribution on (a, b): With $a$ and $b$ constants, $X$ has density function

    f(x) = \begin{cases} \dfrac{1}{b-a}, & \text{if } x \in (a, b); \\ 0, & \text{otherwise,} \end{cases}

cdf

    F(x) = \begin{cases} \dfrac{x-a}{b-a}, & \text{if } x \in (a, b); \\ 1, & \text{if } x \ge b; \\ 0, & \text{if } x \le a, \end{cases}

and tail

    \bar{F}(x) = \begin{cases} \dfrac{b-x}{b-a}, & \text{if } x \in (a, b); \\ 0, & \text{if } x \ge b; \\ 1, & \text{if } x \le a. \end{cases}

A simple computation yields

    E(X) = \frac{a+b}{2}, \qquad Var(X) = \frac{(b-a)^2}{12}, \qquad M(s) = \frac{e^{sb} - e^{sa}}{s(b-a)}.

When $a = 0$ and $b = 1$, this is known as the uniform distribution over the unit interval, with density $f(x) = 1$, $x \in (0, 1)$, $E(X) = 1/2$, $Var(X) = 1/12$, and $M(s) = s^{-1}(e^s - 1)$. Keeping in the spirit of (1) we denote a uniform $(a, b)$ random variable by $X \sim unif(a, b)$.

Exponential distribution: With $\lambda > 0$ a constant, $X$ has density function

    f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0, \end{cases}

cdf

    F(x) = \begin{cases} 1 - e^{-\lambda x}, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0, \end{cases}

and tail

    \bar{F}(x) = \begin{cases} e^{-\lambda x}, & \text{if } x \ge 0; \\ 1, & \text{if } x < 0. \end{cases}

A simple computation yields

    E(X) = \frac{1}{\lambda}, \qquad Var(X) = \frac{1}{\lambda^2}, \qquad M(s) = \frac{\lambda}{\lambda - s}, \ s < \lambda.

The exponential is famous for having the unique memoryless property,

    P(X - y > x \mid X > y) = P(X > x), \quad x \ge 0, \ y \ge 0,

in the sense that it is the unique distribution with this property. (The geometric distribution satisfies a discrete version of this.) Keeping in the spirit of (1) we denote an exponential $\lambda$ random variable by $X \sim exp(\lambda)$. The exponential distribution can be viewed as approximating the distribution of the time until the first success when performing an independent $Bern(p)$ trial every $\Delta t$ units of time with $p = \lambda \Delta t$ and $\Delta t$ very small; as $\Delta t \to 0$, the approximation becomes exact.

Normal distribution with mean \mu and variance \sigma^2, N(\mu, \sigma^2): The normal distribution is extremely important in applications because of the Central Limit Theorem (CLT).
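Although the transcription breaks off here, the CLT statement is easy to illustrate by simulation. The sketch below (my own; all parameter values and the use of numpy are arbitrary choices) standardizes sample means of i.i.d. exponential random variables and checks that they behave like a standard normal:

    import numpy as np

    rng = np.random.default_rng(1)
    lam, n, reps = 2.0, 400, 20_000
    # Each row: n i.i.d. exp(lam) samples; take the sample mean of each row.
    means = rng.exponential(scale=1 / lam, size=(reps, n)).mean(axis=1)
    # Standardize: subtract the mean 1/lam, divide by sigma/sqrt(n) = 1/(lam*sqrt(n)).
    z = (means - 1 / lam) * lam * np.sqrt(n)
    print(z.mean(), z.std())   # ~ 0 and ~ 1
    print((z < 1.96).mean())   # ~ 0.975, as for a standard normal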