
Probability and Statistics Basics




Kevin Kircher, Cornell MAE, Spring '14

These notes summarize some basic probability and statistics material. The primary sources are A Modern Introduction to Probability and Statistics by Dekking, Kraaikamp, Lopuhaä and Meester, Introduction to Probability by Dimitri Bertsekas, and the lectures of Profs. Gennady Samorodnitsky and Mark.

Contents

Part I: Probability
1 Outcomes, Events and Probability
2 Conditional Probability and Independence
3 Discrete Random Variables
4 Continuous Random Variables
5 The Normal Distribution
6 Expectation and Variance
7 Joint Distributions and Independence
8 Covariance and Correlation
9 Random Vectors
10 Transformations of Random Variables
11 The Law of Large Numbers
12 Moment Generating Functions
13 Conditional Distributions
14 Order Statistics
15 The Central Limit Theorem
16 Stochastic Processes

Part II: Statistics
17 Numerical Data Summaries
18 Basic Statistical Models
19 Unbiased Estimators
20 Precision of an Estimator
21 Maximum Likelihood Estimation
22 Method of Moments Estimation
23 Bayesian Estimation
24 Least Squares Estimation
25 Minimum Mean Squared Error Estimation
26 Hypothesis Testing

Part I: Probability

1 Outcomes, Events and Probability

Definitions

- A sample space $\Omega$ is the set of the outcomes of an experiment.
- An event is a subset of the sample space.
- Two events A and B are disjoint if they have no elements (outcomes) in common.
- A probability measure P obeys three axioms:
  - Nonnegativity: $P(A) \geq 0$ for all events A
  - Normalization: $P(\Omega) = 1$
  - Disjoint unions: for all disjoint events $A_i$, $P(A_1 \cup A_2 \cup \dots) = P(A_1) + P(A_2) + \dots$

Results

- De Morgan's laws. For any two events A and B,
  $(A \cup B)^c = A^c \cap B^c$
  $(A \cap B)^c = A^c \cup B^c$
  Mnemonic: distribute the $c$ and flip the set operator.
- For unions of intersections and intersections of unions,
  $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
  $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
- The probability of a union of (non-disjoint) events is
  $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  Intuition: subtract the intersection of A and B to avoid double counting. A numerical sanity check of this formula appears after this list. For three events,
  $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$
- The complement rule: $P(A^c) = 1 - P(A)$
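To make the two-event formula concrete, here is a quick Monte Carlo sketch (my own example, not from the notes): for a fair die, take A = {roll is even} and B = {roll is at least 4}, so the exact answer is $P(A \cup B) = 2/3$.

```matlab
% Monte Carlo check of P(A u B) = P(A) + P(B) - P(A n B).
% Fair die; A = {roll even}, B = {roll >= 4}; exact P(A u B) = 2/3.
rolls = randi(6, 1e6, 1);                  % a million fair die rolls
A = mod(rolls, 2) == 0;
B = rolls >= 4;
lhs = mean(A | B);                         % direct estimate of P(A u B)
rhs = mean(A) + mean(B) - mean(A & B);     % inclusion-exclusion estimate
fprintf('direct: %.4f, inclusion-exclusion: %.4f\n', lhs, rhs)
```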

- A permutation $P_{n,k}$ is an ordering of $k$ objects out of a pool of $n$. Such a permutation can be done in
  $P_{n,k} = \frac{n!}{(n-k)!}$
  ways.
- A combination $\binom{n}{k}$ (pronounced "n choose k") is a choice of $k$ objects from a pool of $n$, where order doesn't matter:
  $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
  Example: choosing 3 medalists out of a heat of 8 runners is a combination because order doesn't matter. On the other hand, choosing the gold, silver and bronze medalists is a permutation because order matters.
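The medalist example is easy to check in Matlab, the notes' computational tool; factorial and nchoosek are built in. A minimal sketch:

```matlab
% The medalist example: 8 runners, 3 medals.
n = 8; k = 3;
P_nk = factorial(n) / factorial(n - k);   % ordered: gold, silver, bronze
C_nk = nchoosek(n, k);                    % unordered: any 3 medalists
fprintf('P(8,3) = %d, C(8,3) = %d\n', P_nk, C_nk)   % 336 and 56
% Each unordered triple can be ordered k! ways, so P(n,k) = C(n,k) * k!
assert(P_nk == C_nk * factorial(k))
```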

2 Conditional Probability and Independence

Definitions

- The conditional probability of A given C (C is called the conditioning event), provided $P(C) > 0$, is
  $P(A \mid C) = \frac{P(A \cap C)}{P(C)}$
- Note that the complement rule works for conditional probabilities. For all events A,
  $P(A \mid C) + P(A^c \mid C) = 1$
- For three events A, B and C,
  $P(A \mid B \cap C) = \frac{P(A \cap B \mid C)}{P(B \mid C)}$
- Events A and B are independent if any of the following are true:
  $P(A \mid B) = P(A)$
  $P(B \mid A) = P(B)$
  $P(A \cap B) = P(A)\,P(B)$
  where A can be replaced with $A^c$ or B with $B^c$. All twelve of these statements are equivalent.
- Two or more events $A_1, A_2, \dots, A_m$ are independent if
  $P(A_1 \cap A_2 \cap \dots \cap A_m) = P(A_1)\,P(A_2) \cdots P(A_m)$
  and if the above equation also holds when any number of events are replaced by their complements, e.g.
  $P(A_1 \cap A_2^c \cap A_3 \cap \dots \cap A_m) = P(A_1)\,P(A_2^c)\,P(A_3) \cdots P(A_m)$
  In general, establishing the independence of $m$ events requires checking $2^m$ equations.
- A useful rule: if events $A_1, \dots, A_n$ are independent, then so are any derived events constructed from disjoint groupings of them.

Results

- The multiplication rule. For events A and C,
  $P(A \cap C) = P(A \mid C)\,P(C)$
  Note that this works even if $P(C) = 0$. This allows us to break the probability of a complicated intersection up into a sequence of less complicated conditional probabilities, which is useful for iterative experiments. The general form of the multiplication rule, for events $A_1, \dots, A_n$ with positive probability, is
  $P(\cap_{i=1}^n A_i) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid \cap_{i=1}^{n-1} A_i)$
  A simulation check of the two-event rule follows this list.
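Here is a minimal simulation sketch (my example, not the notes') of the two-event multiplication rule: the probability of drawing two aces in a row without replacement is $(4/52)(3/51)$.

```matlab
% Multiplication rule: P(two aces) = P(first ace) P(second ace | first ace)
%                                  = (4/52) * (3/51).
exact = (4/52) * (3/51);
trials = 1e5; hits = 0;
for t = 1:trials
    deck = [ones(1,4), zeros(1,48)];   % 1 marks an ace
    draw = deck(randperm(52, 2));      % two cards without replacement
    hits = hits + all(draw == 1);
end
fprintf('exact: %.5f, simulated: %.5f\n', exact, hits/trials)
```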

- The law of total probability. For disjoint events $C_1, C_2, \dots, C_m$ that partition $\Omega$,
  $P(A) = P(A \mid C_1)\,P(C_1) + P(A \mid C_2)\,P(C_2) + \dots + P(A \mid C_m)\,P(C_m)$
  This allows us to write a probability $P(A)$ as a weighted sum of conditional probabilities. Useful when the conditional probabilities are known or easy. A special case:
  $P(B) = P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)$
- Bayes' rule. For disjoint events $C_1, C_2, \dots, C_m$ that partition $\Omega$,
  $P(C_i \mid A) = \frac{P(A \mid C_i)\,P(C_i)}{P(A \mid C_1)\,P(C_1) + P(A \mid C_2)\,P(C_2) + \dots + P(A \mid C_m)\,P(C_m)}$
  Note that we can also write Bayes' rule in a simpler form, and use the law of total probability to expand the denominator. This simpler form is
  $P(C_i \mid A) = \frac{P(A \mid C_i)\,P(C_i)}{P(A)}$
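A small numerical sketch of Bayes' rule over a three-event partition; the priors and likelihoods below are invented purely for illustration.

```matlab
% Bayes' rule over a hypothetical partition C1, C2, C3 (numbers invented).
prior = [0.5, 0.3, 0.2];          % P(Ci); sums to 1
like  = [0.1, 0.4, 0.8];          % P(A | Ci)
PA = sum(like .* prior);          % law of total probability: P(A) = 0.33
post = like .* prior / PA;        % P(Ci | A), all i at once
fprintf('P(A) = %.2f, posterior = [%.3f %.3f %.3f]\n', PA, post)
```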

3 Discrete Random Variables

Definitions

- A discrete random variable is a function $X: \Omega \to \mathbb{R}$ that takes on a countable (possibly infinite, if $n \to \infty$) number of discrete values $x_1, x_2, \dots, x_n$.
- The probability mass function $p_X$ of a discrete random variable $X$ is the function $p_X: \mathbb{R} \to [0,1]$, defined by $p_X(x_i) = P(X = x_i)$. Equivalently, for any set $B$,
  $P(X \in B) = \sum_{x_i \in B} p_X(x_i)$
  The pmf is non-zero only at the discrete values $x_1, x_2, \dots$. More precisely, the pmf obeys
  $p_X(x_i) > 0$ for all $i$
  $\sum_i p_X(x_i) = 1$
  $p_X(x) = 0$ for all $x \neq x_i$
- The cumulative distribution function $F_X$ of a discrete random variable $X$ is the function $F_X: \mathbb{R} \to [0,1]$, defined by $F_X(x) = P(X \leq x)$ for $x \in \mathbb{R}$. The cdf of a discrete RV is piecewise constant and continuous from the right. For a pmf defined as above, $F_X$ obeys
  $F_X(x) = \sum_{x_i \leq x} p_X(x_i)$
  $x_1 \leq x_2 \implies F_X(x_1) \leq F_X(x_2)$
  $\lim_{x \to +\infty} F_X(x) = 1$
  $\lim_{x \to -\infty} F_X(x) = 0$

Common Discrete Distributions

- $X$ has the Bernoulli distribution Ber($p$) with parameter $0 \leq p \leq 1$ if its pmf is given by
  $p_X(x_i) = p$ if $x_i = 1$; $\;1-p$ if $x_i = 0$; $\;0$ otherwise
  Expectation: $EX = p$. Variance: $\mathrm{Var}(X) = p(1-p)$.
  Bernoulli trials form the basis of all the most important discrete RVs. The Bernoulli distribution models a single binary trial (a coin flip) with probability $p$ of success; sequences of independent Bernoulli trials underlie the distributions below.
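A minimal simulation check of the Bernoulli moments:

```matlab
% Bernoulli(p): sample moments vs. EX = p and Var(X) = p(1-p).
p = 0.3;
x = rand(1e6, 1) < p;                       % Ber(p) draws: 1 with prob. p
fprintf('mean %.4f vs p = %.4f\n', mean(x), p)
fprintf('var  %.4f vs p(1-p) = %.4f\n', var(double(x)), p*(1-p))
```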

- $X$ has the binomial distribution Bin($n, p$) with parameters $n = 1, 2, \dots$ and $0 \leq p \leq 1$ if its pmf is given by
  $p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$
  Expectation: $EX = np$. Variance: $\mathrm{Var}(X) = np(1-p)$.
  The binomial RV counts the number of successes in $n$ Bernoulli trials, with probability $p$ of success in each trial. The Bernoulli RV is a special case of the binomial RV: Bin($1, p$) is Ber($p$).
- The multinomial distribution Mult($n, p_1, \dots, p_k$) counts the number of times, out of $n$ independent trials with $k$ types of outcome in every trial, that an outcome of type $i$ is observed, for $i \in \{1, \dots, k\}$. The $i$th type of outcome has probability $p_i$ of occurring. $M_i$ is the number of times an outcome of type $i$ is observed, so that $\sum_i m_i = n$. The pmf of the multinomial distribution is
  $p_M(m) = P(M_1 = m_1, \dots, M_k = m_k) = \frac{n!}{m_1! \cdots m_k!}\, p_1^{m_1} \cdots p_k^{m_k}$
  Note that this gives the distribution of the multiplicities of outcomes of type 1 through $k$. In Matlab, the RHS is easily computed with mnpdf(m,p). For $i \in \{1, \dots, k\}$,
  Expectation: $EM_i = n p_i$
  Variance: $\mathrm{Var}(M_i) = n p_i (1 - p_i)$
  Covariance: for $i \neq j$, $\mathrm{Cov}(M_i, M_j) = -n p_i p_j$
  Bin($n, p$) is Mult($n, p, 1-p$).
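A sketch of the multinomial pmf computed straight from the formula; it should agree with mnpdf (Statistics Toolbox, as noted above) where that's available. The parameter values are chosen arbitrarily.

```matlab
% Multinomial pmf from the formula, for n = 6 trials and 3 outcome types.
n = 6; p = [0.2, 0.3, 0.5];
m = [1, 2, 3];                                  % multiplicities; sum(m) == n
pm = factorial(n) / prod(factorial(m)) * prod(p.^m);
fprintf('P(M = [1 2 3]) = %.5f\n', pm)          % 0.13500
% With the Statistics Toolbox installed, mnpdf(m, p) returns the same value.
```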

- $X$ has a negative binomial distribution NB($n, p$) with parameters $n = 1, 2, \dots$ and $0 \leq p \leq 1$ if its pmf is given by
  $p_X(k) = \binom{k-1}{n-1} p^n (1-p)^{k-n}$ for $k = n, n+1, \dots$
  Expectation: $EX = n/p$. Variance: $\mathrm{Var}(X) = n(1-p)/p^2$.
  The negative binomial RV counts the number of trials until the $n$th success, with probability $p$ of success in each trial.
- $X$ has the geometric distribution Geo($p$) with parameter $0 < p \leq 1$ if its pmf is given by
  $p_X(k) = p(1-p)^{k-1}$ for $k = 1, 2, \dots$
  Expectation: $EX = 1/p$. Variance: $\mathrm{Var}(X) = (1-p)/p^2$.
  The geometric RV is a special case of the negative binomial: NB($1, p$) = Geo($p$). The geometric RV counts the number of trials until the first success. The geometric RV has the memoryless property: $P(X > m + n \mid X > m) = P(X > n)$, checked numerically below.
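A minimal numeric check of memorylessness, using the geometric tail formula $P(X > k) = (1-p)^k$ (which follows by summing the pmf); the values of p, m and n are arbitrary.

```matlab
% Memorylessness of Geo(p): P(X > m+n | X > m) = P(X > n),
% using the tail formula P(X > k) = (1-p)^k.
p = 0.25; m = 4; n = 3;
tail = @(k) (1 - p).^k;
lhs = tail(m + n) / tail(m);    % conditional tail probability
rhs = tail(n);
fprintf('P(X>%d | X>%d) = %.6f, P(X>%d) = %.6f\n', m+n, m, lhs, n, rhs)
```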

- $X$ has the hypergeometric distribution Hyp($n, N, M$) with parameters $n = 1, 2, \dots$, $N > 0$ and $M > 0$ if its pmf is given by
  $p_X(k) = \frac{\binom{N}{k} \binom{M}{n-k}}{\binom{N+M}{n}}$ for $\max(n - M, 0) \leq k \leq \min(n, N)$
  Expectation: $EX = nN/(N+M)$. Variance:
  $\mathrm{Var}(X) = n \frac{NM}{(N+M)^2} \left(1 - \frac{n-1}{N+M-1}\right)$
  The hypergeometric RV counts the number of successes (defined by choosing a white ball) in $n$ trials of choosing a ball without replacement from an initial population of $N$ white balls and $M$ black balls.
- $X$ has the Poisson distribution Poiss($\lambda$) with parameter $\lambda > 0$ if its pmf is given by
  $p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}$ for $k = 0, 1, 2, \dots$
  Expectation and variance: $EX = \mathrm{Var}(X) = \lambda$.
  The Poisson RV is the limiting case of the binomial RV as $n \to \infty$ and $p \to 0$, while the product $np \to \lambda > 0$ (infinite trials, infinitesimal probability of success per trial, but a finite product of the two); a numerical illustration follows.
  An example: in a huge volume of dough ($n \to \infty$), the probability of scooping out any particular raisin is vanishingly small ($p \to 0$), but there's a constant raisin density ($np \to \lambda$).
  Reproducing property: if $X_1, \dots, X_n$ are independent Poisson RVs with parameters $\lambda_1, \dots, \lambda_n$, then the sum $Y = X_1 + \cdots + X_n$ is a Poisson RV with parameter $\lambda_1 + \cdots + \lambda_n$.
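A short sketch of the binomial-to-Poisson limit; the binomial pmf is evaluated through gammaln (base Matlab) to stay stable for large n. Lambda and the grid of k values are arbitrary choices.

```matlab
% Bin(n, lambda/n) approaches Poiss(lambda) as n grows.
lambda = 3; k = (0:10)';
poiss = lambda.^k ./ factorial(k) * exp(-lambda);
for n = [10, 100, 10000]
    p = lambda / n;
    logbin = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
             + k*log(p) + (n-k)*log(1-p);      % log of the binomial pmf
    fprintf('n = %5d: max |Bin - Poiss| = %.1e\n', ...
            n, max(abs(exp(logbin) - poiss)))
end
```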

4 Continuous Random Variables

Definitions

- A random variable $X$ is continuous if for some function $f_X: \mathbb{R} \to \mathbb{R}$ with $f_X(x) \geq 0$ for all $x$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, and for real numbers $a$ and $b$ with $a \leq b$,
  $P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$
  In particular, $F_X(a) = \int_{-\infty}^a f_X(x)\,dx$. The function $f_X$ is called the probability density function (pdf) of $X$. As in the discrete case, $F_X$ is called the cdf of $X$.
- For continuous RV $X$ and for $0 \leq p \leq 1$, the $p$th quantile or $100p$th percentile of the distribution of $X$ is the smallest number $q_p$ such that
  $F_X(q_p) = p$
  The median of a distribution is its 50th percentile.
- The pdf $f_X$ and cdf $F_X$ of a continuous random variable $X$ are related by
  $F_X(b) = \int_{-\infty}^b f_X(x)\,dx$ and $f_X(x) = \frac{d}{dx} F_X(x)$
- Specifying the distribution of a RV $X$ means identifying the characteristic that uniquely determines the probabilities associated with $X$. This can be done by specifying any one of the following:
  1. The cdf of $X$ (works for RVs that are discrete, continuous or neither)
  2. The pmf (discrete) or pdf (continuous) of $X$
  3. The name of a standard RV
  4. The moment generating function, $M_X(t)$
  5. The Laplace transform, $L_X(t)$
  6. The characteristic function, $\phi_X(t)$

Common Continuous Distributions

- The uniform distribution on $[a, b]$, Uni($a, b$), is given by the pdf $f_X(x) = 0$ if $x \notin [a, b]$, and
  $f_X(x) = \frac{1}{b-a}$ for $a \leq x \leq b$
  The cdf on $[a, b]$ is
  $F_X(x) = \frac{x-a}{b-a}$
  Expectation: $EX = (a+b)/2$. A quick numerical check appears below.
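A minimal sketch checking the Uni(a,b) formulas: the quantile inverts the cdf, and the sample mean approaches $(a+b)/2$. The endpoints a and b are arbitrary.

```matlab
% Uni(a,b): quantile from the cdf, and sample mean vs (a+b)/2.
a = 2; b = 5;
q = @(pp) a + pp*(b - a);             % smallest q with F(q) = p
fprintf('median = %.2f, (a+b)/2 = %.2f\n', q(0.5), (a+b)/2)
x = a + (b - a)*rand(1e6, 1);         % Uni(a,b) samples
fprintf('sample mean = %.4f\n', mean(x))
```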

