
Basic Probability Theory (I)





Intro to Bayesian Data Analysis & Cognitive Modeling
Adrian Brasoveanu [partly based on slides by Sharon Goldwater & Frank Keller and John K. Kruschke]
Fall 2012, UCSC Linguistics

1 Sample Spaces and Events: Sample Spaces; Events; Axioms and Rules of Probability
2 Joint, Conditional and Marginal Probability: Joint and Conditional Probability; Marginal Probability
3 Bayes' Theorem
4 Independence and Conditional Independence
5 Random Variables and Distributions: Random Variables; Distributions; Expectation

Terminology
Terminology for probability theory:
- experiment: process of observation or measurement; e.g., coin flip;
- outcome: result obtained through an experiment; e.g., coin shows tails;
- sample space: set of all possible outcomes of an experiment; e.g., sample space for coin flip: $S = \{H, T\}$.
Sample spaces can be finite or infinite.

Example: Finite Sample Space
Roll two dice, each with numbers 1 to 6. Sample space:
$S_1 = \{\langle x, y \rangle : x \in \{1, 2, \ldots, 6\} \wedge y \in \{1, 2, \ldots, 6\}\}$
Alternative sample space for this experiment, the sum of the dice:
$S_2 = \{x + y : x \in \{1, 2, \ldots, 6\} \wedge y \in \{1, 2, \ldots, 6\}\} = \{z : z \in \{2, 3, \ldots, 12\}\} = \{2, 3, \ldots, 12\}$

Example: Infinite Sample Space
Flip a coin until heads appears for the first time:
$S_3 = \{H, TH, TTH, TTTH, TTTTH, \ldots\}$

Events
Often we are not interested in individual outcomes, but in events. An event is a subset of a sample space.
Example: With respect to $S_1$, describe the event $B$ of rolling a total of 7 with the two dice.
$B = \{\langle 1, 6 \rangle, \langle 2, 5 \rangle, \langle 3, 4 \rangle, \langle 4, 3 \rangle, \langle 5, 2 \rangle, \langle 6, 1 \rangle\}$
The event $B$ can be represented graphically:
[Figure: the 36 outcomes of $S_1$ on a grid, die 1 against die 2 (values 1 to 6), with the outcomes in $B$ marked.]

Often we are interested in combinations of two or more events. These can be represented using set-theoretic operations. Assume a sample space $S$ and two events $A$ and $B$:
- complement $\bar{A}$ (also $A'$): all elements of $S$ that are not in $A$;
- subset $A \subseteq B$: all elements of $A$ are also elements of $B$;
- union $A \cup B$: all elements of $S$ that are in $A$ or $B$;
- intersection $A \cap B$: all elements of $S$ that are in $A$ and $B$.
These operations can be represented graphically using Venn diagrams.
[Figure: Venn diagrams for $\bar{A}$, $A \subseteq B$, $A \cup B$, and $A \cap B$.]

Axioms of Probability
Events are denoted by capital letters $A$, $B$, $C$, etc.
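The sample spaces, events, and set operations above map directly onto finite sets. Below is a minimal Python sketch, assuming outcomes of two dice are represented as tuples `(die1, die2)`. The event `A` ("die 1 shows an even number") is a hypothetical illustration, not from the slides, and since $S_3$ is infinite, only a finite prefix of it is listed.

```python
from itertools import product

# S1: ordered pairs (die 1, die 2), each die showing 1..6
S1 = set(product(range(1, 7), repeat=2))

# S2: alternative sample space, the possible sums of the two dice
S2 = {x + y for (x, y) in S1}

# Event B (a subset of S1): rolling a total of 7 with the two dice
B = {(x, y) for (x, y) in S1 if x + y == 7}

# Hypothetical illustrative event A: die 1 shows an even number
A = {(x, y) for (x, y) in S1 if x % 2 == 0}

complement_A = S1 - A     # elements of S1 that are not in A
union = A | B             # elements in A or B
intersection = A & B      # elements in A and B


def first_outcomes_S3(k):
    """First k outcomes of S3 (flip until the first heads): H, TH, TTH, ..."""
    return ["T" * i + "H" for i in range(k)]


print(len(S1))                    # 36
print(sorted(S2))                 # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(sorted(B))                  # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(B <= S1)                    # True: B is a subset of S1
print(len(complement_A), len(union), len(intersection))  # 18 21 3
print(first_outcomes_S3(5))       # ['H', 'TH', 'TTH', 'TTTH', 'TTTTH']
```

Python's set operators `-`, `|`, `&`, and `<=` correspond to complement (relative to `S1`), union, intersection, and subset.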

The probability of an event $A$ is denoted by $p(A)$.

Axioms of Probability
1. The probability of an event is a nonnegative real number: $p(A) \geq 0$ for any $A \subseteq S$.
2. $p(S) = 1$.
3. If $A_1, A_2, A_3, \ldots$ is a set of mutually exclusive events of $S$, then: $p(A_1 \cup A_2 \cup A_3 \cup \ldots) = p(A_1) + p(A_2) + p(A_3) + \ldots$

Probability of an Event
Theorem: Probability of an Event. If $A$ is an event in a sample space $S$ and $O_1, O_2, \ldots, O_n$ are the individual outcomes comprising $A$, then $p(A) = \sum_{i=1}^{n} p(O_i)$.

Example: Assume all strings of three lowercase letters are equally probable. Then what's the probability of a string of three vowels?
There are 26 letters, of which 5 are vowels. So there are $N = 26^3$ three-letter strings, and $n = 5^3$ consisting only of vowels. Each outcome (string) is equally likely, with probability $\frac{1}{N}$, so event $A$ (a string of three vowels) has probability $p(A) = \frac{n}{N} = \frac{5^3}{26^3} = \frac{125}{17576} \approx 0.0071$.

Rules of Probability
Theorems: Rules of Probability
1. If $A$ and $\bar{A}$ are complementary events in the sample space $S$, then $p(\bar{A}) = 1 - p(A)$.
2. $p(\emptyset) = 0$ for any sample space $S$.
3. If $A$ and $B$ are events in a sample space $S$ and $A \subseteq B$, then $p(A) \leq p(B)$.

4. $0 \leq p(A) \leq 1$ for any event $A$.

Addition Rule
Axiom 3 allows us to add the probabilities of mutually exclusive events. What about events that are not mutually exclusive?
Theorem: General Addition Rule. If $A$ and $B$ are two events in a sample space $S$, then:
$p(A \cup B) = p(A) + p(B) - p(A \cap B)$
Ex: $A$ = "has glasses", $B$ = "is blond". $p(A) + p(B)$ counts blondes with glasses twice, so we need to subtract $p(A \cap B)$ once.

Conditional Probability
Definition: Conditional Probability, Joint Probability. If $A$ and $B$ are two events in a sample space $S$, and $p(A) \neq 0$, then the conditional probability of $B$ given $A$ is:
$p(B \mid A) = \frac{p(A \cap B)}{p(A)}$
$p(A \cap B)$ is the joint probability of $A$ and $B$, also written $p(A, B)$.
Intuitively, $p(B \mid A)$ is the probability that $B$ will occur given that $A$ has occurred.
Ex: The probability of being blond given that one wears glasses: $p(\text{blond} \mid \text{glasses})$.
[Figure: Venn diagram of events $A$ and $B$.]

Conditional Probability
Example: A manufacturer knows that the probability of an order being ready on time is 0.80, and the probability of an order being ready on time and being delivered on time is 0.72. What is the probability of an order being delivered on time, given that it is ready on time?
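When outcomes are equally likely, the counting theorem, the rules of probability, and the definition of conditional probability can all be checked by brute-force enumeration. The following is a minimal Python sketch, assuming two fair dice. The event `A` ("die 1 shows 6") is a hypothetical illustration. Exact fractions avoid floating-point rounding, and the final lines apply the definition to the manufacturer's figures, 0.80 and 0.72.

```python
from fractions import Fraction
from itertools import product

# Probability of an event with N equally likely outcomes: p(A) = n / N
N = 26 ** 3          # all strings of three lowercase letters
n = 5 ** 3           # strings consisting only of vowels
p_three_vowels = Fraction(n, N)
print(p_three_vowels)            # 125/17576

# Two fair dice (an assumption): each outcome in S1 has probability 1/36
S1 = set(product(range(1, 7), repeat=2))


def p(event):
    """p(event) for a subset of S1; equals summing 1/36 over its outcomes."""
    return Fraction(len(event), len(S1))


A = {o for o in S1 if o[0] == 6}      # hypothetical event: die 1 shows 6
B = {o for o in S1 if sum(o) == 7}    # event B: the dice total 7

# Rules checked by enumeration
assert p(S1 - A) == 1 - p(A)                       # complement rule
assert p(set()) == 0                               # probability of the empty event
assert p(A | B) == p(A) + p(B) - p(A & B)          # general addition rule


def conditional(b, a):
    """p(b | a) = p(a ∩ b) / p(a), defined only when p(a) != 0."""
    if p(a) == 0:
        raise ValueError("conditioning event has probability 0")
    return p(a & b) / p(a)


print(conditional(B, A))   # 1/6: given die 1 shows 6, only (6, 1) totals 7

# Manufacturer example: p(D | R) = p(R, D) / p(R)
p_R = Fraction("0.80")
p_RD = Fraction("0.72")
print(p_RD / p_R)          # 9/10
```

Here the addition rule subtracts the single shared outcome (6, 1), which $p(A) + p(B)$ would otherwise count twice.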

$R$: order is ready on time; $D$: order is delivered on time.
$p(R) = 0.80$, $p(R, D) = 0.72$. Therefore:
$p(D \mid R) = \frac{p(R, D)}{p(R)} = \frac{0.72}{0.80} = 0.90$

Conditional Probability
Example: Consider sampling an adjacent pair of words (bigram) from a large text $T$. Let $BI$ = the set of bigrams in $T$ (this is our sample space), $A$ = "first word is run" $= \{\langle \text{run}, w_2 \rangle : w_2 \in T\} \subseteq BI$, and $B$ = "second word is amok" $= \{\langle w_1, \text{amok} \rangle : w_1 \in T\} \subseteq BI$.
Given $p(A)$, $p(B)$, and $p(A, B)$ (each a power of 10), what is the probability of seeing "amok" following "run", i.e., $p(B \mid A)$? How about "run" preceding "amok", i.e., $p(A \mid B)$?
$p(\text{"run before amok"}) = p(A \mid B) = \frac{p(A, B)}{p(B)}$
$p(\text{"amok after run"}) = p(B \mid A) = \frac{p(A, B)}{p(A)}$
[How do we determine $p(A)$, $p(B)$, $p(A, B)$ in the first place?]

Joint Probability and the Multiplication Rule
From the definition of conditional probability, we obtain:
Theorem: Multiplication Rule. If $A$ and $B$ are two events in a sample space $S$ and $p(A) \neq 0$, then:
$p(A, B) = p(A)\,p(B \mid A)$
Since $A \cap B = B \cap A$, we also have that (provided $p(B) \neq 0$):
$p(A, B) = p(B)\,p(A \mid B)$

Marginal Probability and the Rule of Total Probability
Theorem: Marginalization (Rule of Total Probability). If events $B_1, B_2, \ldots, B_k$ constitute a partition of the sample space $S$ and $p(B_i) \neq 0$ for $i = 1, 2, \ldots, k$, then for any event $A$ in $S$:
$p(A) = \sum_{i=1}^{k} p(A, B_i) = \sum_{i=1}^{k} p(A \mid B_i)\,p(B_i)$
$B_1, B_2, \ldots, B_k$ form a partition of $S$ if they are pairwise mutually exclusive and if $B_1 \cup B_2 \cup \ldots \cup B_k = S$.

Example: In an experiment on human memory, participants have to memorize a set of words ($B_1$), numbers ($B_2$), and pictures ($B_3$). These occur in the experiment with the probabilities $p(B_1)$, $p(B_2)$, and $p(B_3)$, respectively. Then participants have to recall the items (where $A$ is the recall event). The results give the recall rates $p(A \mid B_1)$, $p(A \mid B_2)$, and $p(A \mid B_3)$. Compute $p(A)$, the probability of recalling an item.
By the theorem of total probability:
$p(A) = \sum_{i=1}^{3} p(B_i)\,p(A \mid B_i) = p(B_1)\,p(A \mid B_1) + p(B_2)\,p(A \mid B_2) + p(B_3)\,p(A \mid B_3)$

Joint, Marginal & Conditional Probability
Example: Proportions for a sample of University of Delaware students, 1974, N = 592. Data adapted from Snee (1974).
[Table: proportions of students by eye color (rows) and hair color (columns).]
These are the joint probabilities $p(\text{eyeColor}, \text{hairColor})$.
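Marginalizing and conditioning a joint table takes only a few lines of code. The following is a minimal Python sketch. The joint distribution is a small hypothetical 2×2 table (brown/blue eyes, dark/light hair), not the Snee (1974) data. `marginal` applies the rule of total probability by summing out the other variable. `conditional` keeps only the cells that match the observed value and renormalizes them by their total.

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(eye, hair); NOT the Snee (1974) data
joint = {
    ("brown", "dark"): F(30, 100),
    ("brown", "light"): F(10, 100),
    ("blue", "dark"): F(15, 100),
    ("blue", "light"): F(45, 100),
}
assert sum(joint.values()) == 1


def marginal(joint, axis):
    """Sum the joint over the other variable (rule of total probability)."""
    out = {}
    for key, prob in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + prob
    return out


def conditional(joint, axis, value):
    """p(other variable | variable at position `axis` == value):
    keep the matching cells, then divide by their total (the marginal)."""
    cells = {k: v for k, v in joint.items() if k[axis] == value}
    total = sum(cells.values())
    return {k[1 - axis]: v / total for k, v in cells.items()}


p_eye = marginal(joint, 0)     # brown: 2/5, blue: 3/5
p_hair = marginal(joint, 1)    # dark: 9/20, light: 11/20
p_eye_given_dark = conditional(joint, 1, "dark")      # p(eye | hair = dark)
p_hair_given_brown = conditional(joint, 0, "brown")   # p(hair | eye = brown)

print(p_eye_given_dark["brown"])    # 2/3, i.e. .30 / .45
print(p_hair_given_brown["dark"])   # 3/4, i.e. .30 / .40
```

The two conditionals share the numerator .30 but differ (2/3 vs 3/4) because they are normalized by different marginals, so in general $p(A \mid B) \neq p(B \mid A)$.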

For example, $p(\text{eyeColor} = \text{brown}, \text{hairColor} = \text{brunette}) = .20$.

These are the marginal probabilities $p(\text{eyeColor})$.
For example, $p(\text{eyeColor} = \text{brown}) = \sum_{\text{hairColor}} p(\text{eyeColor} = \text{brown}, \text{hairColor}) = .12 + .20 + .01 + .04 = .37$.

These are the marginal probabilities $p(\text{hairColor})$.
For example, $p(\text{hairColor} = \text{brunette}) = \sum_{\text{eyeColor}} p(\text{eyeColor}, \text{hairColor} = \text{brunette})$, the sum of the brunette column.

To obtain the conditional probability $p(\text{eyeColor} \mid \text{hairColor} = \text{brunette})$, we do two things:
1. reduce: we consider only the probabilities in the brunette column;
2. normalize: we divide by the marginal $p(\text{brunette})$, since all the probability mass is now concentrated there.
For example, $p(\text{eyeColor} = \text{brown} \mid \text{hairColor} = \text{brunette}) = .20 / p(\text{hairColor} = \text{brunette})$.

Moreover: $p(\text{eyeColor} = \text{brown} \mid \text{hairColor} = \text{brunette}) \neq p(\text{hairColor} = \text{brunette} \mid \text{eyeColor} = \text{brown})$.
Consider $p(\text{hairColor} \mid \text{eyeColor} = \text{brown})$. To obtain it, we reduce (keep only the brown row) and we normalize (divide by $p(\text{eyeColor} = \text{brown}) = .37$).
So $p(\text{hairColor} = \text{brunette} \mid \text{eyeColor} = \text{brown}) = .20 / .37 \approx .54$, whereas $p(\text{eyeColor} = \text{brown} \mid \text{hairColor} = \text{brunette}) = .20 / p(\text{hairColor} = \text{brunette})$.

Conditional Probability: $p(A \mid B)$ vs $p(B \mid A)$
Example 1: Disease Symptoms (from Lindley 2006)
Doctors studying a disease D noticed that 90% of patients with the disease exhibited a symptom S. Later, another doctor sees a patient and notices that she exhibits symptom S. As a result, the doctor concludes that there is a 90% chance that the new patient has the disease D.
Problem: while $p(S \mid D) = .9$, $p(D \mid S)$ might be very small.

Conditional Probability: $p(A \mid B)$ vs $p(B \mid A)$
Example 2: Forensic Evidence (from Lindley 2006)
A crime has been committed and a forensic scientist reports that the perpetrator must have attribute P.

For example, the DNA of the guilty party is of type P. The police find someone with P, who is charged with the crime. In court, the forensic scientist reports that attribute P only occurs in a proportion $\epsilon$ of the population. Since $\epsilon$ is very small, the court infers that the defendant is highly likely to be guilty, going on to assess the chance of guilt as $1 - \epsilon$, since an innocent person would only have a chance $\epsilon$ of having P.
Problem: while $p(P \mid \text{innocent}) = \epsilon$, $p(\text{innocent} \mid P)$ might be much higher.

Conditional Probability: $p(A \mid B)$ vs $p(B \mid A)$
Example 3: Significance Tests (from Lindley 2006)
As scientists, we often set up a straw-man/null hypothesis. For example, we may suppose that a chemical has no effect on a reaction and then perform an experiment which, if the effect does not exist, gives numbers that are very small. If we obtain large numbers compared to expectation, we say the null is rejected and the effect exists. "Large" means numbers that would only arise a small proportion $\alpha$ of times if the null hypothesis is true.

So we say that we have confidence $1 - \alpha$ that the effect exists, and $\alpha$ is the significance level of the test.
Problem: while $p(\text{effect} \mid \text{null}) = \alpha$, $p(\text{null} \mid \text{effect})$ might be large.

Bayes' Theorem: Relating $p(A \mid B)$ and $p(B \mid A)$
We can infer something about a disease from a symptom, but we need to do it with some care: the proper inversion is accomplished by Bayes' rule.

Bayes' Theorem
$p(B \mid A) = \frac{p(A \mid B)\,p(B)}{p(A)}$
It is derived using the multiplication rule: $p(A, B) = p(A \mid B)\,p(B) = p(B \mid A)\,p(A)$.
The denominator $p(A)$ can be computed using the theorem of total probability: $p(A) = \sum_{i=1}^{k} p(A \mid B_i)\,p(B_i)$.
The denominator is a normalizing constant: it ensures that $p(B_i \mid A)$ sums to 1 over the partition $B_1, \ldots, B_k$.
If we only care about the relative sizes of probabilities, we can ignore it: $p(B \mid A) \propto p(A \mid B)\,p(B)$.

Bayes' Theorem
Example: Consider the memory example again. What is the probability that an item that is correctly recalled ($A$) is a picture ($B_3$)?
By Bayes' theorem:
$p(B_3 \mid A) = \frac{p(B_3)\,p(A \mid B_3)}{\sum_{i=1}^{3} p(B_i)\,p(A \mid B_i)}$
The process of computing $p(B \mid A)$ from $p(A \mid B)$ is sometimes called Bayesian inversion.

Bayes' Theorem
Example: A fair coin is flipped three times.

