
Lecture 4: Random Variables and Distributions



Goals
• Working with distributions in R
• Overview of discrete and continuous distributions important in genetics/genomics
• Random variables

Random Variables
• A rv is any rule (i.e., function) that associates a number with each outcome in the sample space

[Figure: outcomes in the sample space mapped to points on a number line marked -1, 0, 1]

Two Types of Random Variables
• A discrete random variable has a countable number of possible values
• A continuous random variable takes all values in an interval of numbers

Probability Distributions of RVs

Discrete
Let X be a discrete rv. Then the probability mass function (pmf), f(x), of X is:

$$f(x) = P(X = x), \qquad f(x) \ge 0, \qquad \sum_x f(x) = 1$$

Continuous
Let X be a continuous rv. Then the probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
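The pdf definition above can be checked numerically. The following is a minimal R sketch, assuming X is standard normal and using illustrative endpoints `a` and `b` that are not from the lecture; it integrates the density with base R's `integrate` and compares the area against the built-in normal cdf `pnorm`.

```r
# Assumptions for illustration only: X ~ N(0, 1), endpoints a = -1 and b = 2.
a <- -1
b <- 2

# P(a <= X <= b) as the area under the pdf between a and b.
area_under_pdf <- integrate(dnorm, lower = a, upper = b)$value

# The same probability from the cumulative distribution function.
via_cdf <- pnorm(b) - pnorm(a)

print(c(area_under_pdf = area_under_pdf, via_cdf = via_cdf))
```

The two numbers should agree up to the numerical tolerance of `integrate`, which is the sense in which the pdf "integrates to" probabilities.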

Using CDFs to Compute Probabilities

Continuous rv:

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$$

$$P(a \le X \le b) = F(b) - F(a)$$

[Figure: a pdf and the corresponding cdf]

Expectation of Random Variables

Discrete
Let X be a discrete rv that takes on values in the set D and has a pmf f(x). Then the expected or mean value of X is:

$$\mu_X = E[X] = \sum_{x \in D} x \cdot f(x)$$

Continuous
The expected or mean value of a continuous rv X with pdf f(x) is:

$$\mu_X = E[X] = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$

Variance of Random Variables

Discrete
Let X be a discrete rv with pmf f(x) and expected value μ. The variance of X is:

$$\sigma_X^2 = V[X] = \sum_{x \in D} (x - \mu)^2 f(x) = E[(X - \mu)^2]$$

Continuous
The variance of a continuous rv X with pdf f(x) and mean μ is:

$$\sigma_X^2 = V[X] = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx = E[(X - \mu)^2]$$

Example of Expectation and Variance
• Let L1, L2, …, Ln be a sequence of n nucleotides and define the rv Xi:

$$X_i = \begin{cases} 1, & \text{if } L_i = A \\ 0, & \text{otherwise} \end{cases}$$

• The pmf is then: P(Xi = 1) = P(Li = A) = pA and P(Xi = 0) = P(Li = C or G or T) = 1 - pA
• E[X] = 1 × pA + 0 × (1 - pA) = pA
• Var[X] = E[(X - μ)²] = E[X²] - μ² = [1² × pA + 0² × (1 - pA)] - pA² = pA(1 - pA)

The Distributions We'll Study
1. Binomial Distribution
2. Hypergeometric Distribution
3. Poisson Distribution
4. Normal Distribution

Binomial Distribution
• Experiment consists of n trials (e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed)
• Trials are identical and each can result in one of the same two outcomes (e.g., head or tail in each toss of a coin), generally called "success" and "failure"
• Probability of success is p, probability of failure is 1 - p
• Trials are independent
• Constant probability for each observation (e.g., probability of getting a tail is the same each time we toss the coin)

Binomial Distribution

pmf:

$$P\{X = x\} = \binom{n}{x} p^x (1 - p)^{n - x}$$

cdf:

$$P\{X \le x\} = \sum_{y=0}^{x} \binom{n}{y} p^y (1 - p)^{n - y}$$

E(X) = np, Var(X) = np(1 - p)

Binomial Distribution: Example 1
• A couple, who are both carriers for a recessive disease, wish to have 5 children. They want to know the probability that they will have four healthy kids.
• Each child of two carriers is healthy with probability 3/4, so n = 5 and p = 0.75:

$$P\{X = 4\} = \binom{5}{4} (0.75)^4 (0.25)^1 \approx 0.396$$

Binomial Distribution: Example 2
• Wright-Fisher model: There are i copies of the A allele in a population of size 2N in generation t. What is the distribution of the number of A alleles in generation t + 1?

$$p_{ij} = \binom{2N}{j} \left(\frac{i}{2N}\right)^{j} \left(1 - \frac{i}{2N}\right)^{2N - j}, \qquad j = 0, 1, \ldots, 2N$$

Hypergeometric Distribution
• Population to be sampled consists of N finite individuals, objects, or elements
• Each individual can be characterized as a success or failure, with m successes in the population
• A sample of size k is drawn and the rv of interest is X = number of successes

Hypergeometric Distribution
• Similar in spirit to the binomial distribution, but from a finite population without replacement
• Example: an urn holds 20 white balls out of 100 balls. If we randomly sample 10 balls, what is the probability that 7 or more are white?
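The carrier-couple question and the urn question above can both be answered with the R functions the lecture names. This is a sketch, not the lecturer's own code: `p_healthy = 0.75` is an assumption based on Mendelian inheritance for two carriers of a recessive allele, and the variable names are made up for illustration. Note that `dhyper` and `phyper` take `m` (successes in the urn), `n` (failures in the urn), and `k` (number drawn).

```r
# Binomial Example 1: five children, each healthy with probability 3/4
# (assumed from Mendelian inheritance for two carrier parents).
p_healthy <- 0.75
p_four_healthy <- dbinom(4, size = 5, prob = p_healthy)

# Urn question: 20 white balls, 80 other balls, 10 drawn without replacement.
white <- 20
other <- 80
drawn <- 10

# P(X >= 7) is the upper tail beyond 6, i.e. P(X > 6).
p_seven_or_more <- phyper(6, m = white, n = other, k = drawn, lower.tail = FALSE)

# Cross-check by summing the pmf over 7, 8, 9, 10.
p_check <- sum(dhyper(7:10, m = white, n = other, k = drawn))

print(c(four_healthy = p_four_healthy, seven_or_more = p_seven_or_more))
```

Passing `6` with `lower.tail = FALSE` is the usual way to get "7 or more" for a discrete variable, because the upper tail in R is strictly greater than the supplied quantile.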

Hypergeometric Distribution
• pmf of a hypergeometric rv:

$$P\{X = i \mid n, m, k\} = \frac{\binom{m}{i} \binom{n}{k - i}}{\binom{m + n}{k}}, \qquad i = 0, 1, 2, 3, \ldots$$

Where,
m = Number of balls in urn considered "success"
k = Number of balls selected
n = Number of balls in urn considered "failure"
m + n = Total number of balls in urn

Hypergeometric Distribution
• Extensively used in genomics to test for "enrichment":

[Figure: among all annotated genes, the overlap between the genes of interest and the genes with a given annotation. Labels: number of annotated genes; number of genes of interest; number of genes with annotation; number of genes of interest with annotation]

Poisson Distribution
• Useful in studying rare events
• Poisson distribution also used in situations where "events" happen at certain points in time
• Poisson distribution approximates the binomial distribution when n is large and p is small

Poisson Distribution
• A rv X follows a Poisson distribution if the pmf of X is:

$$P\{X = i\} = \frac{e^{-\lambda} \lambda^{i}}{i!}, \qquad i = 0, 1, 2, 3, \ldots$$
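The claim that the Poisson distribution approximates the binomial when n is large and p is small can be seen directly by tabulating both pmfs. The sketch below uses illustrative values `n = 1000` and `p = 0.002`, which are assumptions chosen for the demonstration and do not come from the lecture.

```r
# Illustrative values (assumptions): many trials, small success probability.
n <- 1000
p <- 0.002
lambda <- n * p  # matching Poisson mean, lambda = np

x <- 0:8
binom_pmf <- dbinom(x, size = n, prob = p)
pois_pmf <- dpois(x, lambda = lambda)

# Largest pointwise disagreement between the two pmfs over x.
max_gap <- max(abs(binom_pmf - pois_pmf))

print(round(cbind(x, binom_pmf, pois_pmf), 5))
print(max_gap)
```

Rerunning with a larger `p` (say 0.2, keeping `lambda = n * p`) should make the gap visibly larger, which is one way to explore when the approximation stops being safe.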

• λ is frequently a rate per unit time: the expected number of events per unit time t
• Safely approximates a binomial experiment when n > 100, p < […], np = λ < 20
• E(X) = Var(X) = λ

Poisson RV: Example 1
• The number of crossovers, X, between two markers is X ~ Poisson(λ = d)

$$P\{X = i\} = \frac{e^{-d} d^{i}}{i!}$$

$$P\{X = 0\} = e^{-d}$$

$$P\{X \ge 1\} = 1 - e^{-d}$$

Poisson RV: Example 2
• Recent work in Drosophila suggests the spontaneous rate of deleterious mutations is ~ […] per diploid genome. Thus, let's tentatively assume X ~ Poisson(λ = […]) for humans. What is the probability that an individual has 12 or more spontaneous deleterious mutations?

$$P\{X \ge 12\} = 1 - \sum_{i=0}^{11} \frac{e^{-\lambda} \lambda^{i}}{i!} = [\ldots] \times 10^{-9}$$

Poisson RV: Example 3
• Suppose that a rare disease has an incidence of 1 in 1000 people per year. Assuming that members of the population are affected independently, find the probability of k cases in a population of 10,000 (followed over 1 year) for k = 0, 1, 2.
• The expected value (mean) = λ = 0.001 × 10,000 = 10

$$P(X = 0) = \frac{(10)^{0} e^{-10}}{0!} = 0.0000454$$

$$P(X = 1) = \frac{(10)^{1} e^{-10}}{1!} = 0.000454$$

$$P(X = 2) = \frac{(10)^{2} e^{-10}}{2!} = 0.00227$$

Normal Distribution
• "Most important" probability distribution
• Many rv's are approximately normally distributed
• Even when they aren't, their sums and averages often are (CLT)

Normal Distribution
• pdf of normal distribution:

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / 2\sigma^2}$$

• standard normal distribution (μ = 0, σ² = 1):

$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2 / 2}$$

• cdf of Z:

$$P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy$$

Standardizing Normal RV
• If X has a normal distribution with mean μ and standard deviation σ, we can standardize to a standard normal rv:

$$Z = \frac{X - \mu}{\sigma}$$

I Digress: Sampling Distributions
• Before data is collected, we regard observations as random variables (X1, X2, …, Xn)
• This implies that until data is collected, any function (statistic) of the observations (mean, sd, etc.) is also a random variable
• Thus, any statistic, because it is a random variable, has a probability distribution, referred to as a sampling distribution
• Let's focus on the sampling distribution of the mean, $\bar{X}$

Behold The Power of the CLT
• Let X1, X2, …, Xn be an iid random sample from a distribution with mean μ and standard deviation σ. If n is sufficiently large:

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

Example
• If the mean and standard deviation of serum iron values from healthy men are 120 and 15 mg per 100 ml, respectively, what is the probability that a random sample of 50 normal men will yield a mean between 115 and 125 mg per 100 ml?
• First, calculate the mean and sd to normalize: 120 and 15/√50 ≈ 2.12

$$p(115 \le \bar{x} \le 125) = p\!\left(\frac{115 - 120}{2.12} \le z \le \frac{125 - 120}{2.12}\right) = p(-2.36 \le z \le 2.36)$$

$$= p(z \le 2.36) - p(z \le -2.36) = 0.9909 - 0.0091 = 0.9818$$

R
• Understand how to calculate probabilities from probability distributions
  • Normal: dnorm and pnorm
  • Poisson: dpois and ppois
  • Binomial: dbinom and pbinom
  • Hypergeometric: dhyper and phyper
• Exploring relationships among distributions
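As one way to practice the R functions listed above, the serum iron example can be computed with `pnorm` directly, without rounding z to two decimals. This is a sketch under the example's stated values (mean 120, sd 15, n = 50); the variable names are illustrative, and because z is not rounded the result may differ from a table-based answer in the third decimal place.

```r
# Values taken from the serum iron example.
mu <- 120     # population mean (mg per 100 ml)
sigma <- 15   # population standard deviation
n <- 50       # sample size

# Standard deviation of the sample mean under the CLT.
se <- sigma / sqrt(n)

# P(115 <= sample mean <= 125) using the normal cdf.
p_between <- pnorm(125, mean = mu, sd = se) - pnorm(115, mean = mu, sd = se)

# Equivalent calculation after standardizing to Z.
z <- (125 - mu) / se
p_standardized <- pnorm(z) - pnorm(-z)

print(c(se = se, z = z, p_between = p_between))
```

The same pattern carries over to the other families: `d*` functions give the pmf or pdf, and `p*` functions give the cdf.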

