Crash Course on Basic Statistics

Marina Wahl, of New York at Stony Brook
November 6, 2013

Contents

1 Basic Probability
    Basic Definitions
    Probability of Events
    Bayes Theorem
2 Basic Definitions
    Types of Data
    Errors
    Reliability
    Validity
    Probability Distributions
    Population and Samples
    Bias
    Questions on Samples
    Central Tendency
3 The Normal Distribution
4 The Binomial Distribution
5 Confidence Intervals
6 Hypothesis Testing
7 The t-Test
8 Regression
9 Logistic Regression
10 Other Topics
11 Some Related Questions

Chapter 1 Basic Probability

Basic Definitions

Trials
• Probability is concerned with the outcome of trials.
• Trials are also called experiments or observations (multiple trials).
• A trial refers to an event whose outcome is unknown.

Sample Space (S)
• The set of all possible elementary outcomes of a trial.
• If the trial consists of flipping a coin twice, the sample space is $S = \{(h,h), (h,t), (t,h), (t,t)\}$.
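As a small illustration of the sample-space definition above, the following Python sketch enumerates the outcomes of flipping a coin twice. The function name `sample_space` and the fair-coin assumption are choices made for this example only; they are not part of the original notes.

```python
from itertools import product

def sample_space(outcomes, n_trials):
    """Return every ordered sequence of n_trials elementary outcomes."""
    return list(product(outcomes, repeat=n_trials))

S = sample_space(("h", "t"), 2)
print(S)  # [('h', 'h'), ('h', 't'), ('t', 'h'), ('t', 't')]

# Assumption: a fair coin, so each elementary outcome is equally likely.
p = {outcome: 1 / len(S) for outcome in S}
print(sum(p.values()))  # 1.0
```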

The probability of the sample space is always 1.

Events (E)
• An event is the specification of the outcome of a trial.
• An event can consist of a single outcome or a set of outcomes.
• The complement of an event is everything in the sample space that is not that event (not E, or ~E).
• The probability of an event is always between 0 and 1.
• The probabilities of an event and its complement always sum to 1.

Several Events
• The union of several simple events creates a compound event that occurs if one or more of the events occur.
• The intersection of two or more simple events creates a compound event that occurs only if all the simple events occur.
• If events cannot occur together, they are mutually exclusive.
• If two trials are independent, the outcome of one trial does not influence the outcome of another.

Permutations
• Permutations are all the possible ways elements in a set can be arranged, where the order is important.
• The number of permutations of subsets of size k drawn from a set of size n is given by:
  $nPk = \frac{n!}{(n-k)!}$

Combinations
• Combinations are similar to permutations, with the difference that the order of elements is not significant.
• The number of combinations of subsets of size k drawn from a set of size n is given by:
  $nCk = \frac{n!}{k!\,(n-k)!}$

Probability of Events
• If two events are independent, $P(E|F) = P(E)$. The probability of both E and F occurring is:
  $P(E \cap F) = P(E) \times P(F)$
• If two events are mutually exclusive, the probability of either E or F occurring is:
  $P(E \cup F) = P(E) + P(F)$
• If the events are not mutually exclusive (you need to correct for the overlap):
  $P(E \cup F) = P(E) + P(F) - P(E \cap F)$,
  where $P(E \cap F) = P(E) \times P(F|E)$

Bayes Theorem
• Bayes theorem for any two events:
  $P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|{\sim}A)\,P({\sim}A)}$
• Frequentist:
  - There are true, fixed parameters in a model (though they may be unknown at times).
  - Data contain random errors which have a certain probability distribution (Gaussian, for example).
  - Mathematical routines analyse the probability of getting certain data, given a particular model.
• Bayesian:
  - There are no true model parameters. Instead, all parameters are treated as random variables with probability distributions.
  - Random errors in data have no probability distribution, but rather the model parameters are random with their own distributions.
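The counting formulas and Bayes theorem from this chapter can be checked numerically. The sketch below is only an illustration: the function names and the input probabilities are hypothetical and chosen for the example, not taken from the notes.

```python
from math import factorial, isclose

def n_permutations(n, k):
    """Ordered selections of k items from n: n! / (n - k)!"""
    return factorial(n) // factorial(n - k)

def n_combinations(n, k):
    """Unordered selections of k items from n: n! / (k! (n - k)!)"""
    return factorial(n) // (factorial(k) * factorial(n - k))

def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes theorem, expanding P(B) over A and its complement."""
    p_not_a = 1.0 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

print(n_permutations(5, 2))  # 20
print(n_combinations(5, 2))  # 10

# Hypothetical inputs: prior P(A) = 0.01, P(B|A) = 0.9, P(B|~A) = 0.05.
posterior = bayes_posterior(0.01, 0.9, 0.05)
print(round(posterior, 4))   # 0.1538
```

With these made-up numbers, observing B raises the probability of A from 1% to roughly 15%.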

Under the Bayesian approach, mathematical routines analyze the probability of a model, given some data. The statistician makes a guess (prior distribution) and then updates that guess with the data.

Chapter 2 Basic Definitions

Types of Data
There are two types of measurements:
• Quantitative: discrete data have a finite number of values; continuous data have an infinite number of possible values.
• Categorical (nominal): the possible responses consist of a set of categories rather than numbers that measure an amount of something on a continuous scale.

Errors
• Random error: due to chance, with no particular pattern; it is assumed to cancel itself out over repeated measurements.
• Systematic error: has an observable pattern and is not due to chance, so its causes can often be identified.

Reliability
How consistent or repeatable measurements are:
• Multiple-occasions reliability (test-retest, temporal): how similarly a test performs over repeated administrations.
• Multiple-forms reliability (parallel-forms): how similarly different versions of a test perform in measuring the same entity.
• Internal consistency reliability: how well the items that make up an instrument (a test) reflect the same construct.

Validity
How well a test or rating scale measures what it is supposed to measure:

• Content validity: how well the process of measurement reflects the important content of the domain of interest.
• Concurrent validity: how well inferences drawn from a measurement can be used to predict some other behaviour that is measured at approximately the same time.
• Predictive validity: the ability to draw inferences about some event in the future.

Probability Distributions
• Statistical inference relies on making assumptions about the way data is distributed, and sometimes on transforming data to make it fit some known distribution.
• A probability distribution is defined by a formula that specifies what values can be taken by data points within the distribution and how common each value (or range) will be.

Population and Samples
• We rarely have access to the entire population of users. Instead we rely on a subset of the population to use as a proxy for the population.
• Sample statistics estimate unknown population parameters.
• Ideally you should select your sample randomly from the parent population, but in practice this can be very difficult due to:
  - issues establishing a truly random selection scheme,
  - problems getting the selected users to participate.
• Representativeness is more important than randomness.

Nonprobability Sampling

• Subject to sampling bias. Conclusions are of limited usefulness in generalizing to a larger population:
  - Volunteer samples.
  - Convenience samples: collect information in the early stages of a study.
  - Quota sampling: the data collector is instructed to get responses from a certain number of subjects within classifications.

Probability Sampling
• Every member of the population has a known probability of being selected for the sample.
• The simplest type is simple random sampling (SRS).
• Systematic sampling: you need a list of your population; you decide the size of the sample and then compute the number n, which dictates how you will select the sample:
  - Calculate n by dividing the size of the population by the number of subjects you want in the sample.
  - Useful when the population accrues over time and there is no predetermined list of population members.
  - One caution: make sure the data is not cyclic.
• Stratified sample: the population of interest is divided into non-overlapping groups, or strata, based on common characteristics.
• Cluster sample: the population is sampled by using pre-existing groups.
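The probability sampling schemes listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the `seed` parameter, and the population of 100 integer user ids are all hypothetical, and the systematic sampler assumes the random starting offset is drawn uniformly from the first n positions.

```python
import random

def simple_random_sample(population, size, seed=None):
    """SRS: every subset of the given size is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, size)

def systematic_sample(population, size, seed=None):
    """Take every n-th member, where n = len(population) // size,
    starting from a random offset in [0, n)."""
    n = len(population) // size
    if n < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(n)
    return population[start::n][:size]

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Draw an SRS of per_stratum members from each non-overlapping stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {key: rng.sample(members, per_stratum)
            for key, members in strata.items()}

users = list(range(100))  # hypothetical population of 100 user ids
print(simple_random_sample(users, 5, seed=0))
print(systematic_sample(users, 10, seed=0))
print(stratified_sample(users, lambda u: u % 2, 3, seed=0))
```

The caution about cyclic data applies to `systematic_sample`: if the list order repeats with a period related to n, every selected member can fall at the same point of the cycle.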

Cluster sampling can be combined with the technique of sampling proportional to size.

Bias
• The sample needs to be a good representation of the study population.
• If the sample is biased, it is not representative of the study population, and conclusions drawn from the study sample might not apply to the study population.
• A statistic used to estimate a parameter is unbiased if the expected value of its sampling distribution is equal to the value of the parameter being estimated.
• Bias is a source of systematic error and enters studies in two primary ways:
  - During the selection and retention of the subjects of study.
  - In the way information is collected about the subjects.

Selection Bias
• Selection bias: some potential subjects are more likely than others to be selected for the study sample. The sample is selected in a way that systematically excludes part of the population.
• Volunteer bias: the fact that people who volunteer to be in studies are usually not representative of the population as a whole.
• Nonresponse bias: the other side of volunteer bias.

Just as people who volunteer to take part in a study are likely to differ systematically from those who do not, so people who decline to participate in a study when invited to do so very likely differ from those who consent to participate.
• Informative censoring: can create bias in any longitudinal study (a study in which subjects are followed over a period of time). Losing subjects during a long-term study is common, but the real problem comes when subjects do not drop out at random, but for reasons related to the study's purpose.

Information Bias
• Interviewer bias: when bias is introduced into the data collected because of the attitudes or behaviour of the interviewer.
• Recall bias: the fact that people with a life experience such as suffering from a serious disease or injury are more likely to remember events that they believe are related to that experience.
• Detection bias: the fact that certain characteristics may be more likely to be detected or reported in some people than in others.
• Social desirability bias: caused by people's desire to present themselves in a favorable light.

Questions on Samples

Representative Sampling

• How was the sample selected?
• Was it truly randomly selected?
• Were there any biases in the selection process?

Bias
• Response bias: how were the questions worded and the responses collected?
• Conscious bias: are arguments presented in a disinterested, objective fashion?
• Missing data and refusals: how is missing data treated in the analysis? How is attrition (loss of subjects after a study begins) handled?

Sample Size
• Were the sample sizes selected large enough for a null hypothesis to be rejected?
• Were the sample sizes so large that almost any null hypothesis would be rejected?
• Was the sample size selected on the basis of a power calculation?

Central Tendency

Mean
• Good if the data set is roughly symmetrical:
  $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Outliers: either data errors or values that belong to another population.

Median
• The middle value when the values are ranked in ascending or descending order.
• When data is not symmetrical, the mean can be heavily influenced by outliers, and the median provides a better idea of the most typical value.
• For odd samples, the median is the central value, the one at position $(n+1)/2$.

For even samples, the median is the average of the two central values, those at positions $n/2$ and $n/2 + 1$.
• In a small sample of data (less than 25 or so), the sample median tends to do a poor job of estimating the population median.
• For task-time data the geometric mean tends to provide a better estimate of the population's middle value than the sample median.

Mode
• The most frequently occurring value.
• Useful in describing ordinal or categorical data.

Dispersion
• Range: the simplest measure of dispersion, which is the difference between the highest and lowest values.
• Interquartile range: less influenced by extreme values.
• Variance: the most common way to measure dispersion for continuous data.
  - Provides an estimate of the average squared difference of each value from the mean.
  - For a population: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
  - For a sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
• Standard deviation:
  - For a population: $\sigma = \sqrt{\sigma^2}$
  - For a sample: $s = \sqrt{s^2}$

Chapter 3 The Normal Distribution

Figure: (left) All normal distributions have the same shape but differ in their $\mu$ and $\sigma$: they are shifted by $\mu$ and stretched by $\sigma$.
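The Chapter 2 formulas for the mean, median, variance, and standard deviation, together with the "shifted by the mean, stretched by the standard deviation" remark in the caption above, can be checked on a small data set. The values below are made up for illustration, and the comparison against Python's `statistics` module is only a sanity check of the hand-written formulas.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
n = len(data)

mean = sum(data) / n
pop_var = sum((x - mean) ** 2 for x in data) / n           # divide by n
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)  # divide by n - 1
pop_sd = math.sqrt(pop_var)

ordered = sorted(data)
# n is even here, so the median averages the two central values.
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(mean, median)          # 5.0 4.5
print(pop_var, pop_sd)       # 4.0 2.0
print(round(sample_var, 4))  # 4.5714

# The hand-written formulas agree with the standard library.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(sample_var, statistics.variance(data))
assert median == statistics.median(data)

# Undo the shift and the stretch: z = (x - mean) / sd
# has mean 0 and (population) standard deviation 1.
z = [(x - mean) / pop_sd for x in data]
print(z)
```

Dividing by n - 1 rather than n gives the larger sample variance, which is the usual choice when the data are a sample rather than the whole population.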
