
Introductory Statistics Notes - Stat-Help.com

Jamie DeCoster
Department of Psychology
University of Alabama
348 Gordon Palmer Hall
Box 870348
Tuscaloosa, AL 35487-0348

Phone: (205) 348-4431
Fax: (205) 348-8648

August 1, 1998

These notes were compiled from Jamie DeCoster's Introductory Statistics class at Purdue University. Textbook references refer to Moore's The Active Practice of Statistics. CD-ROM references refer to Velleman [...]

If you wish to cite the contents of this document, the APA reference for them would be

DeCoster, J. (1998). Introductory Statistics Notes. Retrieved <month, day, and year you downloaded this file> from [...]

ALL RIGHTS TO THIS DOCUMENT ARE RESERVED.


Contents

I Understanding Data
  1 Introduction
  2 Data and Measurement
  3 The Distribution of One Variable
  4 Measuring Center and Spread
  5 Normal Distributions
II Understanding Relationships
  6 Comparing Groups
  7 Scatterplots
  8 Correlation
  9 Least-Squares Regression
  9 Association vs. Causation
III Generating Data
  10 Sample Surveys
  11 Designed Experiments
IV Experience with Random Behavior
  12 Randomness
  13 Intuitive Probability
  14 Conditional Probability
  15 Random Variables
  16 Sampling Distributions
V Statistical Inference
  17 Estimating With Confidence
  18 Confidence Intervals for a Mean
  19 Testing Hypotheses
  20 Tests for a Mean
VI Topics in Inference
  21 Comparing Two Means
  22 Inference for Proportions
  23 Two-Way Tables
  24 Inference for Regression
  25 One-Way Analysis of Variance

Part I Understanding Data

Chapter 1 Introduction

It is important to know how to understand Statistics so that we can make the proper judgments when a person or a company presents us with an argument backed by data.

Data are numbers with a context. To properly perform Statistics we must always keep the meaning of our data in mind.

You will spend several hours every day working on this course.

You are responsible for material covered in lecture, as well as the contents of the textbook and the CD-ROM. You will have homework, CD-ROM, and reading assignments every day. It is important not to get behind in this course. A good work schedule would be:

- Review the notes from the previous day's lecture, and take care of any unfinished assignments.
- Attend the lecture.
- Attend the lab section.
- Do your homework. You will want to plan on staying on campus for this, as your homework will often require using the CD-ROM.
- Do the CD-ROM assignments.
- Do the reading assignments.

This probably seems like a lot of work, and it is. This is because we need to cover 15 weeks of material in 4 weeks during Maymester. Completing the course will not be easy, but I will try to make it as good an experience as I can.

Chapter 2 Data and Measurement

Statistics is primarily concerned with how to summarize and interpret variables. A variable is any characteristic of an object that can be represented as a number.

The values that the variable takes will vary when measurements are made on different objects or at different times.

Each time that we record information about an object we observe a case. We might include several different variables in the same case. For example, we might measure the height, weight, and hair color of a group of people in an experiment. We would have one case for each person, and that case would contain that person's height, weight, and hair color values. All of our cases put together are called our data set.

Variables can be broken down into two types:

- Quantitative variables are those for which the value has numerical meaning. The value refers to a specific amount of some quantity. You can do mathematical operations on the values of quantitative variables (like taking an average). A good example would be a person's height.
- Categorical variables are those for which the value indicates different groupings. Objects that have the same value on the variable are the same with regard to some characteristic, but you can't say that one group has more or less of some feature. It doesn't really make sense to do math on categorical variables. A good example would be a person's gender.

Whenever you are doing Statistics it is very important to make sure that you have a practical understanding of the variables you are using. You should make sure that the information you have truly addresses the question that you want to answer. In particular, for each variable you want to think about who is being measured, what about them is being measured, and why the researcher is conducting the experiment. If the variable is quantitative you should additionally make sure that you know what units are being used.

Chapter 3 The Distribution of One Variable

The pattern of variation of a variable is called its distribution. If you examined a large number of different objects and graphed how often you observed each of the different values of the variable you would get a picture of the variable's distribution.
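As a rough illustration of cases, variables, and distributions, the sketch below builds a tiny data set in Python. The people, the values, and the names `cases`, `heights`, and `hair_counts` are all made up for this example and do not come from the course materials. It averages a quantitative variable and tallies how often each value of a categorical variable occurs, which is the raw material for a picture of that variable's distribution.

```python
from collections import Counter

# Hypothetical data set: one case per person, three variables per case.
cases = [
    {"height_cm": 170, "weight_kg": 65, "hair_color": "brown"},
    {"height_cm": 182, "weight_kg": 80, "hair_color": "black"},
    {"height_cm": 165, "weight_kg": 58, "hair_color": "brown"},
    {"height_cm": 175, "weight_kg": 72, "hair_color": "blond"},
    {"height_cm": 160, "weight_kg": 55, "hair_color": "brown"},
]

# Quantitative variable: arithmetic on the values is meaningful.
heights = [case["height_cm"] for case in cases]
average_height = sum(heights) / len(heights)

# Categorical variable: we can only count how many cases fall in each group.
hair_counts = Counter(case["hair_color"] for case in cases)

print(f"average height: {average_height} cm")
for color, count in sorted(hair_counts.items()):
    # One '#' per case gives a crude text picture of the distribution.
    print(f"{color:<6} {'#' * count} ({count})")
```

Averaging `hair_color` would be meaningless, which is why the code only counts it.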

Bar charts are used to display distributions of categorical variables. In a bar chart different groups are represented on the horizontal axis. Over each group a bar is drawn such that the height of the bar represents the number of cases falling in that group.

A good way to describe the distribution of a quantitative variable is to take the following three steps:

1. Report the center of the distribution.
2. Report the general shape of the distribution.
3. Report any significant deviations from the general shape.

A distribution can have many different shapes. One important distinction is between symmetric and skewed distributions. A distribution is symmetric if the parts above and below its center are mirror images. A distribution is skewed to the right if the right side is longer, while it is skewed to the left if the left side is longer.

Local peaks in a distribution are called modes. If your distribution has more than one mode it often indicates that your overall distribution is actually a combination of several smaller ones.

Sometimes a distribution has a small number of points that don't seem to fit its general shape. These points are called outliers. It's important to try to explain outliers. Sometimes they are caused by data entry errors or equipment failures, but other times they come from situations that are different in some important way.

Whenever you collect a set of data it is useful to plot its distribution. There are several ways of doing this. For relatively small data sets you can construct a stemplot. To make a stemplot:

1. Separate each case into a stem and a leaf. The stem will contain the first digits and the leaf will contain a single digit. You ignore any digits after the one you pick for your leaf. Exactly where you draw the break will depend on your distribution: generally you want at least five stems but not more than twenty.
2. Write the stems in increasing order from top to bottom. Draw a vertical line to the right of the stems.
3. Write the leaves belonging to each stem to the right of the line, arranged in ascending numerical order.

To compare two distributions you can construct back-to-back stemplots.
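The steps for a single stemplot can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `stemplot`, its `leaf_unit` parameter, and the sample `scores` are invented here, the values are assumed to be non-negative integers, and stems with no leaves are printed so that gaps stay visible (a common convention that the notes do not specify).

```python
from collections import defaultdict

def stemplot(values, leaf_unit=1):
    """Return the lines of a text stemplot for non-negative integers.

    leaf_unit is the place value of the leaf digit (1 = ones, 10 = tens).
    Digits below the leaf are ignored (truncated, not rounded).
    """
    leaves_by_stem = defaultdict(list)
    for value in values:
        scaled = value // leaf_unit      # drop any digits after the leaf
        stem, leaf = divmod(scaled, 10)  # split into stem and single-digit leaf
        leaves_by_stem[stem].append(leaf)

    lines = []
    # Stems in increasing order from top to bottom. Empty stems are included
    # as well (an assumption of this sketch, not a rule from the notes).
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(leaves_by_stem[stem]))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical exam scores, chosen so that the plot has five stems.
scores = [52, 58, 61, 64, 67, 70, 73, 73, 75, 78, 81, 84, 86, 93, 97]
for line in stemplot(scores):
    print(line)
```

With these scores the tens digit is the stem and the ones digit is the leaf. Passing `leaf_unit=10` would instead use the tens digit as the leaf and ignore the ones digit.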

The basic procedure for back-to-back stemplots is the same as for ordinary stemplots, except that you place lines on the left and the right side of the stems. You then list out the leaves from one distribution on the right, and the leaves from the other distribution on the left.

For larger distributions you can build a histogram. To make a histogram:

1. Divide the range of your data into classes of equal width. Sometimes there are natural divisions, but other times you need to make them yourself. Just like in a stemplot, you generally want at least five but fewer than twenty classes.
2. Count the number of cases in each class. These counts are called the frequencies.
3. Draw a plot with the classes on the horizontal axis and the frequencies on the vertical axis.

Time plots are used to see how a variable changes over time. You will often observe cycles, where the variable regularly rises and falls within a specific time period.

Chapter 4 Measuring Center and Spread

In this section we will discuss some ways of generating mathematical summaries of distributions.

In these equations we will make use of some statistical notation.

We will always use $n$ to refer to the total number of cases in our data set.

When referring to the distribution of a variable we will use a single letter (other than $n$). For example we might use $h$ to refer to height. If we want to talk about the value of the variable in a specific case we will put a subscript after the letter. For example, if we wanted to talk about the height of person 5 in our data set we would call it $h_5$.

Several of our formulas involve summations, represented by the symbol $\sum$. This is a shorthand for the sum of a set of values. The basic form of a summation equation would be

$$x = \sum_{i=a}^{b} f(i).$$

To determine $x$ you calculate the function $f(i)$ for each value from $i = a$ to $i = b$ and add them together. For example,

$$\sum_{i=2}^{4} i^2 = 2^2 + 3^2 + 4^2 = 29.$$

Some important properties of summation:

$$\sum_{i=1}^{n} k = nk, \quad \text{where } k \text{ is a constant}$$

$$\sum_{i=1}^{n} (Y_i + Z_i) = \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i$$

$$\sum_{i=1}^{n} (Y_i + k) = \sum_{i=1}^{n} Y_i + nk, \quad \text{where } k \text{ is a constant}$$

$$\sum_{i=1}^{n} (k Y_i) = k \sum_{i=1}^{n} Y_i, \quad \text{where } k \text{ is a constant}$$

When people want to give a simple description of a distribution they typically report measures of its central tendency and spread.

People most often report the mean and the standard deviation. They are used mainly because of their relationship to the normal distribution, an important distribution that will be discussed in the next chapter.

The mean represents the average value of the variable, and can be calculated as

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

In Statistics our sums are almost always over all of the cases, so we will typically not bother to write out the limits of the summation and just assume that it is performed over all the cases. So we might write the formula for the mean as $\frac{1}{n} \sum x_i$.

The standard deviation is a measure of the average difference of each case from the mean, and can be calculated as

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}.$$

The term $x_i - \bar{x}$ is the deviation of a case from the mean. We square it so that all the deviation scores will come out positive (otherwise the sum would always be zero). We then take the square root to put it back on the proper scale.
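The two formulas can be checked numerically with a short Python sketch. The function names `mean` and `sample_sd` and the five height values are made up for illustration. The last lines compare the hand-written results with Python's standard `statistics` module, whose `stdev` function also divides by $n - 1$.

```python
import math
import statistics

def mean(xs):
    """Mean: the sum of the values divided by the number of cases n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Standard deviation with the n - 1 denominator used in the formula."""
    x_bar = mean(xs)
    squared_deviations = [(x - x_bar) ** 2 for x in xs]
    return math.sqrt(sum(squared_deviations) / (len(xs) - 1))

# Hypothetical heights in centimeters.
heights = [160, 165, 170, 175, 180]

x_bar = mean(heights)
deviations = [x - x_bar for x in heights]

print("mean:", x_bar)
print("sum of raw deviations:", sum(deviations))  # zero, hence the squaring
print("standard deviation:", sample_sd(heights))

# Cross-check against the standard library implementations.
assert math.isclose(x_bar, statistics.mean(heights))
assert math.isclose(sample_sd(heights), statistics.stdev(heights))
```

For these values the squared deviations are 100, 25, 0, 25, and 100, so $s = \sqrt{250 / 4} \approx 7.91$.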

