Transcription of Chapter 2
Chapter 2 Describing, Exploring, and Comparing Data

Important Characteristics of Data
Describe the overall pattern of a distribution:
- Center: divides the data in half
- Spread: differences between the data
- Shape: skewness of the data
- Outlier: data that falls outside of the pattern

Data Distributions
- Graphs display the distribution
- Numbers describe the distribution

Lesson 2-2 Frequency Distribution

Frequency Distribution
A frequency distribution lists the number of occurrences for each category of data.

Grades        Frequency
A (100-90)    5
B (89-80)     8
C (79-70)     4
D (69-60)     5
F (59-50)     3

Lower Class Limits and Upper Class Limits: the smallest and largest values that can belong to each class.

Example, Page 44, #2
Identify the class width, class midpoints, and class boundaries for the given frequency distribution.

Systolic Blood Pressure of Women    Frequency
80-99                               9
100-119                             24
120-139                             5
140-159                             1
160-179                             0
180-199                             1

Find the class width: 100 - 80 = 20

Blood Pressure    Class Midpoint    Class Boundaries
80-99             89.5              79.5-99.5
100-119           109.5             99.5-119.5
120-139           129.5             119.5-139.5
140-159           149.5             139.5-159.5
160-179           169.5             159.5-179.5
180-199           189.5             179.5-199.5

Relative Frequency Distribution
The relative frequency is the proportion or percent of observations within a category; it is found by dividing the class frequency by the sum of all frequencies.

Reasons for Constructing Frequency Distributions
- Large data sets can be summarized.
- Can gain some insight into the nature of data.
- Have a basis for constructing graphs.

Relative frequency = class frequency / sum of all frequencies

Example, Page 44, #6
Construct the relative frequency distribution.

Blood Pressure    Frequency    Relative Frequency
80-99             9            0.225
100-119           24           0.600
120-139           5            0.125
140-159           1            0.025
160-179           0            0.000
180-199           1            0.025
Total             40           1.000

Cumulative Frequency Distribution
- Discrete data: displays the total number of observations less than or equal to the category.
- Continuous data: displays the total number of observations less than or equal to the upper class limit of a class.

Example, Page 44, #10
Construct the cumulative frequency distribution.

Blood Pressure    Frequency    Relative Frequency    Cumulative Frequency
80-99             9            0.225                 9
100-119           24           0.600                 9 + 24 = 33
120-139           5            0.125                 33 + 5 = 38
140-159           1            0.025                 39
160-179           0            0.000                 39
180-199           1            0.025                 40

Example, Page 45, #16
In "Tobacco and Alcohol Use in G-Rated Children's Animated Films," by Goldstein, Sobel, and Newman (Journal of the American Medical Association, Vol. 281, No. 12), the lengths (in seconds) of scenes showing tobacco use and alcohol use were recorded for animated children's movies. Refer to Data Set 7 in Appendix B.
Construct a separate frequency distribution for the lengths of time for tobacco use and alcohol use. In both cases, use the classes 0-99, 100-199, and so on. Compare the results and determine whether there appears to be a significant difference.

[TI calculator screens: STAT, 2nd STAT]

Time (Sec)    Tobacco    Alcohol
0-99          39         46
100-199       6          3
200-299       4          0
300-399       0          0
400-499       0          1
500-599       1          0

There does not appear to be a significant difference.

Lesson 2-3 Visualizing Data

Displaying Distributions
- Categorical Data (Qualitative): Bar Graphs, Pie Charts
- Measurement Data (Quantitative): Histograms, Dotplots, Stem-and-leaf plots, Ogive, Frequency Polygon

Pie Chart
When to use:
- The categorical data has a small number of possible categories.
- Most useful for illustrating proportions of the whole data set for various categories.
What to look for:
- Categories that form large or small proportions of the data set.
Don't forget to title the graph, label the categories, and include all categories that make up the whole.

Example: Pie Chart
Education of People 25 to 34 Years Old, 2000

                         Number of Persons (thousands)    Relative Frequency
Less than High School    4,
High School Graduate     11,
Some College             10,
Bachelor's Degree        8,
Advanced Degree          2,
Total                    ,786                             100%

[Pie chart: Education of People 25 to 34 Years Old, 2000; slices: HS Grad, Not HS Grad, Some College, Bachelor's Degree, Advanced Degree]

Bar Graph
When to use:
- The categorical data has a large number of possible categories.
What to look for:
- Frequently or infrequently occurring categories.
Don't forget to include labels for the axes as well as a title for the graph.

Example: Bar Graph
[Bar graph: Education of People 25 to 34 Years Old, 2000; x-axis: Education (Not HS Grad, HS Grad, Some College, Bachelor's Degree, Advanced Degree); y-axis: Percent (0 to 35)]

Dot Plot
When to use:
- Numerical data sets with a small number of observations.
What to look for:
- Conveys information about a typical value in the data set.
- Extent to which the data values are spread out.
- The nature of the distribution of values along the number line.
- The presence of unusual values in the data set.
Don't forget to title the graph and label the axis.

Example: Dotplot
Here are the numbers of home runs that Babe Ruth hit in his 15 years with the New York Yankees, 1920 to 1934:
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22
[Dotplot: number line from 20 to 60 in steps of 5]

Stem Plot
When to use:
- Numerical data sets with a small to moderate number of observations.
What to look for:
- Conveys information about a typical value in the data set.
- Extent to which the data values are spread out.
- The presence of any gaps in the data.
- The symmetry in the distribution of values.
- The number and location of peaks.
- The presence of unusual (outlier) values in the data set.
Don't forget to title the graph.

Example: Stem Plot (Babe Ruth)
54 59 35 41 46 25 47 60 54 46 49 46 41 34 22

2 | 2 5
3 | 4 5
4 | 1 1 6 6 6 7 9
5 | 4 4 9
6 | 0

Displaying Distributions
- Categorical Data: Bar Graphs, Pie Charts
- Quantitative Data: Dotplots, Stem-and-leaf plots, Histograms, Ogive, Frequency Polygon

Histogram
When to use:
- Continuous numerical data sets with a moderate to large number of observations.
What to look for:
- Conveys information about a typical value in the data set.
- Extent to which the data values are spread out.
- The general shape, location, and number of peaks.
- The presence of gaps.
- The presence of unusual (outlier) values in the data set.
Don't forget to title the graph and label the axes.

Example: Histogram (Discrete Data)
The manager of a Wendy's fast-food restaurant is interested in studying the typical number of customers who arrive during the lunch hour.
The data in the following table represent the number of customers who arrive at Wendy's for 40 randomly selected 15-minute intervals of time during lunch.

Number of Arrivals at Wendy's
7  6  6  6  4  5  6  6  11  4
2  7  1  2  4  6  5  5  3   7
2  2  9  7  5  6  2  6  5   7
4  6  9  8  5  6  8  2  6   5

Step 1: Construct a frequency distribution table.
How many categories are there? 11

Number of Customers    Frequency    Relative Frequency
1                      1            0.025
2                      6            0.150
3                      1            0.025
4                      4            0.100
5                      7            0.175
6                      11           0.275
7                      5            0.125
8                      2            0.050
9                      2            0.050
10                     0            0.000
11                     1            0.025

[Histogram: Arrivals at Wendy's; x-axis: Number of Customers (1 to 11); y-axis: Frequency (0 to 12)]
[Histogram: Arrivals at Wendy's; x-axis: Number of Customers (1 to 11); y-axis: Relative Frequency]

Example: Histogram (Continuous Data)
Suppose you are considering investing in a Roth IRA. You collect the data table, which represents the three-year rate of return (in percent) for 40 small capitalization growth mutual funds.

[TI calculator screen: STAT]

A) Construct a frequency distribution to display these data.
Record your class intervals and counts.

Step 1: Find the class intervals.
Locate the smallest number ( ) and the largest number ( ). Choose a lower class limit and use a class width of 5.

[Table: 3-yr Rate of Return; class intervals and counts]

Step 2: Graph it using the TI calculator.

[Histogram: 3 Year Rate of Return of Mutual Funds; x-axis: Rate of Return (10 to 50 in steps of 5); y-axis: Frequency (4, 8, 12)]

B) Describe the distribution of the 3 Year Rate of Return.
- The distribution is skewed to the right with a peak at the class beginning at 15%. So 27.5% (11/40) of the small-cap growth funds had a 3-year return in that class.
- There is one outlier.

Histogram: Too few categories
[Histogram: Age of Spring 1998 Stat 250 Students, n = 92 students; x-axis: Age (in years); y-axis: Frequency (Count)]

Histogram: Too many categories
[Histogram: GPAs of Spring 1998 Stat 250 Students, n = 92 students; x-axis: GPA; y-axis: Frequency (Count)]

Ogive
A relative cumulative frequency graph (ogive) is used to find the relative standing of an individual observation.
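The relative cumulative frequencies that an ogive plots can be computed by running a cumulative sum over the class frequencies and dividing by the total. The following is a minimal sketch, not the slides' own method (they use TI calculator lists). It assumes the systolic blood pressure frequencies from Lesson 2-2 as sample input, and the function and variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the systolic blood pressure example
# (Page 44, #2), used here only as sample input.
classes = [(80, 99), (100, 119), (120, 139), (140, 159), (160, 179), (180, 199)]
frequencies = [9, 24, 5, 1, 0, 1]


def relative_cumulative(freqs):
    """Return (cumulative, relative cumulative) frequency lists."""
    total = sum(freqs)
    cumulative = list(accumulate(freqs))          # running totals
    relative = [c / total for c in cumulative]    # proportion at or below each class
    return cumulative, relative


cumulative, rel_cumulative = relative_cumulative(frequencies)

# An ogive plots each upper class limit against its relative cumulative frequency.
ogive_points = [(upper, rel) for (_, upper), rel in zip(classes, rel_cumulative)]

for (lower, upper), cum, rel in zip(classes, cumulative, rel_cumulative):
    print(f"{lower}-{upper}: cumulative = {cum}, relative cumulative = {rel:.3f}")
```

Reading the result the way the ogive is read: 38 of the 40 women (95%) have a systolic blood pressure of 139 or less.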
Example: Relative Cumulative Frequency
Data: three-year rate of return (in percent) for 40 small capitalization growth mutual funds.

[Table columns: Class, Freq, Relative Frequency, Cumulative Frequency, Relative Cumulative Frequency]

- 26 of the 40 mutual funds had a 3 year rate of return of ( ) or less.
- 65% of the mutual funds had a 3 year rate of return of ( ) or less.
- A mutual fund with a 3 year rate of return of 45% or higher is outperforming 95% of its peers.

TI lists: L3 = Upper Class Limits, L4 = Relative Cumulative Frequency

[Ogive: 3 Year Rate of Return for Small Capitalization Mutual Funds; x-axis: Rate of Return; y-axis: Cumulative Relative Frequency]

- 80% of the mutual funds had a 3-year rate of return less than or equal to ( ).

Frequency Polygon

Example: Frequency Polygon
Suppose you are considering investing in a Roth IRA. You collect the data table, which represents the three-year rate of return (in percent)
for 40 small capitalization growth mutual funds.

[Table columns: Class, Freq, Class Midpoint]

TI lists: L3 = Class Midpoints, L4 = Frequency

[Frequency polygon: 3 Year Rate of Return; x-axis: Rate of Return; y-axis: Frequency]

Lesson 2-4 Measures of Center

Measuring the Center
- Mean
- Median
- Mode
- Midrange

Mean or Arithmetic Mean
Find the sum of all values and then divide by the number of values.
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$

Median
- Arrange the data in order.
- Odd number of values: the median is the value in the exact middle.
- Even number of values: add the two middle numbers, then divide by 2.

Mode
- Value that occurs most frequently.
- Bimodal: two values occur with the same greatest frequency.
- Multimodal: more than two values occur with the same greatest frequency.
- When no value is repeated, we say there is no mode.

Midrange
The value halfway between the highest and lowest values.
Midrange = (Highest Value + Lowest Value) / 2

Example, Page 70, #10
Find the mean, median, mode, and midrange for each of the two samples, then compare the two sets of results.
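The four measures of center defined in this lesson can be computed directly from a list of values. The following is a minimal sketch using Python's standard `statistics` module. It takes Babe Ruth's home run counts from Lesson 2-3 as sample data rather than the Page 70 samples, and the helper names `midrange` and `modes` are illustrative.

```python
from statistics import mean, median, multimode

# Babe Ruth's home runs per season with the Yankees (from the dotplot example).
home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def midrange(values):
    """Value halfway between the highest and lowest values."""
    return (max(values) + min(values)) / 2


def modes(values):
    """Return every value tied for the greatest frequency.

    Follows the slide convention: when no value is repeated,
    there is no mode, so an empty list is returned.
    """
    if len(set(values)) == len(values):
        return []
    return multimode(values)


print("mean     =", round(mean(home_runs), 2))
print("median   =", median(home_runs))
print("mode(s)  =", modes(home_runs))
print("midrange =", midrange(home_runs))
```

A list with one entry from `modes` is unimodal, two entries is bimodal, and more than two is multimodal.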
[Table: mean, median, mode, and midrange (in lb) for the Regular and Diet samples]

Diet appears to weigh less because it has less sugar than regular.

Mean from a Frequency Distribution
Use the class midpoints of the classes for the variable x.
$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$, where f = frequency, x = class midpoint, and $\sum f = n$.

Example, Page 71, #20
The accompanying frequency distribution summarizes a sample of human body temperatures. How does the mean compare to the value of 98.6°F, which is the value assumed to be the mean by most people?

[Table columns: Temperature, Frequency, Midpoint, f · x]

The mean appears to be substantially lower than 98.6°F.

[Figure: distribution shapes: Skewed to the Left (Negatively), Symmetric, Skewed to the Right (Positively)]

How to Choose the Best Average
- Choose the mode if there are two or more trends in the data.
  - Two or more areas of high-frequency values.
  - Report one mode for each trend.
- Choose the median if the distribution is skewed.
  - A small number of outliers are heavily influencing the mean.
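The formula for the mean from a frequency distribution given earlier in this lesson can be sketched in a few lines: each class midpoint stands in for the values in its class, weighted by the class frequency. This is an illustration under stated assumptions. It uses the systolic blood pressure classes from Lesson 2-2 as sample input rather than the body temperature table, and the function names are hypothetical.

```python
def class_midpoint(lower, upper):
    """Midpoint of a class: (lower class limit + upper class limit) / 2."""
    return (lower + upper) / 2


def grouped_mean(classes, frequencies):
    """Approximate the mean as sum(f * x) / sum(f), with x the class midpoint."""
    midpoints = [class_midpoint(lower, upper) for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    n = sum(frequencies)
    return total_fx / n


# Systolic blood pressure of women (Page 44, #2), used as sample input.
classes = [(80, 99), (100, 119), (120, 139), (140, 159), (160, 179), (180, 199)]
frequencies = [9, 24, 5, 1, 0, 1]

print(grouped_mean(classes, frequencies))  # 4420 / 40 = 110.5
```

Because every value in a class is replaced by the midpoint, the result approximates the mean of the original data rather than matching it exactly.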