Chapter 8 Describing Data: Measures of Central …

Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics

Chapter 8. Describing Data: Measures of Central Tendency and Dispersion

In the previous chapter we discussed measurement and the various levels at which we can use measurement to describe the extent to which an individual observation possesses a particular theoretical construct. Such a description is referred to as a datum. Examples of a datum are the number of conversations a person initiates in a given day, the number of minutes per day a person spends watching television, or the number of column inches of coverage devoted to labor issues in The Wall Street Journal. Multiple observations of a particular characteristic in a population or in a sample are referred to as data. After we collect a set of data, we usually want to make some summary statistical statements about this large and complex set of individual values of a variable.

That is, we want to describe a collective, such as a sample or a population, in its entirety. This description is the first step in bridging the gap between the measurement world of our limited number of observations and the complexity of the real world. We refer to this process as describing the distribution of a variable. There are a number of basic ways to describe collections of data.

Describing Distributions

Description by Enumeration

One way we can describe the distribution of a variable is by enumeration, that is, by simply listing all the values of the variable. But if the data set contains more than a few cases, the list will be too complex to be understood or communicated effectively. Imagine trying to describe the distribution of a sample of 300 observations by listing all 300 measurements.

Description by Visual Presentation

Another frequently used alternative is to present the data visually, for example as a bar chart, a histogram, a frequency polygon, or a pie chart. Figures 8-1 through 8-5 give examples of each of these, and the examples suggest some limitations that apply to the use of these graphic devices. The first limitation, visible in Figure 8-1, is that a bar chart is useful only when the data consist of a relatively small number of response categories. That is, the variable should have only a small number of classes or categories. The variable CD Player Ownership is a good example of such a variable. Its two classes (Owns a CD Player and Does not own a CD Player) lend themselves readily to presentation in a bar chart.

Figure 8-2 gives an example of the presentation of data in a histogram. In a histogram the horizontal axis shows the values of the variable (in this case, the number of compact discs a person reports having purchased in the previous year) and the vertical axis shows the frequencies associated with these values, that is, how many persons stated that they purchased, for instance, 8 CDs.
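The frequencies a histogram displays are simply counts of how many observations took each value. The Python sketch below tallies them, assuming integer-valued responses; the function names and the ten responses are hypothetical illustrations, not the data behind Figure 8-2.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, frequency) pairs for every integer from the lowest to
    the highest observed value, i.e. the bar heights of a histogram.
    Values nobody reported get a frequency of 0 (an empty bar)."""
    counts = Counter(values)  # Counter returns 0 for values never seen
    return [(v, counts[v]) for v in range(min(values), max(values) + 1)]


def text_histogram(values):
    """Draw the histogram sideways: one row per value, one '*' per person."""
    return "\n".join(
        f"{value:>3} | {'*' * freq}" for value, freq in frequency_table(values)
    )


# Hypothetical answers to "How many CDs did you buy in the previous year?"
cds_bought = [0, 2, 2, 3, 5, 5, 5, 8, 8, 10]

print(frequency_table(cds_bought))
print(text_histogram(cds_bought))
```

The rows run from the lowest to the highest value, so their order carries meaning; that ordering is only meaningful for variables measured at the ordinal level or higher.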

In histograms or bar charts, the shape of the distribution can convey a significant amount of information. This is another reason why it is desirable to conduct measurement at an ordinal or interval level, as this allows you to organize the values of a variable in a meaningful sequence. Notice that the values on the horizontal axis of the histogram are ordered from lowest to highest, in a natural sequence of increasing levels of the theoretical concept (Compact Disc Purchasing). If the variable to be graphed is nominal, then its classes could be arranged visually in any one of a large number of sequences. Each of these sequences would be equally natural, since nominal categories contain no ranking or ordering information, yet each sequence would convey different and conflicting information about the distribution of the variable. The shape of the distribution would convey no useful information at all. Bar charts and histograms can be used to compare the relative sizes of nominal categories, but they are more useful when the data graphed are at the ordinal or higher level of measurement.

Figure 8-3 gives an alternative to presenting data in a histogram. This method is called a frequency polygon, and it is constructed by connecting points whose heights correspond to the frequencies on the vertical axis. Another way of thinking of a frequency polygon is as a line that connects the midpoints of the tops of the bars in the histogram. Notice that the number of response categories that can be represented in a histogram or frequency polygon is limited; it would be very difficult to accommodate a variable with many more classes. If we want to describe a variable with a large number of classes using a histogram or a frequency polygon, we have to collapse categories, that is, combine a number of previously distinct classes, such as the classes 0, 1, 2, etc., into new aggregate categories, such as 0 through 4, 5 through 9, 10 through 14, etc. Although this process reduces the number of categories and makes graphical presentation easier, it also results in a loss of information.
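Collapsing categories amounts to mapping each raw value onto the class that contains it. A minimal sketch, assuming non-negative integer counts and classes of equal width 5 (the 0 through 4, 5 through 9, ... scheme described above); the function names and data are hypothetical:

```python
from collections import Counter


def collapse(value, width=5):
    """Map a raw count to its aggregate class label, e.g. 3 -> '0-4'."""
    low = (value // width) * width  # lower bound of the class containing value
    return f"{low}-{low + width - 1}"


def collapsed_frequencies(values, width=5):
    """Frequencies of the collapsed classes that contain at least one
    observation, ordered from the lowest class upward."""
    counts = Counter(value // width for value in values)  # class index -> count
    return [(collapse(k * width, width), counts[k]) for k in sorted(counts)]


cds_bought = [0, 2, 2, 3, 5, 5, 5, 8, 8, 10]  # hypothetical responses
print(collapsed_frequencies(cds_bought))  # [('0-4', 4), ('5-9', 5), ('10-14', 1)]

# The information loss: 0 CDs and 4 CDs end up in the same class.
print(collapse(0), collapse(4))  # 0-4 0-4
```

Three classes are easier to graph than the original values, but, as the last line shows, a person who bought 0 CDs and one who bought 4 now look identical.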

For instance, a person who purchased 0 CDs would be lumped together with a person who purchased as many as 4 CDs in the 0-4 class, thereby losing an important distinction between these two individuals. Figure 8-4 illustrates the results of such a reclassification, or recoding, of the original data from Figure 8-3.

Yet another way of presenting data visually is in the form of a pie chart. Figure 8-5 shows a pie chart that presents the average weekly television network ratings during prime time. Pie charts are appropriate for presenting the distributions of nominal variables, since the order in which the values of the variable are introduced is immaterial. The four classes of the variable as presented in this chart are: tuned to ABC, tuned to NBC, tuned to CBS, and, finally, tuned to anything else or not turned on.

There is no single way in which these levels of the variable can or should be ordered. The sequence in which these shares are listed really does not matter; all we need to consider is the size of the slice associated with each class of the variable.

Descriptive Statistics

Another way of describing a distribution of data is by reducing the data to some essential indicator that, in a single value, expresses information about the aggregate of all observations. Descriptive statistics do exactly that: they represent or epitomize some facet of a distribution. Note that the name descriptive statistics is actually a misnomer, as we do not limit ourselves to describing sample distributions. These statistics allow us to go beyond the mere description of a distribution. They can also be used for statistical inference, which permits generalizing from the limited number of observations in a sample to the whole population. We explained in Chapter 5 that this is a major goal of scientific endeavors.

This fact alone makes descriptive statistics preferable to either enumeration or visual presentation. However, descriptive statistics are often used in conjunction with visual presentations. Descriptive statistics can be divided into two major categories: measures of central tendency, and measures of dispersion or variability. The two kinds of measures focus on different essential characteristics of distributions. A very complete description of a distribution can be obtained from a relatively small set of central tendency and dispersion measures drawn from the two categories.

Measures of Central Tendency

Measures of central tendency describe a distribution in terms of its most frequent, typical, or average data value. But there are different ways of representing or expressing the idea of typicality. The descriptive statistics most often used for this purpose are the mean (the average), the mode (the most frequently occurring score), and the median (the middle score).
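Python's standard-library statistics module provides all three measures, which makes it easy to compare them on the same data. The sketch below applies them to a small hypothetical set of CD-purchase counts (invented for illustration); the wrapper function name is our own.

```python
import statistics


def central_tendency(values):
    """Return the three common measures of central tendency."""
    return {
        # sum of the values divided by N
        "mean": statistics.mean(values),
        # middle score; average of the two middle scores when N is even
        "median": statistics.median(values),
        # most frequently occurring score
        "mode": statistics.mode(values),
    }


cds_bought = [0, 2, 2, 3, 5, 5, 5, 8, 8, 10]  # hypothetical responses
print(central_tendency(cds_bought))
# {'mean': 4.8, 'median': 5.0, 'mode': 5}
```

Here the mean (4.8) sits slightly below the median and the mode (both 5), a reminder that the three measures express typicality in different ways and need not coincide.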

The Mean

The mean is defined as the arithmetic average of a set of numerical scores, that is, the sum of all the numbers divided by the number of observations contributing to that sum:

$$\text{Mean} = \bar{X} = \frac{\text{sum of all data values}}{\text{number of data values}}$$

or, more formally,

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$

which is the formula to be used when the data are in an array, which is simply a listing of a set of observations, organized by observation number. An example of data in an array can be found in Table 8-1(a).

Before we proceed, a few words about notation. The subscript i (in $X_i$) in the above formula represents the fact that there will be a number of values of the variable X, one for each observation: $X_1$ is the value of X for observation one (the first subject or respondent), $X_2$ is the value of X for the second observation, and so on, all the way to $X_N$, which is the value of X for the last observation (there are N observations in the sample). If we observe 25 college students and ask them how many compact discs they have bought over the last year, the response of the first person will be $X_1$; the second person's response will be $X_2$; and the last, or Nth, response will be $X_{25}$. The symbol Σ (capital sigma) in the formula instructs us to sum all these values of X, beginning with $X_1$ (i = 1) and continuing until the last observation (N, or 25 in the example) has been included. We will encounter this notation regularly in the chapters to come.

The reason why the sum of all the observations' values is divided by N (the number of observations) is probably intuitively clear. Computing the mean is something you have likely done innumerable times. It may, however, be informative to explain why the simple sum of a set of observations does not have much utility for describing distributions.
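The summation formula for the mean translates almost line for line into a loop. A minimal Python sketch follows; the function name and the example data are hypothetical, and because Python lists are indexed from 0, the i-th observation is stored at position i - 1.

```python
def mean(x):
    """Arithmetic mean: add up X_i for i = 1, ..., N, then divide by N."""
    n = len(x)  # N, the number of observations
    if n == 0:
        raise ValueError("the mean is undefined for zero observations")
    total = 0
    for i in range(1, n + 1):  # i runs from 1 to N, as under the summation sign
        total += x[i - 1]      # the i-th observation sits at index i - 1
    return total / n


# Two hypothetical groups with the same purchasing pattern but different sizes.
group_a = [2, 4, 6]
group_b = [2, 4, 6, 2, 4, 6]

print(sum(group_a), sum(group_b))    # 12 24
print(mean(group_a), mean(group_b))  # 4.0 4.0
```

The sum doubles simply because group_b has twice as many members, while the means are identical; this hints at why dividing by N makes the statistic comparable across samples of different sizes.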
