
Analysis of Categorical Data - SAGE Publications Inc


Analysis of Categorical Data, page 117 (05-Elliott-4987.qxd, 7/18/2006 5:26 PM): 93 arsonists, and 50 of these said they were drinkers. The row percentage in this case tells us that 50 is 53.8% of 93. From the bottom of the table, it can


5 Analysis of Categorical Data

People like to clump things into categories. Virtually every research project categorizes some of its observations into neat, little distinct bins: male or female; marital status; broken or not broken; small, medium, or large; race of patient; with or without a tonsillectomy; and so on. When we collect data by categories, we record counts, that is, how many observations fall into a particular bin. Categorical variables are usually classified as being of two basic types: nominal and ordinal. Nominal variables involve categories that have no particular order, such as hair color, race, or clinic site, while the categories associated with an ordinal variable have some inherent ordering (categories of socioeconomic status, etc.).

Unless otherwise stated, the procedures discussed here can be used on any type of categorical data. There are some specific procedures for ordinal data, and they will be briefly discussed later in the chapter.

Statisticians have devised a number of ways to analyze and explain categorical data. This chapter presents explanations of each of the following methods:

- A contingency table analysis is used to examine the relationship between two categorical variables.
- McNemar's test is designed for the analysis of paired dichotomous, categorical variables to detect disagreement or change.
- The Mantel-Haenszel test is used to determine whether there is a relationship between two dichotomous variables controlling for or within levels of a third variable.
- Interrater reliability (kappa) tests whether two raters looking at the same occurrence (or condition) give consistent ratings.

- A goodness-of-fit test measures whether an observed group of counts matches a theoretical pattern.
- A number of other categorical data measures are also briefly discussed.

To get the most out of this chapter, you should first verify that your variables are categorical and then try to match the hypotheses you are testing with the ones described in this chapter. If it is not clear that the hypotheses you are testing match any of these given here, we recommend that you consult a statistician.

Contingency Table Analysis (r × c)

Contingency table analysis is a common method of analyzing the association between two categorical variables. Consider a categorical variable that has r possible response categories and another categorical variable with c possible categories.

In this case, there are r × c possible combinations of responses for these two variables. The r × c crosstabulation or contingency table has r rows and c columns consisting of r × c cells containing the observed counts (frequencies) for each of the r × c combinations. This type of analysis is called a contingency table analysis and is usually accomplished using a chi-square statistic that compares the observed counts with those that would be expected if there were no association between the two variables.

Applications of Contingency Table Analysis

The following are examples of situations in which a chi-square contingency table analysis would be appropriate.

- A study compares types of crime and whether the criminal is a drinker or abstainer.
- An analysis is undertaken to determine whether there is a gender preference between candidates running for state governor.
- Reviewers want to know whether worker dropout rates are different for participants in two different job-training programs.
- A marketing research company wants to know whether there is a difference in response rates among small, medium, and large companies that were sent a questionnaire.

Design Considerations for a Contingency Table Analysis

Two Sampling Strategies

Two separate sampling strategies lead to the chi-square contingency table analysis discussed here.

Test of independence: A single random sample of observations is selected from the population of interest, and the data are categorized on the basis of the two variables of interest.

For example, in the marketing research example above, this sampling strategy would indicate that a single random sample of companies is selected, and each selected company is categorized by size (small, medium, or large) and whether that company returned the questionnaire.

Test for homogeneity: Separate random samples are taken from each of two or more populations to determine whether the responses related to a single categorical variable are consistent across populations. In the marketing research example above, this sampling strategy would consider there to be three populations of companies (based on size), and you would select a sample from each of these populations.

You then test to determine whether the response rates differ among the three company sizes.

The two-way table is set up the same way regardless of the sampling strategy, and the chi-square test is conducted in exactly the same way. The only real difference in the analysis is in the statement of the hypotheses.

Cell Size Considerations

The validity of the chi-square test depends on both the sample size and the number of cells. Several rules of thumb have been suggested to indicate whether the chi-square approximation is satisfactory. One such rule suggested by Cochran (1954) says that the approximation is adequate if no expected cell frequencies are less than one and no more than 20% are less than five.

Combining Categories

Because of the expected cell frequency criterion in the second sampling strategy, it may be necessary to combine similar categories to lessen the number of categories in your table or to examine the data by subcategories. See the section that follows later in this chapter on Mantel-Haenszel comparisons for information on one way to examine information within categories.
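The expected cell frequency rule of thumb can be checked mechanically. The following is a minimal Python sketch, assuming the usual expected count for a cell under no association, (row total × column total) / n. The function names and the counts are hypothetical, made up for illustration and not taken from the chapter.

```python
def expected_counts(observed):
    """Expected counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cochran_ok(observed):
    """Cochran's rule: no expected count below 1, at most 20% below 5."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Hypothetical counts: rows are company sizes (small, medium, large),
# columns are responded / did not respond.
table = [[20, 30], [6, 4], [5, 3]]
print(cochran_ok(table))

# Combine the two sparse rows (medium and large) into one category.
combined = [table[0], [med + lrg for med, lrg in zip(table[1], table[2])]]
print(cochran_ok(combined))
```

In this made-up table, half of the expected counts fall below five, so the rule fails. After the medium and large rows are merged, every expected count exceeds five and the rule is satisfied. Whether two categories are similar enough to merge remains a subject-matter judgment that the code cannot make.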

Hypotheses for a Contingency Table Analysis

The statement of the hypotheses depends on whether you used a test of independence or a test for homogeneity.

Test of Independence

In this case, you have two variables and are interested in testing whether there is an association between the two variables. Specifically, the hypotheses to be tested are the following:

H0: There is no association between the two variables.
Ha: The two variables are associated.

Test for Homogeneity

In this setting, you have a categorical variable collected separately from two or more populations. The hypotheses are as follows:

H0: The distribution of the categorical variable is the same across the populations.
Ha: The distribution of the categorical variable differs across the populations.

Tips and Caveats for a Contingency Table Analysis

Use Counts: Do Not Use Percentages

It may be tempting to use percentages in the table and calculate the chi-square test from these percentages instead of the raw observed frequencies. This is incorrect; don't do it!
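A short numerical sketch may help show why percentages give the wrong answer. It assumes the Pearson form of the chi-square statistic, the sum over cells of (observed − expected)² / expected, which this excerpt does not spell out. The function name and the 2 × 2 counts are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat


# Hypothetical 2 x 2 table of raw counts (n = 400).
counts = [[120, 80], [90, 110]]
n = sum(sum(row) for row in counts)

# The same table expressed as percentages of the grand total.
percents = [[100 * cell / n for cell in row] for row in counts]

stat_counts = chi_square(counts)
stat_percents = chi_square(percents)
print(round(stat_counts, 3), round(stat_percents, 3))
```

Because the statistic scales with the table total, feeding it percentages is equivalent to pretending the sample size is 100. Here the statistic shrinks by the factor 100 / n, so the test result no longer reflects the data actually collected.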

No One-Sided Tests

Notice that the alternative hypotheses above do not assume any direction. Thus, there are no one- and two-sided versions of these tests. Chi-square tests are inherently nondirectional ("sort of two-sided") in the sense that the chi-square test is simply testing whether the observed frequencies and expected frequencies agree without regard to whether particular observed frequencies are above or below the corresponding expected frequencies.

Each Subject Is Counted Only Once

If you have n total observations (i.e., the total of the counts is n), then these n observations should be independent.

For example, suppose you have a categorical variable Travel in which subjects are asked by what means they commute to work. It would not be correct to allow a subject to check multiple responses (e.g., car and commuter train) and then include all of these responses for this subject in the table (i.e., count the subject more than once). On such a variable, it is usually better to allow only one response per variable. If you want to allow for multiple responses such as this, then as you are tallying your results, you would need to come up with a new category, "car and commuter train."
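One way to tally such responses so that each subject is counted exactly once is sketched below in Python. The survey responses and the helper name travel_category are hypothetical, and joining the checked modes with "and" is just one possible labeling convention.

```python
from collections import Counter

# Hypothetical survey responses: each subject lists every mode checked.
responses = [
    ["car"],
    ["car", "commuter train"],
    ["bus"],
    ["commuter train", "car"],
    ["car"],
]


def travel_category(modes):
    """Collapse one subject's answers into a single category label."""
    return " and ".join(sorted(set(modes)))


tally = Counter(travel_category(modes) for modes in responses)

# Each subject contributes exactly one count, so the counts sum to n.
assert sum(tally.values()) == len(responses)
print(tally)
```

Sorting the modes before joining them means that "car, commuter train" and "commuter train, car" land in the same combined category rather than two different ones.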

