
Working with categorical data with R and the vcd and vcdExtra packages

Michael Friendly
York University, Toronto

Using vcdExtra version 0.7-0 and vcd version …; Date: 2016-02-24

Abstract

This tutorial describes the creation and manipulation of frequency and contingency tables from categorical variables, along with tests of independence, measures of association, and methods for graphically displaying results. The framework is provided by the R package vcd, but other packages are used to help with various tasks. The vcdExtra package extends the graphical and statistical methods provided by vcd. This package is now the main support package for the book Discrete Data Analysis with R: Visualizing and Modeling Techniques for Categorical and Count Data (Friendly and Meyer 2016).


The web page for the book gives further details.

Contents

1 Introduction
2 Creating frequency …
    Ordered factors
    structable()
    table() and friends
    xtabs()
    Collapsing over factors
    Collapsing levels
    Converting …
    A complex example
3 Tests of …
    CrossTable
    Chi-square test
    Fisher Exact Test
    Mantel-Haenszel test
    CMH tests: ordinal factors
    Measures of Association
    Measures of Agreement
    Correspondence analysis
4 Loglinear …
    Fitting with loglm()
    Fitting with glm() and gnm()
    Non-linear terms
5 Mosaic …
    Mosaics for loglinear models
    Mosaics for glm() and gnm() models
    Mosaic tips and techniques
    Changing labels
6 Continuous …
    Spine and conditional density plots
    Model-based plots

1 Introduction

This tutorial, part of the vcdExtra package, describes how to work with categorical data in the context of fitting statistical models in R and visualizing the results using the vcd and vcdExtra packages. It focuses first on methods and tools for creating and manipulating R data objects which represent frequency and contingency tables involving categorical variables.

Further sections describe some simple methods for calculating tests of independence and measures of association among categorical variables, and also methods for graphically displaying results.

There is much more to the analysis of categorical data than is described here, where the emphasis is on cross-tabulated tables of frequencies ("contingency tables"), statistical tests, associated loglinear models, and visualization of how variables are related.

A more general treatment of graphical methods for categorical data is contained in the book, Discrete Data Analysis with R: Visualizing and Modeling Techniques for Categorical and Count Data (Friendly and Meyer 2016). An earlier book using SAS is Visualizing Categorical Data (Friendly 2000), for which vcd is a partial R companion, covering topics not otherwise available in R. On the other hand, the implementation of graphical methods in vcd is more general in many respects than what I provided in SAS.

Statistical models for categorical data in R have been extended considerably with the gnm package for generalized nonlinear models. The vcdExtra package extends vcd methods to models fit using glm() and gnm().

A more complete theoretical description of these statistical methods is provided in Agresti's (2002; 2013) Categorical Data Analysis. For this, see the Splus/R companion by Laura Thompson, ~aa/ and Agresti's support web page, ~aa/

2 Creating and manipulating frequency tables

R provides many methods for creating frequency and contingency tables. Several are described below.
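As a quick preview of two of these methods, the sketch below cross-classifies two factors with table() and with xtabs(). The data frame X and its values are made up purely for illustration and are not taken from the tutorial.

```r
# Hypothetical case-form data: A and B are categorical variables
# (made-up values, used only to show the calls)
X <- data.frame(
  A = factor(c("a1", "a1", "a2", "a2", "a2", "a1")),
  B = factor(c("b1", "b2", "b1", "b1", "b2", "b1")))

# table() cross-classifies the factors passed to it
tab1 <- table(X$A, X$B)

# xtabs() builds the same table from a formula and keeps the variable names
tab2 <- xtabs(~ A + B, data = X)
tab2
```

Both calls return a two-way table of counts; the formula interface of xtabs() tends to be more convenient when the variables live in a data frame.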

In the examples below, we use some real examples and some anonymous ones, where the variables A, B, and C represent categorical variables, and X represents an arbitrary R data object.

The first thing you need to know is that categorical data can be represented in three different forms in R, and it is sometimes necessary to convert from one form to another, for carrying out statistical tests, fitting models or visualizing the results. Once a data object exists in R, you can examine its complete structure with the str() function, or view the names of its components with the names() function.

case form: a data frame containing individual observations, with one or more factors, used as the classifying variables.

In case form, there may also be numeric covariates. The total number of observations is nrow(X), and the number of variables is ncol(X).

Example: The Arthritis data is available in case form in the vcd package. There are two explanatory factors: Treatment and Sex. Age is a numeric covariate, and Improved is the response, an ordered factor with levels None < Some < Marked. Excluding Age, we would have a 2 × 2 × 3 contingency table for Treatment, Sex and Improved.

    > names(Arthritis)      # show the variables
    [1] "ID"        "Treatment" "Sex"       "Age"       "Improved"
    > str(Arthritis)        # show the structure
    'data.frame':   84 obs. of  5 variables:
     $ ID       : int  57 46 77 17 36 23 75 39 33 55 ...
     $ Treatment: Factor w/ 2 levels "Placebo","Treated": 2 2 2 2 2 2 2 2 2 2 ...
     $ Sex      : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
     $ Age      : int  27 29 30 32 46 58 59 59 63 63 ...
     $ Improved : Ord.factor w/ 3 levels "None"<"Some"<..: 2 1 1 3 3 3 1 3 1 1 ...
    > head(Arthritis, 5)    # first 5 observations, same as Arthritis[1:5,]
      ID Treatment  Sex Age Improved
    1 57   Treated Male  27     Some
    2 46   Treated Male  29     None
    3 77   Treated Male  30     None
    4 17   Treated Male  32   Marked
    5 36   Treated Male  46   Marked

frequency form: a data frame containing one or more factors, and a frequency variable, often called Freq or count.

The total number of observations is sum(X$Freq), sum(X[,"Freq"]) or some equivalent form. The number of cells in the table is nrow(X).

Example: For small frequency tables, it is often convenient to enter them in frequency form using expand.grid() for the factors and c() to list the counts in a vector. The example below, from Agresti (2002), gives results for the 1991 General Social Survey, with respondents classified by sex and party identification.

    > # Agresti (2002), table …, p. 106
    > GSS <- data.frame(
    +   expand.grid(sex=c("female", "male"),
    +               party=c("dem", "indep", "rep")),
    +   count=c(279,165,73,47,225,191))
    > GSS
         sex party count
    1 female   dem   279
    2   male   dem   165
    3 female indep    73
    4   male indep    47
    5 female   rep   225
    6   male   rep   191
    > names(GSS)
    [1] "sex"   "party" "count"
    > str(GSS)
    'data.frame':   6 obs. of  3 variables:
     $ sex  : Factor w/ 2 levels "female","male": 1 2 1 2 1 2
     $ party: Factor w/ 3 levels "dem","indep",..: 1 1 2 2 3 3
     $ count: num  279 165 73 47 225 191
    > sum(GSS$count)
    [1] 980

table form: a matrix, array or table object, whose elements are the frequencies in an n-way table. The variable names (factors) and their levels are given by dimnames(X). The total number of observations is sum(X). The number of dimensions of the table is length(dimnames(X)), and the table sizes are given by sapply(dimnames(X), length).

Example: The HairEyeColor data, from R's datasets package, is stored in table form.

    > str(HairEyeColor)     # show the structure
     table [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
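Because the three forms carry the same information, one can be derived from another. The following is a minimal sketch using only base R and the GSS data entered above; the object names GSS.tab, GSS.case, GSS.tab2 and GSS.freq2 are illustrative and do not come from the tutorial, which may well use other functions for these conversions.

```r
# Frequency form: one row per cell, with a count column (the GSS data)
GSS <- data.frame(
  expand.grid(sex = c("female", "male"),
              party = c("dem", "indep", "rep")),
  count = c(279, 165, 73, 47, 225, 191))

# Frequency form -> table form: xtabs() sums the counts within each cell
GSS.tab <- xtabs(count ~ sex + party, data = GSS)

# Frequency form -> case form: repeat each row `count` times, drop the count
GSS.case <- GSS[rep(seq_len(nrow(GSS)), times = GSS$count), c("sex", "party")]
rownames(GSS.case) <- NULL

# Case form -> table form -> frequency form again
GSS.tab2  <- table(GSS.case)
GSS.freq2 <- as.data.frame(GSS.tab2, responseName = "count")

# Each form reports the same total number of observations
sum(GSS$count)     # frequency form
sum(GSS.tab)       # table form
nrow(GSS.case)     # case form
```

The three totals at the end correspond to the expressions given for each form: sum(X$Freq) for frequency form, sum(X) for table form, and nrow(X) for case form.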

