Plotting rpart trees with the rpart.plot package

Stephen Milborrow

November 23, 2018

Contents

1 Introduction
2 Quick start
3 Main arguments
4 Printing rules
5 FAQ
6 Customizing the node labels
7 Examples using the color and palette arguments
8 Branch widths
9 Trimming a tree with the mouse
10 Using plotmo in conjunction with prp
11 Compatibility with plot.rpart and text.rpart
12 The graph layout algorithm

[Title-page figure: "An Example", a tree with splits on temp, ibh, dpg, and ibt; each node shows the number and percentage of observations.]

1 Introduction

The functions in the rpart.plot package plot rpart trees [6,7]. The next page shows some examples (Figure 1).
The workhorse function is prp. It automatically scales and adjusts the displayed tree for best fit. It combines and extends the plot.rpart and text.rpart functions in the rpart package.

Sections 2 and 3 of this document (the Quick Start and the Main Arguments) are the most important. Section 4 describes rpart.rules, which prints a tree as a set of rules. The remaining sections may be skipped or read in any order.

I assume you have already looked at the vignette included with the rpart package: An Introduction to Recursive Partitioning Using the rpart Routines by Therneau and Atkinson.

2 Quick start

The easiest way to plot a tree is to use rpart.plot. This function is a simplified front-end to the workhorse function prp, with only the most useful arguments of that function.
The arguments of rpart.plot are defaulted to display a tree with colors and details appropriate for the model's response (whereas prp by default displays a minimal unadorned tree).

As described in the section below, the overall characteristics of the displayed tree can be changed with the type and extra arguments.

3 Main arguments

This section is an overview of the important arguments to prp and rpart.plot. For most users these arguments should suffice and the many other arguments can be ignored.

type determines the overall plotting style, as shown in Figure 2. extra adds more details to the node labels, as shown in Figures 3 and 4. Use under = TRUE to put those details under the boxes. With extra = "auto" (the default for rpart.plot), a suitable value for extra will be chosen automatically (based on the type of response for the model).
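The effect of type, extra, and under can be tried out with a few calls such as the following. This is a minimal sketch rather than code from the original text: the object name fit is illustrative, and the particular type and extra values are simply picked from those listed in Figures 2 and 4.

```r
library(rpart)
library(rpart.plot)   # provides rpart.plot, prp, and the ptitanic data

data(ptitanic)

# 'fit' is an illustrative name, not one used elsewhere in this document
fit <- rpart(survived ~ ., data = ptitanic, cp = .02)

# Defaults: extra = "auto" picks node details to suit the response type
rpart.plot(fit)

# Left and right split labels, with the number of observations per class
rpart.plot(fit, type = 3, extra = 1)

# Interior node labels, probability of the 2nd class plus percentage of
# observations, with those details placed under the boxes
rpart.plot(fit, type = 4, extra = 106, under = TRUE)
```

Comparing the three plots side by side is a quick way to decide which combination suits a given model.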
Figure 1 illustrates the automatic choice of extra for different kinds of model. The help page has details.

Use digits, varlen, and faclen to display more significant digits and more characters in names. In particular, use the special values varlen = 0 and faclen = 0 to display full variable and factor names.

The character size will be adjusted automatically unless cex is explicitly set. Use tweak to adjust the automatically calculated size. Using tweak is often easier than specifying cex.

The intensity of a node's color is proportional to the value predicted at the node. The color scheme can be changed with box.palette. For details see the help page and Section 7. Some examples:

    box.palette = "auto"     automatically choose a palette (default for rpart.plot, Figure 1)
    box.palette = 0          uncolored (white) boxes (default for prp)
    box.palette = "Grays"    a range of grays ("Grays" is one of the built-in palettes)
    box.palette = "gray"     uniform gray boxes

[Figure 1 panels: titanic survived (binary response); miles per gallon (continuous response); vehicle reliability (multi class response)]

A model with a binary response:

    binary.model <- rpart(survived ~ ., data = ptitanic, cp = .02)
    rpart.plot(binary.model)

Each node shows
- the predicted class (died or survived),
- the predicted probability of survival,
- the percentage of observations in the node.

A model with a continuous response (an anova model):

    anova.model <- rpart(Mileage ~ ., data = cu.summary)
    rpart.plot(anova.model)

Each node shows
- the predicted value,
- the percentage of observations in the node.

A model with a multi-class response:

    multi.class.model <- rpart(Reliability ~ ., data = cu.summary)
    rpart.plot(multi.class.model)

Each node shows
- the predicted class (Much worse, worse, ..., Much better),
- the predicted probability of each class,
- the percentage of observations in the node.

In this example, the class better is never predicted by the model and thus is marked unused in the legend. The auto-positioned x,y coordinates of the legend can be printed, and the position can be adjusted with legend.x and legend.y.

Figure 1: rpart.plot with default arguments and different kinds of model

[Figure 2 panels, each plotting the Titanic tree:
    type = 0    (default)
    type = 1    label all nodes (like all=TRUE)
    type = 2    split labels below node labels
    type = 3    left and right split labels
    type = 4    like type=3 but with interior labels (like fancy=TRUE)
    type = 5    variable name in interior nodes]

Figure 2: The type argument.

You may also want to look at fallen.leaves (put the leaves at the bottom), uniform (vertically space the nodes uniformly or proportionally to the fit), and shadow (add shadows to the node boxes).

When dealing with the many arguments of prp, it helps to remember that the display has four constituents: the node labels, the split labels, the branch lines, and the optional node numbers. Each of these constituents has a complete set of col etc. arguments. Thus we have, for example, col (the color of the node label text), split.col (the split text), branch.col (the branch lines), and nn.col (the optional node numbers).

Standard graphics parameters such as col can be passed in through the ... argument. So where the help page refers to the col argument, what is meant is the col argument passed in through ..., and if it is not passed in, the value of par("col").
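The four constituents and their separate color arguments can be seen together in one call. The following is a small sketch under stated assumptions: the object name fit and the specific color values are arbitrary choices for illustration, and nn = TRUE is used only to make the optional node numbers visible.

```r
library(rpart)
library(rpart.plot)

data(ptitanic)

# 'fit' is an illustrative name; the colors below are arbitrary
fit <- rpart(survived ~ ., data = ptitanic, cp = .02)

prp(fit,
    nn         = TRUE,         # show the optional node numbers
    col        = "darkblue",   # node label text (passed through ...)
    split.col  = "darkgreen",  # split label text
    branch.col = "gray50",     # branch lines
    nn.col     = "red")        # node numbers
```

Each constituent takes its color from its own argument, so changing col alone leaves the split labels, branches, and node numbers as they were.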
Graphics parameters passed this way typically affect only the node labels, not the split labels or other constituents of the display.

[Figure 3 panels, each plotting an anova tree split on Girth < 16:
    extra = 0
    extra = 1      nbr of obs
    extra = 100    percentage of obs
    extra = 101    nbr and percentage of obs]

Figure 3: The extra argument with an anova model. Percentages are included by adding 100 to extra.

[Figure 4 panels, each plotting a class tree split on sex:
    extra = 0
    extra = 1      nbr of obs per class
    extra = 2      class rate
    extra = 3      misclass rate
    extra = 4      prob per class (sum across a node is 1)
    extra = 5      prob per class, fitted class not displayed
    extra = 6      prob of 2nd class (useful for binary responses)
    extra = 7      prob of 2nd class, fitted class not displayed
    extra = 8      prob of fitted class
    extra = 9      overall prob (sum over all leaves is 1)
    extra = 10     overall prob of 2nd class
    extra = 11     overall prob of 2nd class, fitted class not displayed
    extra = 100    percent of obs
    extra = 106    prob of 2nd class and percent of obs]

Figure 4: The extra argument with a class model. This figure also illustrates under = TRUE, which puts the extra data under the box.

4 Printing rules

A tree can be printed as a set of rules using the rpart.rules function. The rules are sometimes clearer or more convenient than the plotted tree.

For example, we build a model to predict the volume of usable timber from cherry trees:

    data(trees)
    volume <- rpart(Volume ~ ., data = trees)
    rpart.plot(volume, type = 3, clip.right.labs = FALSE, branch = .3, under = TRUE)
    rpart.rules(volume)

The resulting tree and rules (shown in blue) are:

[Figure: tree with splits Girth < 16 and Girth < 12, shown next to the rules below]

    Volume
        18 when Girth <  12
        31 when Girth is 12 to 16
        56 when Girth >= 16

We can see that the rpart algorithm discards the Height variable in the trees data, and estimates the Volume by separating the Girth into three partitions. Notice how the two conditions along the left side of the tree (Girth < 16 and Girth < 12) are collapsed into the single rule Girth < 12.

Another example is the probability of survival of Titanic passengers:

    data(ptitanic)
    survived <- rpart(survived ~ .