Package ‘randomForest’

May 23, 2022

Title Breiman and Cutler's Random Forests for Classification and Regression
Depends R (>= ), stats
Suggests RColorBrewer, MASS
Author Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener.
Description Classification and regression based on a forest of trees using random inputs, based on Breiman (2001) < :1010933404324>.
Maintainer Andy Liaw
License (>= 2)
URL ~breiman/RandomForests/
NeedsCompilation yes
Repository CRAN
Date/Publication 2022-05-23 08:27:49 UTC

R topics documented:
classCenter
combine
getTree
grow
importance
imports85
margin
MDSplot
outlier
partialPlot
randomForest
rfcv
rfImpute
rfNews
treesize
tuneRF
varImpPlot
varUsed
Index

classCenter    Prototypes of groups.

Description
Prototypes are 'representative' cases of a group of data points, given the similarity matrix among the points.

They are very similar to medoids. The function is named classCenter to avoid conflict with the function prototype in the methods package.

Usage
classCenter(x, label, prox, nNbr = min(table(label))-1)

Arguments
x        a matrix or data frame
label    group labels of the rows in x
prox     the proximity (or similarity) matrix, assumed to be symmetric with 1 on the diagonal and in [0, 1] off the diagonal (the order of row/column must match that of x)
nNbr     number of nearest neighbors used to find the prototypes.

Details
For each case in x, the nNbr nearest neighbors are found. Then, for each class, the case that has most neighbors of that class is identified. The prototype for that class is then the medoid of these neighbors (coordinate-wise medians for numerical variables and modes for categorical variables).
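The neighbor-counting procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's own code: prototype_sketch is a hypothetical helper, it handles numeric columns only (no modes for categorical variables), it excludes each case from its own neighbor list, and the toy proximity matrix is built from Euclidean distances rather than from a forest.

```r
# Hypothetical helper, NOT the package's classCenter(): numeric columns only.
prototype_sketch <- function(x, label, prox, nNbr = min(table(label)) - 1) {
  x <- as.matrix(x)
  label <- as.factor(label)
  n <- nrow(x)
  # For each case, the indices of its nNbr most similar other cases.
  nbr <- lapply(seq_len(n), function(i) {
    ord <- order(prox[i, ], decreasing = TRUE)
    ord <- ord[ord != i]          # assumption: a case is not its own neighbor
    ord[seq_len(nNbr)]
  })
  protos <- lapply(levels(label), function(cl) {
    # How many neighbors of each case belong to class cl?
    counts <- sapply(nbr, function(idx) sum(label[idx] == cl))
    best <- which.max(counts)     # case with the most neighbors of class cl
    keep <- nbr[[best]][label[nbr[[best]]] == cl]
    # Coordinate-wise medians of those neighbors.
    apply(x[keep, , drop = FALSE], 2, median)
  })
  out <- do.call(rbind, protos)
  rownames(out) <- levels(label)
  out
}

# Toy data: two well-separated groups; proximity is 1 / (1 + distance),
# which is symmetric, equals 1 on the diagonal, and lies in [0, 1].
x <- cbind(a = c(0, 0.1, 0.2, 5, 5.1, 5.2), b = c(1, 1, 1, 3, 3, 3))
label <- rep(c("A", "B"), each = 3)
prox <- 1 / (1 + as.matrix(dist(x)))
p <- prototype_sketch(x, label, prox)
print(p)
```

The result has one row per class, which mirrors the shape of the value documented for classCenter.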

This version only computes one prototype per class. In the future more prototypes may be computed (by removing the 'neighbors' used, then iterating).

Value
A data frame containing one prototype in each row.

Author(s)
Andy Liaw

See Also
randomForest, MDSplot

Examples
data(iris)
iris.rf <- randomForest(iris[,-5], iris[,5], prox=TRUE)
iris.p <- classCenter(iris[,-5], iris[,5], iris.rf$prox)
plot(iris[,3], iris[,4], pch=21, xlab=names(iris)[3], ylab=names(iris)[4],
     bg=c("red", "blue", "green")[as.numeric(factor(iris$Species))],
     main="Iris Data with Prototypes")
points(iris.p[,3], iris.p[,4], pch=21, cex=2, bg=c("red", "blue", "green"))

combine    Combine Ensembles of Trees

Description
Combine two or more ensembles of trees into one.

Usage
combine(...)

Arguments
...      two or more objects of class randomForest, to be combined into one.

Value
An object of class randomForest.

Note
The confusion, err.rate, mse and rsq components (as well as the corresponding components in the test component, if they exist) of the combined object will be NULL.

Author(s)
Andy Liaw

See Also
randomForest, grow

Examples
data(iris)
rf1 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf2 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf3 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf.all <- combine(rf1, rf2, rf3)
print(rf.all)

getTree    Extract a single tree from a forest.

Description
This function extracts the structure of a tree from a randomForest object.

Usage
getTree(rfobj, k=1, labelVar=FALSE)

Arguments
rfobj     a randomForest object.
k         which tree to extract?
labelVar  Should better labels be used for splitting variables and predicted class?

Details
For numerical predictors, data with values of the variable less than or equal to the splitting point go to the left daughter node.

For categorical predictors, the splitting point is represented by an integer, whose binary expansion gives the identities of the categories that go to the left or right. For example, if a predictor has four categories and the split point is 13, the binary expansion of 13 is (1, 0, 1, 1) (because 13 = 1*2^0 + 0*2^1 + 1*2^2 + 1*2^3), so cases with categories 1, 3, or 4 in this predictor get sent to the left, and the rest to the right.

Value
A matrix (or data frame, if labelVar=TRUE) with six columns and number of rows equal to the total number of nodes in the tree.
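The categorical split encoding described in Details can be checked with a few lines of base R. This is a minimal sketch: decode_split is a hypothetical helper and not a function exported by randomForest, and it assumes that bit j-1 of the split point corresponds to category j, as in the worked example for 13.

```r
# Hypothetical helper (not part of randomForest): decode a categorical
# split point into the categories sent left and right.
decode_split <- function(split_point, n_categories) {
  # Bit j-1 of the integer belongs to category j; a 1 means "goes left".
  bits <- (split_point %/% 2^(seq_len(n_categories) - 1)) %% 2
  list(bits = bits,
       left = which(bits == 1),
       right = which(bits == 0))
}

res <- decode_split(13, 4)
print(res$bits)   # binary expansion, lowest bit first
print(res$left)   # categories sent to the left daughter
print(res$right)  # categories sent to the right daughter
```

For the split point 13 with four categories, this reproduces the expansion (1, 0, 1, 1) and sends categories 1, 3 and 4 to the left.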

The six columns are:
left daughter   the row where the left daughter node is; 0 if the node is terminal
right daughter  the row where the right daughter node is; 0 if the node is terminal
split var       which variable was used to split the node; 0 if the node is terminal
split point     where the best split is; see Details for categorical predictors
status          is the node terminal (-1) or not (1)
prediction      the prediction for the node; 0 if the node is not terminal

Author(s)
Andy Liaw

See Also
randomForest

Examples
data(iris)
## Look at the third tree in the forest.
getTree(randomForest(iris[,-5], iris[,5], ntree=10), 3, labelVar=TRUE)

grow    Add trees to an ensemble

Description
Add additional trees to an existing ensemble of trees.

Usage
## S3 method for class 'randomForest'
grow(x, how.many, ...)

Arguments
x         an object of class randomForest, which contains a forest component.
how.many  number of trees to add to the randomForest object.

Value
An object of class randomForest, containing how.many additional trees.

Note
The confusion, err.rate, mse and rsq components (as well as the corresponding components in the test component, if they exist) of the combined object will be NULL.

Author(s)
Andy Liaw

See Also
combine, randomForest

Examples
data(iris)
iris.rf <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
iris.rf <- grow(iris.rf, 50)
print(iris.rf)

importance    Extract variable importance measure

Description
This is the extractor function for variable importance measures as produced by randomForest.

Usage
## S3 method for class 'randomForest'
importance(x, type=NULL, class=NULL, scale=TRUE, ...)

Arguments
x      an object of class randomForest.
type   either 1 or 2, specifying the type of importance measure (1=mean decrease in accuracy, 2=mean decrease in node impurity).
class  for classification problem, which class-specific measure to return.
scale  For permutation based measures, should the measures be divided by their "standard errors"?
...    not used.

Details
Here are the definitions of the variable importance measures. The first measure is computed from permuting OOB data: For each tree, the prediction error on the out-of-bag portion of the data is recorded (error rate for classification, MSE for regression).

Then the same is done after permuting each predictor variable. The differences between the two are then averaged over all trees, and normalized by the standard deviation of the differences. If the standard deviation of the differences is equal to 0 for a variable, the division is not done (but the average is almost always equal to 0 in that case).

The second measure is the total decrease in node impurities from splitting on the variable, averaged over all trees. For classification, the node impurity is measured by the Gini index. For regression, it is measured by residual sum of squares.

Value
A matrix of importance measure, one row for each predictor variable. The column(s) are different importance measures.

See Also
randomForest, varImpPlot

Examples
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
                          keep.forest=FALSE, importance=TRUE)
importance(mtcars.rf)
importance(mtcars.rf, type=1)

imports85    The Automobile Data

Description
This is the 'Automobile' data from the UCI Machine Learning Repository.

Usage
data(imports85)

Format
imports85 is a data frame with 205 cases (rows) and 26 variables (columns).

This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as compared to other cars. The second rating corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk factor symbol associated with their price. Then, if a car is more risky (or less), this symbol is adjusted by moving it up (or down) the scale. Actuaries call this process 'symboling'. A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe.

The third factor is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports/speciality, etc.), and represents the average loss per car per year.

Author(s)
Andy Liaw

Source
Originally created by Jeffrey C. Schlimmer, from 1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook, Personal Auto Manuals, Insurance Services Office, and Insurance Collision Report, Insurance Institute for Highway Safety.

The original data is ~

References
1985 Model Import Car and Truck Specifications, 1985 Ward's Automotive Yearbook.
Personal Auto Manuals, Insurance Services Office, 160 Water Street, New York, NY 10038
Insurance Collision Report, Insurance Institute for Highway Safety, Watergate 600, Washington, DC 20037

See Also
randomForest

Examples
data(imports85)
imp85 <- imports85[,-2]  # Too many NAs in normalizedLosses.
imp85 <- imp85[complete.cases(imp85), ]
## Drop empty levels for factors.
imp85[] <- lapply(imp85, function(x) if (is.factor(x)) x[, drop=TRUE] else x)

stopifnot(require(randomForest))
price.rf <- randomForest(price ~ ., imp85, do.trace=10, ntree=100)
print(price.rf)
numDoors.rf <- randomForest(numOfDoors ~ ., imp85, do.trace=10, ntree=100)
print(numDoors.rf)

margin    Margins of randomForest Classifier

Description
Compute or plot the margin of predictions from a randomForest classifier.

Usage
## S3 method for class 'randomForest'
margin(x, ...)
## Default S3 method:
margin(x, observed, ...)
## S3 method for class 'margin'
plot(x, sort=TRUE, ...)

Arguments
x         an object of class randomForest, whose type is not regression, or a matrix of predicted probabilities, one column per class and one row per observation. For the plot method, x should be an object returned by margin.
observed  the true response corresponding to the data in x.
sort      Should the data be sorted by their class labels?
...       other graphical parameters to be passed to plot.default.

Value
For margin, the margin of observations from the randomForest classifier (or whatever classifier that produced the predicted probability matrix given to margin). The margin of a data point is defined as the proportion of votes for the correct class minus the maximum proportion of votes for the other classes.
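The margin definition can be illustrated directly on a small matrix of vote proportions. This is a sketch in base R under stated assumptions, not the package's margin method: margin_sketch is a hypothetical helper, and the vote matrix and class names below are made-up toy values.

```r
# Hypothetical helper, NOT the package's margin(): `votes` is an n x K matrix
# of vote proportions whose column names are the class labels.
margin_sketch <- function(votes, observed) {
  observed <- as.character(observed)
  sapply(seq_len(nrow(votes)), function(i) {
    correct <- votes[i, observed[i]]                          # votes for the true class
    other <- max(votes[i, colnames(votes) != observed[i]])    # best competing class
    unname(correct - other)
  })
}

# Toy vote proportions for two observations and three classes.
votes <- rbind(c(0.7, 0.2, 0.1),
               c(0.3, 0.5, 0.2))
colnames(votes) <- c("a", "b", "c")
m <- margin_sketch(votes, observed = c("a", "a"))
print(m)
```

The first observation gets a positive margin because its true class has the most votes; the second gets a negative margin because another class outvotes the true one.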
