XGBoost: A Scalable Tree Boosting System


Tianqi Chen, University of Washington
Carlos Guestrin, University of Washington

ABSTRACT
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

CCS Concepts: Computing methodologies → Machine learning; Information systems → Data mining

Keywords: Large-scale Machine Learning

1. INTRODUCTION
Machine learning and data-driven approaches are becoming very important in many areas. Smart spam classifiers protect our email by learning from massive amounts of spam data and user feedback; advertising systems learn to match the right ads with the right context; fraud detection systems protect banks from malicious attackers; anomaly event detection systems help experimental physicists to find events that lead to new physics.

There are two important factors that drive these successful applications: usage of effective (statistical) models that capture the complex data dependencies, and scalable learning systems that learn the model of interest from large datasets. Among the machine learning methods used in practice, gradient tree boosting [9] (also known as gradient boosting machine (GBM) or gradient boosted regression tree (GBRT)) is one technique that shines in many applications. Tree boosting has been shown to give state-of-the-art results on many standard classification benchmarks [14].

LambdaMART [4], a variant of tree boosting for ranking, achieves state-of-the-art results for ranking problems. Besides being used as a stand-alone predictor, it is also incorporated into real-world production pipelines for ad click-through rate prediction [13]. Finally, it is the de-facto choice of ensemble method and is used in challenges such as the Netflix prize [2].

In this paper, we describe XGBoost, a scalable machine learning system for tree boosting. The system is available as an open source package. The impact of the system has been widely recognized in a number of machine learning and data mining challenges. Take the challenges hosted by the machine learning competition site Kaggle for example. Among the 29 challenge winning solutions published at Kaggle's blog during 2015, 17 solutions used XGBoost. Among these solutions, eight solely used XGBoost to train the model, while most others combined XGBoost with neural nets in ensembles. For comparison, the second most popular method, deep neural nets, was used in 11 solutions.

The success of the system was also witnessed in KDDCup 2015, where XGBoost was used by every winning team in the top-10. Moreover, the winning teams reported that ensemble methods outperform a well-configured XGBoost by only a small amount [1]. These results demonstrate that our system gives state-of-the-art results on a wide range of problems. Examples of the problems in these winning solutions include: store sales prediction; high energy physics event classification; web text classification; customer behavior prediction; motion detection; ad click-through rate prediction; malware classification; product categorization; hazard risk prediction; massive online course dropout rate prediction. While domain dependent data analysis and feature engineering play an important role in these solutions, the fact that XGBoost is the consensus choice of learner shows the impact and importance of our system and tree boosting. The most important factor behind the success of XGBoost is its scalability in all scenarios.

The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings. The scalability of XGBoost is due to several important systems and algorithmic innovations. These innovations include: a novel tree learning algorithm for handling sparse data, and a theoretically justified weighted quantile sketch procedure that enables handling instance weights in approximate tree learning. Parallel and distributed computing makes learning faster, which enables quicker model exploration. More importantly, XGBoost exploits out-of-core computation and enables data scientists to process hundreds of millions of examples on a desktop. Finally, it is even more exciting to combine these techniques to make an end-to-end system that scales to even larger data with the least amount of cluster resources.

Figure 1: Tree Ensemble Model. The final prediction for a given example is the sum of predictions from each tree.

The major contributions of this paper are listed as follows:
- We design and build a highly scalable end-to-end tree boosting system.
- We introduce a novel sparsity-aware algorithm for parallel tree learning.
- We propose a theoretically justified weighted quantile sketch for efficient proposal calculation.
- We propose an effective cache-aware block structure for out-of-core tree learning.

While there are some existing works on parallel tree boosting [19, 20, 16], directions such as out-of-core computation, cache-aware learning and sparsity-aware learning have not been explored. More importantly, an end-to-end system that combines all of these aspects gives a novel solution for real-world use-cases. This enables data scientists as well as researchers to build powerful variants of tree boosting algorithms [6, 7]. Besides these major contributions, we also make additional improvements in proposing a regularized learning objective and supporting column sub-sampling, which we will include for completeness.

The remainder of the paper is organized as follows.

We will first review tree boosting and introduce a general regularized objective in Sec. 2. We then describe the split finding algorithms in Sec. 3 as well as the system design in Sec. 4, including experimental results when relevant to provide quantitative support for each optimization we describe. Related work is discussed in Sec. 5. Detailed end-to-end evaluations of the system are included in Sec. 6. Finally, we conclude the paper in Sec. 7.

2. TREE BOOSTING IN A NUTSHELL

2.1 Regularized Learning Objective
For a given data set with n examples and m features D = {(x_i, y_i)} (|D| = n, x_i \in R^m, y_i \in R), a tree ensemble model (shown in Fig. 1) uses K additive functions to predict the output:

\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F},   (1)

where \mathcal{F} = \{ f(x) = w_{q(x)} \} (q : R^m \to T, w \in R^T) is the space of regression trees (also known as CART). Here q represents the structure of each tree that maps an example to the corresponding leaf index, and T is the number of leaves in the tree. Each f_k corresponds to an independent tree structure q and leaf weights w.
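
As a concrete reading of Eq. (1), the minimal Python sketch below (not from the paper; the Tree class and its field names are illustrative assumptions) routes an example through each tree's decision rules q and sums the resulting leaf weights:

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Tree:
    # Illustrative regression-tree node; not the actual XGBoost data structure.
    feature: Optional[int] = None    # split feature index (None means this node is a leaf)
    threshold: float = 0.0           # split threshold
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None
    weight: float = 0.0              # leaf weight w_j (used only at a leaf)

    def leaf_weight(self, x: List[float]) -> float:
        # q(x): follow the decision rules down to a leaf, then return w_{q(x)}
        if self.feature is None:
            return self.weight
        child = self.left if x[self.feature] < self.threshold else self.right
        return child.leaf_weight(x)

def predict(trees: List[Tree], x: List[float]) -> float:
    # Eq. (1): y_hat = sum_{k=1..K} f_k(x), where each f_k is one regression tree
    return sum(t.leaf_weight(x) for t in trees)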

Unlike decision trees, each regression tree contains a continuous score on each leaf; we use w_i to represent the score on the i-th leaf. For a given example, we will use the decision rules in the trees (given by q) to classify it into the leaves and calculate the final prediction by summing up the scores in the corresponding leaves (given by w). To learn the set of functions used in the model, we minimize the following regularized objective:

\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \quad \text{where } \Omega(f) = \gamma T + \frac{1}{2} \lambda \|w\|^2   (2)

Here l is a differentiable convex loss function that measures the difference between the prediction \hat{y}_i and the target y_i. The second term \Omega penalizes the complexity of the model (i.e., the regression tree functions). The additional regularization term helps to smooth the final learnt weights to avoid over-fitting. Intuitively, the regularized objective will tend to select a model employing simple and predictive functions. A similar regularization technique has been used in the Regularized Greedy Forest (RGF) [22] model. Our objective and the corresponding learning algorithm are simpler than RGF and easier to parallelize.
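
A minimal sketch of evaluating the objective in Eq. (2), assuming a squared-error loss for l and illustrative argument names (the paper does not prescribe this interface):

def regularized_objective(y, y_pred, leaf_weights_per_tree, gamma=0.0, lam=1.0):
    # sum_i l(y_hat_i, y_i); squared error is used here purely as an example of l
    loss = sum((p - yi) ** 2 for yi, p in zip(y, y_pred))
    # sum_k Omega(f_k) with Omega(f) = gamma * T + 0.5 * lambda * ||w||^2
    penalty = sum(
        gamma * len(w) + 0.5 * lam * sum(wj * wj for wj in w)
        for w in leaf_weights_per_tree
    )
    return loss + penalty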

When the regularization parameter is set to zero, the objective falls back to traditional gradient tree boosting.

2.2 Gradient Tree Boosting
The tree ensemble model in Eq. (2) includes functions as parameters and cannot be optimized using traditional optimization methods in Euclidean space. Instead, the model is trained in an additive manner. Formally, let \hat{y}_i^{(t)} be the prediction of the i-th instance at the t-th iteration; we will need to add f_t to minimize the following objective:

\mathcal{L}^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t)

This means we greedily add the f_t that most improves our model according to Eq. (2). To quickly optimize the objective in the general setting, we approximate it using the second-order Taylor expansion:

\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l(y_i, \hat{y}^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

where g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}) and h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}) are first and second order gradient statistics on the loss function. We can remove the constant terms to obtain the following simplified objective at step t:

\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)   (3)
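
The statistics g_i and h_i are per-instance derivatives of the loss evaluated at the previous round's prediction. A short sketch, with squared-error and logistic losses chosen only as familiar examples (the function names are assumptions, not the paper's code):

import math

def grad_hess_squared_error(y, y_pred_prev):
    # l(y, p) = (y - p)^2  =>  g_i = 2 * (p - y),  h_i = 2
    g = [2.0 * (p - yi) for yi, p in zip(y, y_pred_prev)]
    h = [2.0 for _ in y]
    return g, h

def grad_hess_logistic(y, y_pred_prev):
    # logistic loss on raw scores p with labels y in {0, 1}:
    # g_i = sigmoid(p) - y,  h_i = sigmoid(p) * (1 - sigmoid(p))
    sig = [1.0 / (1.0 + math.exp(-p)) for p in y_pred_prev]
    g = [s - yi for yi, s in zip(y, sig)]
    h = [s * (1.0 - s) for s in sig]
    return g, h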

Figure 2: Structure Score Calculation. We only need to sum up the gradient and second order gradient statistics on each leaf, then apply the scoring formula to get the quality score.

Define I_j = \{ i \mid q(x_i) = j \} as the instance set of leaf j. We can rewrite Eq. (3) by expanding the regularization term as follows:

\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2
                          = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T   (4)

For a fixed structure q(x), we can compute the optimal weight w_j^* of leaf j by

w_j^* = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda},   (5)

and calculate the corresponding optimal objective function value by

\tilde{\mathcal{L}}^{(t)}(q) = - \frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T.   (6)

Eq. (6) can be used as a scoring function to measure the quality of a tree structure q. This score is like the impurity score for evaluating decision trees, except that it is derived for a wider range of objective functions. Fig. 2 illustrates how this score can be computed. Normally it is impossible to enumerate all the possible tree structures q. A greedy algorithm that starts from a single leaf and iteratively adds branches to the tree is used instead.
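
A minimal sketch of Eq. (5) and Eq. (6) given per-leaf sums of g_i and h_i; the split_gain helper applies Eq. (6) before and after a candidate split, anticipating the greedy split-finding criterion developed later in the paper (all names are illustrative, not the paper's implementation):

def optimal_leaf_weight(G_j, H_j, lam):
    # Eq. (5): w_j* = -G_j / (H_j + lambda), where G_j and H_j are the sums of
    # g_i and h_i over the instances in leaf j
    return -G_j / (H_j + lam)

def structure_score(leaf_stats, lam, gamma):
    # Eq. (6): -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T,
    # with leaf_stats a list of (G_j, H_j) pairs, one per leaf
    return -0.5 * sum(G * G / (H + lam) for G, H in leaf_stats) + gamma * len(leaf_stats)

def split_gain(G_left, H_left, G_right, H_right, lam, gamma):
    # Score reduction from turning one leaf into two, obtained by applying
    # Eq. (6) before and after the candidate split.
    def part(G, H):
        return G * G / (H + lam)
    return 0.5 * (part(G_left, H_left) + part(G_right, H_right)
                  - part(G_left + G_right, H_left + H_right)) - gamma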

