
XGBoost: A Scalable Tree Boosting System

Tianqi Chen (University of Washington), Carlos Guestrin (University of Washington)

ABSTRACT
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

Keywords: Large-scale Machine Learning




1. INTRODUCTION

Machine learning and data-driven approaches are becoming very important in many areas. Smart spam classifiers protect our email by learning from massive amounts of spam data and user feedback; advertising systems learn to match the right ads with the right context; fraud detection systems protect banks from malicious attackers; anomaly event detection systems help experimental physicists to find events that lead to new physics. There are two important factors that drive these successful applications: usage of effective (statistical) models that capture the complex data dependencies and scalable learning systems that learn the model of interest from large datasets.

Among the machine learning methods used in practice, gradient tree boosting [10] (also known as gradient boosting machine, GBM, or gradient boosted regression tree, GBRT) is one technique that shines in many applications.

Tree boosting has been shown to give state-of-the-art results on many standard classification benchmarks [16]. LambdaMART [5], a variant of tree boosting for ranking, achieves state-of-the-art results for ranking problems.

Besides being used as a stand-alone predictor, it is also incorporated into real-world production pipelines for ad click-through rate prediction [15]. Finally, it is the de facto choice of ensemble method and is used in challenges such as the Netflix prize [3].

In this paper, we describe XGBoost, a scalable machine learning system for tree boosting. The system is available as an open source package. The impact of the system has been widely recognized in a number of machine learning and data mining challenges. Take the challenges hosted by the machine learning competition site Kaggle for example.

Among the 29 challenge winning solutions published at Kaggle's blog during 2015, 17 solutions used XGBoost. Among these solutions, eight solely used XGBoost to train the model, while most others combined XGBoost with neural nets in ensembles. For comparison, the second most popular method, deep neural nets, was used in 11 solutions. The success of the system was also witnessed in KDDCup 2015, where XGBoost was used by every winning team in the top 10. Moreover, the winning teams reported that ensemble methods outperform a well-configured XGBoost by only a small amount [1]. These results demonstrate that our system gives state-of-the-art results on a wide range of problems.

Examples of the problems in these winning solutions include: store sales prediction; high energy physics event classification; web text classification; customer behavior prediction; motion detection; ad click-through rate prediction; malware classification; product categorization; hazard risk prediction; massive online course dropout rate prediction. While domain-dependent data analysis and feature engineering play an important role in these solutions, the fact that XGBoost is the consensus choice of learner shows the impact and importance of our system and tree boosting.

The most important factor behind the success of XGBoost is its scalability in all scenarios. The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings.

The scalability of XGBoost is due to several important systems and algorithmic optimizations. These innovations include: a novel tree learning algorithm for handling sparse data, and a theoretically justified weighted quantile sketch procedure that enables handling instance weights in approximate tree learning. Parallel and distributed computing makes learning faster, which enables quicker model exploration. More importantly, XGBoost exploits out-of-core computation, enabling data scientists to process hundreds of millions of examples on a desktop. Finally, it is even more exciting to combine these techniques to make an end-to-end system that scales to even larger data with the least amount of cluster resources.
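To make the sparse-data path concrete, here is a minimal training sketch using the open-source xgboost Python package on a sparse feature matrix; the synthetic data, parameter values and round count are illustrative assumptions rather than settings from the paper.

```python
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

# Illustrative synthetic data: a mostly-empty (sparse) feature matrix.
rng = np.random.default_rng(0)
X = sp.random(10_000, 200, density=0.05, format="csr", random_state=0)
y = rng.integers(0, 2, size=10_000)

# DMatrix stores only the entries that are present; absent entries are
# treated as missing and routed by learned default directions
# (the sparsity-aware split finding described in the paper).
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",
    "max_depth": 6,
    "eta": 0.1,
    "tree_method": "approx",  # approximate split finding with quantile proposals
}
booster = xgb.train(params, dtrain, num_boost_round=50)
preds = booster.predict(dtrain)
```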

The major contributions of this paper are listed as follows:

- We design and build a highly scalable end-to-end tree boosting system.
- We propose a theoretically justified weighted quantile sketch for efficient proposal calculation.
- We introduce a novel sparsity-aware algorithm for parallel tree learning.
- We propose an effective cache-aware block structure for out-of-core tree learning.

While there are some existing works on parallel tree boosting [22, 23, 19], directions such as out-of-core computation, cache-aware and sparsity-aware learning have not been explored. More importantly, an end-to-end system that combines all of these aspects gives a novel solution for real-world use-cases.
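As a rough illustration of the second contribution, the sketch below computes the quantity that the weighted quantile sketch approximates: candidate split points chosen so that the weight accumulated between consecutive candidates stays near a fraction eps of the total weight (in XGBoost the per-instance weights are second-order gradient statistics). This is the exact, in-memory version of the proposal step, not the paper's mergeable sketch data structure; the function name and toy data are hypothetical.

```python
import numpy as np

def weighted_quantile_candidates(values: np.ndarray, weights: np.ndarray, eps: float) -> np.ndarray:
    """Exact version of the proposal step the weighted quantile sketch approximates:
    choose candidate split points so that the weight between consecutive
    candidates stays close to eps * sum(weights)."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    total = cum[-1]
    candidates = [v[0]]
    last_rank = 0.0
    for i in range(len(v)):
        if cum[i] - last_rank > eps * total:
            candidates.append(v[i])
            last_rank = cum[i]
    candidates.append(v[-1])
    return np.unique(np.array(candidates))

# Illustrative usage: feature values weighted by per-instance weights.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
h = rng.uniform(0.1, 1.0, size=1000)
print(weighted_quantile_candidates(x, h, eps=0.1))  # roughly 1/eps candidate split points
```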

This enables data scientists as well as researchers to build powerful variants of tree boosting algorithms [7, 8]. Besides these major contributions, we also make additional improvements in proposing a regularized learning objective, which we will include for completeness.

The remainder of the paper is organized as follows. We first review tree boosting and introduce a regularized objective in Sec. 2. We then describe the split finding methods in Sec. 3 as well as the system design in Sec. 4, including experimental results when relevant to provide quantitative support for each optimization we describe. Related work is discussed in Sec. 5. Detailed end-to-end evaluations are included in Sec. 6.

Finally, we conclude the paper in Sec. 7.

2. TREE BOOSTING IN A NUTSHELL

We review gradient tree boosting algorithms in this section. The derivation follows the same idea as existing literature on gradient boosting. Specifically, the second-order method originates from Friedman et al. [12]. We make minor improvements in the regularized objective, which we found helpful in practice.

2.1 Regularized Learning Objective

For a given data set with $n$ examples and $m$ features, $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}$ ($|\mathcal{D}| = n$, $\mathbf{x}_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), a tree ensemble model (shown in Fig. 1) uses $K$ additive functions to predict the output:

$$\hat{y}_i = \phi(\mathbf{x}_i) = \sum_{k=1}^{K} f_k(\mathbf{x}_i), \quad f_k \in \mathcal{F}, \qquad (1)$$

where $\mathcal{F} = \{f(\mathbf{x}) = w_{q(\mathbf{x})}\}$ ($q : \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees (also known as CART).
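To ground Eq. (1), here is a small self-contained sketch of the additive model: each regression tree maps an example to a leaf index via its structure $q(\mathbf{x})$ and contributes that leaf's weight $w_{q(\mathbf{x})}$, and the ensemble prediction sums the $K$ contributions. The two hard-coded trees are hypothetical and exist only to illustrate the notation.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class RegressionTree:
    """A regression tree in the sense of Eq. (1): q maps an example to a leaf
    index, and w holds one weight per leaf (T leaves in total)."""
    q: Callable[[np.ndarray], int]  # tree structure: example -> leaf index
    w: np.ndarray                   # leaf weights

    def __call__(self, x: np.ndarray) -> float:
        return float(self.w[self.q(x)])


def ensemble_predict(trees: List[RegressionTree], x: np.ndarray) -> float:
    """y_hat = phi(x) = sum_{k=1..K} f_k(x), the additive ensemble of Eq. (1)."""
    return sum(f(x) for f in trees)


# Hypothetical two-tree ensemble over a two-feature example.
tree1 = RegressionTree(q=lambda x: 0 if x[0] < 0.5 else 1, w=np.array([0.5, -0.25]))
tree2 = RegressionTree(q=lambda x: 0 if x[1] < 1.0 else 1, w=np.array([0.125, 0.25]))

x = np.array([0.2, 2.0])
print(ensemble_predict([tree1, tree2], x))  # f_1(x) + f_2(x) = 0.5 + 0.25 = 0.75
```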

