

A Gentle Introduction to Gradient Boosting
Cheng Li, chengli@ccs.neu.edu
College of Computer and Information Science, Northeastern University


Transcription of A Gentle Introduction to Gradient Boosting - Cheng Li's ...

Gradient Boosting
- a powerful machine learning algorithm
- it can do regression, classification, and ranking
- won Track 1 of the Yahoo Learning to Rank Challenge
Our implementation of Gradient Boosting is available online.

Outline of the Tutorial
1 What is Gradient Boosting
2 A brief history
3 Gradient Boosting for regression
4 Gradient Boosting for classification
5 A demo of Gradient Boosting
6 Relationship between AdaBoost and Gradient Boosting
7 Why it works

Note: This tutorial focuses on the intuition. For a formal treatment, see [Friedman, 2001].

What is Gradient Boosting
Gradient Boosting = Gradient Descent + Boosting

AdaBoost
Figure: AdaBoost. Source: a figure from [Schapire and Freund, 2012]
- Fit an additive model (ensemble) $H(x) = \sum_t \rho_t h_t(x)$ in a forward stage-wise manner.
- In each stage, introduce a weak learner to compensate the shortcomings of the existing weak learners.
- In AdaBoost, "shortcomings" are identified by high-weight data points.
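As a concrete illustration of the additive model above, here is a minimal sketch, not from the slides: the stump learners and weights are made-up placeholders standing in for whatever a boosting procedure would have produced.

```python
# Minimal sketch: evaluating an additive ensemble H(x) = sum_t rho_t * h_t(x).
def ensemble_predict(weak_learners, weights, x):
    """Weighted sum of all weak learners' predictions at point x."""
    return sum(rho * h(x) for h, rho in zip(weak_learners, weights))

# Two toy "decision stump" weak learners on a scalar input (placeholders).
h1 = lambda x: 1.0 if x > 0.5 else -1.0
h2 = lambda x: 1.0 if x > 1.5 else -1.0

print(ensemble_predict([h1, h2], [0.7, 0.3], x=1.0))  # 0.7*1.0 + 0.3*(-1.0) = 0.4
```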

Gradient Boosting
- Fit an additive model (ensemble) $H(x) = \sum_t \rho_t h_t(x)$ in a forward stage-wise manner.
- In each stage, introduce a weak learner to compensate the shortcomings of the existing weak learners.
- In Gradient Boosting, "shortcomings" are identified by gradients; recall that in AdaBoost, "shortcomings" are identified by high-weight data points.
- Both high-weight data points and gradients tell us how to improve our model.
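To preview why gradients point at the shortcomings (the regression example below makes this concrete), consider square loss, where the negative gradient of the loss with respect to the current prediction is exactly the residual:

```latex
\[
  L\bigl(y_i, F(x_i)\bigr) = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2,
  \qquad
  -\frac{\partial L}{\partial F(x_i)} = y_i - F(x_i).
\]
```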

Why and how did researchers invent Gradient Boosting?

A Brief History of Gradient Boosting
- Invent AdaBoost, the first successful boosting algorithm [Freund et al., 1996, Freund and Schapire, 1997]
- Formulate AdaBoost as gradient descent with a special loss function [Breiman et al., 1998, Breiman, 1999]
- Generalize AdaBoost to Gradient Boosting in order to handle a variety of loss functions [Friedman et al., 2000, Friedman, 2001]

Gradient Boosting for Different Problems
Difficulty: regression ===> classification ===> ranking

Gradient Boosting for Regression
Let's play a game... You are given (x1, y1), (x2, y2), ..., (xn, yn), and the task is to fit a model F(x) to minimize square loss. Suppose your friend wants to help you and gives you a model F. You check his model and find that the model is good but not perfect. There are some mistakes: F(x1) does not quite match y1, and F(x2) does not quite match y2. How can you improve this model?
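To make the game concrete, here is one hedged setup; the slides do not fix a dataset or a model class for F, so scikit-learn and the sine-shaped data are assumptions of this sketch.

```python
# Toy setup for the "game": fit an intentionally weak first model F,
# then observe that it is good but not perfect under square loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=200)  # noisy regression target

F = DecisionTreeRegressor(max_depth=2).fit(x, y)  # "good but not perfect"
print("square loss of F:", np.mean((y - F.predict(x)) ** 2))
```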

Rules of the game:
- You are not allowed to remove anything from F or change any parameter in F.
- You can add an additional model (regression tree) h to F, so the new prediction will be F(x) + h(x).

Simple solution: You wish to improve the model such that
\[
F(x_1) + h(x_1) = y_1, \quad F(x_2) + h(x_2) = y_2, \quad \ldots, \quad F(x_n) + h(x_n) = y_n.
\]
Or, equivalently, you wish
\[
h(x_1) = y_1 - F(x_1), \quad h(x_2) = y_2 - F(x_2), \quad \ldots, \quad h(x_n) = y_n - F(x_n).
\]
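In code, the right-hand sides are just the current errors of F; this continues the toy sketch above (the names x, y, and F are carried over from it).

```python
# Targets that the new tree h should fit: the current errors of F.
residual_targets = y - F.predict(x)  # entry i is y_i - F(x_i)
print(residual_targets[:5])
```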

Can any regression tree h achieve this goal perfectly? Maybe not... But some regression tree might be able to do this approximately.

How? Just fit a regression tree h to the data
\[
(x_1,\; y_1 - F(x_1)),\ (x_2,\; y_2 - F(x_2)),\ \ldots,\ (x_n,\; y_n - F(x_n)).
\]
Congratulations, you get a better model!
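A minimal sketch of that fitting step, continuing the same toy example (the tree depth is an arbitrary illustrative choice):

```python
# Fit a regression tree h to (x_i, y_i - F(x_i)) and form the improved model F + h.
h = DecisionTreeRegressor(max_depth=2).fit(x, residual_targets)

improved = F.predict(x) + h.predict(x)  # the new prediction F(x) + h(x)
print("square loss before:", np.mean((y - F.predict(x)) ** 2))
print("square loss after: ", np.mean((y - improved) ** 2))
```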

The quantities yi - F(xi) are called residuals. These are the parts that the existing model F cannot do well. The role of h is to compensate for the shortcomings of the existing model F. If the new model F + h is still not satisfactory, we can add another regression tree...

We are improving the predictions on the training data; is the procedure also useful for test data?
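Putting the pieces together, here is one hedged sketch of the full iterative procedure, with a held-out split to probe that closing question; the split size, tree depth, and number of rounds are arbitrary choices, not from the slides.

```python
# Repeatedly fit trees to residuals (x and y carried over from the setup sketch);
# track square loss on held-out data to see whether improving the training
# predictions also helps on test data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

prediction_tr = np.zeros_like(y_tr)  # start from the zero model F = 0
prediction_te = np.zeros_like(y_te)
for t in range(50):
    # Each new tree is fit to the current residuals on the training split.
    h = DecisionTreeRegressor(max_depth=2).fit(x_tr, y_tr - prediction_tr)
    prediction_tr += h.predict(x_tr)
    prediction_te += h.predict(x_te)

print("train square loss:", np.mean((y_tr - prediction_tr) ** 2))
print("test square loss: ", np.mean((y_te - prediction_te) ** 2))
```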

