
A Gentle Introduction to Gradient Boosting


Transcription of A Gentle Introduction to Gradient Boosting

1 A Gentle Introduction to Gradient Boosting. Cheng Li, chengli@ccs.neu.edu, College of Computer and Information Science, Northeastern University.

Gradient Boosting: a powerful machine learning algorithm; it can do regression, classification, and ranking; it won Track 1 of the Yahoo Learning to Rank Challenge. Our implementation of Gradient Boosting is available.

Outline of the Tutorial: 1 What is Gradient Boosting; 2 A brief history; 3 Gradient Boosting for regression; 4 Gradient Boosting for classification; 5 A demo of Gradient Boosting; 6 Relationship between Adaboost and Gradient Boosting; 7 Why it works. Note: This tutorial focuses on the intuition. For a formal treatment, see [Friedman, 2001].

What is Gradient Boosting? Gradient Boosting = Gradient Descent + Boosting. Adaboost. Figure: AdaBoost.

2 Source: Figure of [Schapire and Freund, 2012].

What is Gradient Boosting? Gradient Boosting = Gradient Descent + Boosting.

Adaboost: Fit an additive model (ensemble) ∑_t ρ_t h_t(x) in a forward stage-wise manner. In each stage, introduce a weak learner to compensate the shortcomings of the existing weak learners. In Adaboost, "shortcomings" are identified by high-weight data points. H(x) = ∑_t ρ_t h_t(x). Figure: AdaBoost. Source: Figure of [Schapire and Freund, 2012].

Gradient Boosting: Fit an additive model (ensemble) ∑_t ρ_t h_t(x) in a forward stage-wise manner. In each stage, introduce a weak learner to compensate the shortcomings of the existing weak learners. In Gradient Boosting, "shortcomings" are identified by gradients. Recall that, in Adaboost, "shortcomings" are identified by high-weight data points. Both high-weight data points and gradients tell us how to improve our model.

What is Gradient Boosting? Why and how did researchers invent Gradient Boosting?
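The additive model on this slide can be pictured as a list of weighted weak learners. Below is a minimal sketch, not the tutorial's implementation: the weights and stump thresholds are made-up values, chosen only to show the form H(x) = ∑_t ρ_t h_t(x) and the stage-wise growth of the ensemble.

```python
# An ensemble is a list of (weight rho_t, weak learner h_t) pairs;
# the prediction H(x) is the weighted sum of the weak learners' outputs.

ensemble = [
    (0.5, lambda x: 1.0 if x > 0 else -1.0),   # h_1: a decision stump
    (0.3, lambda x: 1.0 if x > 2 else -1.0),   # h_2: another stump
]

def H(x):
    """Ensemble prediction H(x) = sum_t rho_t * h_t(x)."""
    return sum(rho * h(x) for rho, h in ensemble)

# Forward stage-wise fitting adds one more (rho, h) pair at each stage,
# leaving the earlier pairs fixed:
ensemble.append((0.2, lambda x: 1.0 if x > -1 else -1.0))
```

The key property illustrated is that each stage only appends to the ensemble; it never revisits the earlier weak learners.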

3 A Brief History of Gradient Boosting:
- Invent Adaboost, the first successful boosting algorithm [Freund et al., 1996, Freund and Schapire, 1997].
- Formulate Adaboost as gradient descent with a special loss function [Breiman et al., 1998, Breiman, 1999].
- Generalize Adaboost to Gradient Boosting in order to handle a variety of loss functions [Friedman et al., 2000, Friedman, 2001].

Gradient Boosting for Different Problems. Difficulty: regression ===> classification ===> ranking.

Gradient Boosting for Regression. Let's play a game. You are given (x1, y1), (x2, y2), ..., (xn, yn), and the task is to fit a model F(x) to minimize square loss. Suppose your friend wants to help you and gives you a model F. You check his model and find that the model is good but not perfect. There are some mistakes: F(x1) = …, while y1 = …, and F(x2) = …, while y2 = …. How can you improve this model?

4 Gradient Boosting for Regression (continued). Rule of the game: You are not allowed to remove anything from F or change any parameter in F.

5 You can add an additional model (regression tree) h to F, so the new prediction will be F(x) + h(x).

Gradient Boosting for Regression. Simple solution: You wish to improve the model such that
F(x1) + h(x1) = y1
F(x2) + h(x2) = y2
...
F(xn) + h(xn) = yn

Or, equivalently, you wish
h(x1) = y1 − F(x1)
h(x2) = y2 − F(x2)
...
h(xn) = yn − F(xn)

Can any regression tree h achieve this goal perfectly?

6 Maybe not. But some regression tree might be able to do this approximately. How? Just fit a regression tree h to the data (x1, y1 − F(x1)), (x2, y2 − F(x2)),

7 ..., (xn, yn − F(xn)). Congratulations, you get a better model!

Gradient Boosting for Regression. Simple solution: yi − F(xi) are called residuals. These are the parts that the existing model F cannot do well. The role of h is to compensate the shortcomings of the existing model F. If the new model F + h is still not satisfactory,
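The single boosting step described above can be sketched in code. This is a minimal illustration under stated assumptions, not the tutorial's implementation: the tiny dataset, the constant starting model F, and the hand-rolled regression stump (a depth-1 tree standing in for a real regression tree) are all made up for the example.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree: pick the split minimizing square loss,
    predicting the mean target on each side."""
    best = None
    for split in xs:
        left = [t for x, t in zip(xs, targets) if x <= split]
        right = [t for x, t in zip(xs, targets) if x > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((t - lm) ** 2 for t in left)
               + sum((t - rm) ** 2 for t in right))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    _, s, lm, rm = best
    return lambda x: lm if x <= s else rm

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.1, 4.2]
F = lambda x: 2.5                         # current (imperfect) model

residuals = [y - F(x) for x, y in zip(xs, ys)]
h = fit_stump(xs, residuals)              # fit h to (x_i, y_i - F(x_i))
F_new = lambda x: F(x) + h(x)             # the improved model F + h

old_loss = sum((y - F(x)) ** 2 for x, y in zip(xs, ys))
new_loss = sum((y - F_new(x)) ** 2 for x, y in zip(xs, ys))
assert new_loss < old_loss                # F + h fits the data better
```

Note that F itself is never modified, in keeping with the rule of the game; the improvement comes entirely from the added tree h.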

8 we can add another regression tree. We are improving the predictions on the training data; is the procedure also useful for test data?

9 Yes! Because we are building a model, and the model can be applied to test data as well. How is this related to gradient descent?

Gradient Descent: Minimize a function by moving in the opposite direction of the gradient: θ_i := θ_i − ρ ∂J/∂θ_i. Figure: Gradient Descent.

Gradient Boosting for Regression. How is this related to gradient descent?
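The update rule θ_i := θ_i − ρ ∂J/∂θ_i can be demonstrated on a one-dimensional quadratic. A minimal sketch; the objective J(θ) = (θ − 3)², the starting point, and the step size ρ are made-up choices for illustration.

```python
# Gradient descent on J(theta) = (theta - 3)^2, whose gradient is
# grad(theta) = 2 * (theta - 3).  The minimizer is theta = 3.

def grad(theta):
    return 2.0 * (theta - 3.0)

theta = 0.0      # arbitrary starting point
rho = 0.1        # step size

for _ in range(100):
    theta = theta - rho * grad(theta)   # move against the gradient

assert abs(theta - 3.0) < 1e-6          # converges to the minimizer
```

Each step shrinks the distance to the minimizer by a constant factor (1 − 2ρ), which is why 100 iterations suffice here.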

10 Loss function L(y, F(x)) = (y − F(x))² / 2. We want to minimize J = ∑_i L(yi, F(xi)) by adjusting F(x1), F(x2), ..., F(xn). Notice that F(x1), F(x2), ..., F(xn) are just some numbers. We can treat F(xi) as parameters and take derivatives:

∂J/∂F(xi) = ∂(∑_i L(yi, F(xi)))/∂F(xi) = ∂L(yi, F(xi))/∂F(xi) = F(xi) − yi

So we can interpret residuals as negative gradients: yi − F(xi) = −∂J/∂F(xi).

Gradient Boosting for Regression. How is this related to gradient descent? Compare the updates:
F(xi) := F(xi) + h(xi)
F(xi) := F(xi) + yi − F(xi)
F(xi) := F(xi) − 1 · ∂J/∂F(xi)
θ_i := θ_i − ρ ∂J/∂θ_i

For regression with square loss:
residual ⇔ negative gradient
fit h to residual ⇔ fit h to negative gradient
update F based on residual ⇔ update F based on negative gradient
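The identity yi − F(xi) = −∂J/∂F(xi) can be checked numerically: treat the predictions F(xi) as free parameters and approximate ∂J/∂F(xi) by a finite difference. A small sketch; the y and F values below are made up for illustration.

```python
# Square loss J = sum_i (y_i - F_i)^2 / 2, where F_i = F(x_i) are
# treated as parameters.  Analytically, dJ/dF_i = F_i - y_i, so the
# residual y_i - F_i is exactly the negative gradient.

ys = [1.0, 2.0, 3.0]     # targets (made-up values)
Fs = [1.2, 1.7, 3.4]     # current predictions (made-up values)

def J(fs):
    return sum((y - f) ** 2 / 2.0 for y, f in zip(ys, fs))

eps = 1e-6
for i in range(len(Fs)):
    bumped = list(Fs)
    bumped[i] += eps
    num_grad = (J(bumped) - J(Fs)) / eps     # finite-difference dJ/dF_i
    residual = ys[i] - Fs[i]
    # residual matches the negative gradient up to O(eps) error
    assert abs(num_grad - (-residual)) < 1e-4
```

This is the whole trick behind gradient boosting for square loss: fitting h to the residuals is the same as fitting h to the negative gradient of the loss.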

