Prediction of consumer credit risk - Machine learning

CS229 Prediction of consumer credit risk Marie-Laure Charpignon mcharpig@stanford.edu Enguerrand Horel ehorel@stanford.edu Flora Tixier ftixier@stanford.edu

Transcription of Prediction of consumer credit risk - Machine learning

1 […] number of companies or startups created in the field of micro credit and peer to peer lending, we tried through this project to build an efficient tool to peer to peer lending managers, […], the main purpose of this project is to predict if a consumer will experience a serious delinquency (90 days or worse) during the next two years (thus it is a classification problem). The dataset consists of roughly 100,[…] models we implemented present a very good predictive power ([…]): they are obtained by combining trees, bootstrap and gradient boosting […]

Introduction

Credit and default risks have been in the forefront of financial news since the subprime mortgage crisis that b[…], people realized that one of the main causes of that crisis was that loans were granted to people whose risk profile was too […], in order to restore trust in the finance system and to prevent this from happening again, banks and other credit companies have recently tried to develop new models […], the financialization of our economies implies that more and more stakeholders are involved, however it can still be very difficult for some people - either b[…] number of peer to peer lending websites, Micro Finance Institutions (MFI)

2 and companies that back their development, is currently growing quickly, and the quite recent sto[…] project fits, its main goal is to predict if a consumer will experience a serious delinquency (90 days or worse) […], the methods and the models used will be presented in sections two and three, then the results will be […] project comes from the competition "Give me some credit" […],269 consumers, each characterized by the following 10 variables:
- age of the borrower;
- number of dependents in family;
- monthly income;
- monthly expenditures divided by monthly gross income;
- total balance on credit cards divided by the sum of credit limits;
- number of open loans and lines of credit;
- number of mortgage and real estate loans;
- number of times the borrower has been 30-59 days past due but no worse in the last two years;
- number of times the borrower has been 60-89 days past due but no worse in the last two years;
- number of times the borrower has been […]

[…] dependent variable is if a person experienced 90 days past due delinquency or worse in the last two years (1 if yes and 0 if not).

3 […] processing

When we looked initially at the data, we thought that they certainly should not all […] borrower does not seem so important, and the last three variables lo[…], we tested the significance […] both revealed that the variable "balance on credit cards divided by sum of credit limits" was not really significant […] final results, […], "number of times the borrower has been some days past due in the last two years" was the first principal component, and that two other combinations of the same variables were the last two components whose associated […] confirmed the intuition that these three variables could be redundant if they were not considered in the right proportion […] first eight principal components […], the proportion of positive outputs (consumers who had a default) was only 6%.
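With positives at only 6%, a training set can be rebalanced before fitting. The sketch below shows one common option, random oversampling of the positive class up to a target share. The source does not say which resampling method was actually used, so the technique, the function name `oversample_positives`, the toy data and the 0.30 target are all illustrative assumptions.

```python
import math
import random


def oversample_positives(rows, labels, target_share, seed=0):
    """Duplicate randomly chosen positive rows until positives reach target_share.

    Hypothetical helper for illustration: labels are 1 (default) or 0 (no default).
    """
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos:
        raise ValueError("no positive example to resample")
    # Smallest positive count n with n / (n + len(neg)) >= target_share.
    needed = math.ceil(target_share * len(neg) / (1 - target_share))
    extra = max(0, needed - len(pos))
    # Keep every original row and add 'extra' positives drawn with replacement.
    chosen = pos + neg + [rng.choice(pos) for _ in range(extra)]
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


if __name__ == "__main__":
    # Toy data: 100 consumers, 6 of them with a default (6% positives).
    toy_rows = list(range(100))
    toy_labels = [1] * 6 + [0] * 94
    new_rows, new_labels = oversample_positives(toy_rows, toy_labels, 0.30)
    print(len(new_rows), sum(new_labels) / len(new_labels))
```

Resampling of this kind is normally applied to the training set only; the test set keeps its original class proportions so that the reported scores reflect real conditions.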

4 As we wanted to predict if a person would experience a delinquency, we thought it could increase the predictive power of our models to train them on a dataset where the proportion of positive […] proportion to 30% […]

[…] Models

Classification trees are appropriate for this problem, […]onds to an intuitive representation of the consumers, each one being associated with a cluster linked to its credit profile […] different models:
- Logistic regression as it is a very classic model for this type of problems;
- Classification and Regression Trees (CART): we read in the literature that trees were particularly efficient in classification;
- Random Forests: this model averages multiple deep decision trees trained on different parts of the training set (this aims at reducing the variance);
- Gradient Boosting Trees (GBT): gradient bo[…], each tree in the series is fitted with the purpose […]

5 […],

\[ h_m(x) = \sum_{j=1}^{J} b_{jm} \mathbf{1}(x \in R_{jm}) \]

[…] update rule of the model is

\[ \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\big) \]

where L is a loss function (the MSE for instance). Thus,

\[ F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J} \gamma_{jm} \mathbf{1}(x \in R_{jm}) \]

[…] methods

To assess and compare the precision of our models, we realized that we could not use the classic error measure (number of wrong predictions over the total number of predictions) as the models implemented tend to underestimate the proportion of positive […]: AUC and F1 score, as they are complementary and both adapted to binary classification […] positive rate versus the false positive rate, and F1-score is the harmonic mean between precision (the proportion of predicted positives that are true positives) and recall (true positive rate).
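The point about the classic error measure can be checked on a toy example. The sketch below computes accuracy, F1 and AUC in plain Python. The function names and toy labels are illustrative and are not the authors' code. Precision is taken in its usual sense (share of predicted positives that are truly positive), and AUC is computed through its pairwise ranking form, which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of correct predictions (one minus the classic error measure)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when there is no true positive."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data with 6% positives and a classifier that never predicts a default.
    y_true = [1] * 6 + [0] * 94
    always_negative = [0] * 100
    print("accuracy:", accuracy(y_true, always_negative))
    print("F1:", f1_score(y_true, always_negative))
    print("AUC:", auc(y_true, [0.0] * 100))
```

On this toy data the always-negative classifier is right for 94 of 100 consumers, yet it finds no default at all, which F1 exposes; its constant scores also leave AUC at the chance level.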

6 These two metrics are between 0 and 1 and the bigger they are, the better the associated model […] proportion of positive outputs is increased in the training set and then the trained models […]

Figure 1: Training and testing error with the AUC metric

Figure 2: Training and testing error with the F1 metric

Comparison references

As explained in the previous section, we decided to implement a Logit model in order to have some reference to which we could compare the results from the other three models, b[…], Logit is known to be one of the most appropriate algorithms for classification […] looking at the testing and training results for the AUC metric, we can clearly state that two distinct groups of models appear: Logit and CART constitute the first one; the more sophisticated tree models […] performance is quite similar for testing and training, […], F1-score introduces a bigger gap b[…], it also indicates that GBT is the best model […] best model […] best competitors of the Kaggle competition […], our models are efficient, for two major reasons: the first one is that the structure of trees is adapted to classification problems.

7 And the second one is that they are sophisticated, compared with the basic CART, as they involve statistical and machine learning techniques such as bootstrap or Gradient Boosting […] aspect that surprised us a lot was the fact that Random Forest highly overfits: it is astonishing because it is not what is expected from this model […], it is indeed supposed […] model (the number of trees) and the same result has been obtained for different values of it […]: the problem may come from our database, and one possible improvement could be […] boosting technique (GBT model), we have implemented a model […], its predictive p[…], GBT beats the other models we used and especially Logit (which was our reference). Second, its small variance makes it reliable: unlike Random Forest, its training and testing errors are on the same scale which means that it does not tend to overfit […] project, we thought ab[…] be interesting to try to add new variables (associated with some characteristics of the loan for instance)

8 and see if it improves the predictive performances of the models […], according to the literature, neural networks offer very good p[…], comparing its predictive power with the one of our models could allow us to put our results into more perspective […], we could develop a credit risk management tool for peer to […] peer to peer lending company connects borrowers and lenders, the latter being investors looking for certain returns and risk ratios based on their risk profile […] be used to create portfolios of loans in order to diversify their risk and to help investors reaching their speci[…]

[1] Brown I., Mues C., An experimental comparison of classification algorithms for imbalanced credit scoring data sets, (Expert Systems with Applications #39, 3446-3453, 2012.)

9 [2] […], Data Mining for Imbalanced Datasets: an Overview (Springer, 853-867, 2005.)
[3] Galindo J., Tamayo P., Credit Risk Assessment Using Statistical and Machine Learning: Basic Methodology and Risk Modeling Applications (Computational Economics #15, 107-143, 2000.)
[4] […], Statistical Classification Methods in Consumer Credit Scoring: a Review ([…]c. #160, 523-541, 1997.)
[5] […], Consumer credit-risk models via machine-learning algorithms. (Journal of Banking & Finance #34, 2767-2787, 2010.)
[6] […], A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers, (International Journal of Forecasting #16, 149-172, 2000.)

