
CHI-SQUARE: TESTING FOR GOODNESS OF FIT



In the previous chapter we discussed procedures for fitting a hypothesized function to a set of experimental data points. Such procedures involve minimizing a quantity in order to determine best estimates for certain function parameters, such as (for a straight line) a slope and an intercept. That quantity is proportional to (or in some cases equal to) a statistical measure called $\chi^2$, or chi-square, a quantity commonly used to test whether any given data are well described by some hypothesized function. Such a determination is called a chi-square test for goodness of fit. In the following, we discuss $\chi^2$ and its statistical distribution, and show how it can be used as a test for goodness of fit.

The definition of $\chi^2$

If $\nu$ independent variables $x_i$ are each normally distributed with mean $\mu_i$ and variance $\sigma_i^2$, then the quantity known as chi-square is defined by

$$\chi^2 \equiv \frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} + \cdots + \frac{(x_\nu-\mu_\nu)^2}{\sigma_\nu^2} = \sum_{i=1}^{\nu} \frac{(x_i-\mu_i)^2}{\sigma_i^2} \qquad (1)$$

Note that ideally, given the random fluctuations of the values of $x_i$ about their mean values $\mu_i$, each term in the sum will be of order unity.
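
As a minimal sketch of Eq. 1, the sum can be written as a short Python function. The function name `chi_square` and the sample numbers are assumptions made only for illustration; they do not come from the chapter.

```python
def chi_square(x, mu, sigma):
    """Return sum_i ((x_i - mu_i) / sigma_i)**2, the quantity defined in Eq. 1."""
    if not (len(x) == len(mu) == len(sigma)):
        raise ValueError("x, mu and sigma must have the same length")
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sigma))


# Hypothetical values: each x_i lies one standard deviation from its mean,
# so each term contributes about 1 and the sum is about nu = 3.
x = [1.2, 1.9, 3.4]
mu = [1.0, 2.0, 3.0]
sigma = [0.2, 0.1, 0.4]
print(chi_square(x, mu, sigma))
```

With every term of order unity, the result is close to the number of terms, which is the behaviour the note under Eq. 1 describes.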

Hence, if we have chosen the $\mu_i$ and the $\sigma_i$ correctly, we may expect that a calculated value of $\chi^2$ will be approximately equal to $\nu$. If it is, then we may conclude that the data are well described by the values we have chosen for the $\mu_i$, that is, by the hypothesized function.

If a calculated value of $\chi^2$ turns out to be much larger than $\nu$, and we have correctly estimated the values for the $\sigma_i$, we may possibly conclude that our data are not well described by our hypothesized set of the $\mu_i$.

This is the general idea of the $\chi^2$ test. In what follows we spell out the details. The use of $\chi^2$ as a statistical test is also described in the references listed at the end of this chapter.

Note that $\chi^2$ is a single statistical variable, and not the square of some quantity $\chi$. It is therefore not "chi squared", but "chi-square". The notation is merely suggestive of its construction as a sum of squares. It would have been better, historically, to have given it a name of its own.

The $\chi^2$ distribution

The quantity $\chi^2$ defined in Eq. 1 has the probability distribution given by

$$f(\chi^2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\; e^{-\chi^2/2}\, (\chi^2)^{(\nu/2)-1} \qquad (2)$$

This is known as the $\chi^2$-distribution with $\nu$ degrees of freedom. $\nu$ is a positive integer. We write the distribution as $f(\chi^2_\nu)$ when we wish to specify the value of $\nu$. $f(\chi^2)\,d(\chi^2)$ is the probability that a particular value of $\chi^2$ falls between $\chi^2$ and $\chi^2 + d(\chi^2)$.

Figure 1: The chi-square distribution, $f(\chi^2)$ versus $\chi^2$, for three values of $\nu$, including $\nu = 2$ and $\nu = 4$.

$\chi^2$ ranges only over positive values: $0 < \chi^2 < \infty$. The mean value of $\chi^2_\nu$ is equal to $\nu$, and the variance of $\chi^2_\nu$ is equal to $2\nu$.
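
The statements about the mean and variance can be checked numerically. The sketch below evaluates the density of Eq. 2 and integrates it with a simple midpoint rule. The function names, the integration range and step count, and the choice of $\nu$ = 2, 4 and 10 are all assumptions made for illustration.

```python
import math


def chi2_pdf(chi2, nu):
    """Density f(chi2) of Eq. 2 for nu degrees of freedom (zero for chi2 <= 0)."""
    if chi2 <= 0:
        return 0.0
    half = nu / 2.0
    # Work in logarithms so that Gamma(nu/2) cannot overflow for large nu.
    log_f = (-half * math.log(2.0) - math.lgamma(half)
             - chi2 / 2.0 + (half - 1.0) * math.log(chi2))
    return math.exp(log_f)


def moments(nu, upper=100.0, steps=100_000):
    """Midpoint-rule estimates of the normalization, mean and variance."""
    h = upper / steps
    norm = mean = second = 0.0
    for k in range(steps):
        x = (k + 0.5) * h          # midpoint of the k-th interval
        w = chi2_pdf(x, nu) * h    # probability carried by that interval
        norm += w
        mean += x * w
        second += x * x * w
    return norm, mean, second - mean ** 2


for nu in (2, 4, 10):
    norm, mean, var = moments(nu)
    print(f"nu={nu}: norm={norm:.4f} mean={mean:.4f} variance={var:.4f}")
```

For each $\nu$ the normalization should come out close to 1, the mean close to $\nu$, and the variance close to $2\nu$. The midpoint rule is used only because it needs nothing beyond the standard library; it is not suited to $\nu = 1$, where the density diverges at the origin.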

The distribution is highly skewed for small values of $\nu$, and becomes more symmetric as $\nu$ increases, approaching a Gaussian distribution for large $\nu$, just as predicted by the Central Limit Theorem.

In Eq. 2, $\Gamma(p)$ is the "Gamma function", defined by $\Gamma(p+1) \equiv \int_0^\infty x^p e^{-x}\,dx$. It is a generalization of the factorial function to non-integer values of $p$. If $p$ is a non-negative integer, $\Gamma(p+1) = p!$. In general, $\Gamma(p+1) = p\,\Gamma(p)$, and $\Gamma(1/2) = \sqrt{\pi}$.

How to use $\chi^2$ to test for goodness of fit

Suppose we have a set of $N$ experimentally measured quantities $x_i$. We want to test whether they are well described by some set of hypothesized values $\mu_i$. We form a sum like that shown in Eq. 1. It will contain $N$ terms, constituting a sample value for $\chi^2$.

In forming the sum, we must use independent estimates for the $\sigma_i$.

(In the previous chapter, we showed how a hypothesized function may be fit to a set of data points. There we noted that it may be either impossible or inconvenient to make independent estimates of the $\sigma_i$, in which case estimates of the $\sigma_i$ can be made only by assuming an ideal fit of the function to the data. That is, we assumed $\chi^2$ to be equal to its mean value, and from that, estimated uncertainties, or confidence intervals, for the values of the fitted parameters. Such a procedure precludes the use of the $\chi^2$ test.)

Now imagine, for a moment, that we could repeat our experiment many times. Each time, we would obtain a data sample, and each time, a sample value for $\chi^2$. If our data were well described by our hypothesis, we would expect our sample values of $\chi^2$ to be distributed according to Eq. 2, as illustrated by example in Fig. 1.

However, we must be a little careful. The distribution of our samples of $\chi^2$ will not be one of $N$ degrees of freedom, even though there are $N$ terms in the sum, because our sample variables $x_i$ will invariably not constitute a set of $N$ independent variables. There will, typically, be at least one, and often as many as three or four, relations among them. Such relations are needed in order to make estimates of hypothesized parameters such as the $\mu_i$, and their presence will reduce the number of degrees of freedom. With $r$ such relations, or constraints, the number of degrees of freedom becomes $\nu = N - r$, and the resulting $\chi^2$ sample will be one having $\nu$ (rather than $N$) degrees of freedom.

As we repeat our experiment and collect values of $\chi^2$, we expect, if our model is a valid one, that they will be clustered about the median value of $\chi^2_\nu$, with about half of these collected values being greater than the median value, and about half being less. This median value, which we denote by $\chi^2_{\nu,0.5}$, is determined by

$$\int_{\chi^2_{\nu,0.5}}^{\infty} f(\chi^2)\,d\chi^2 = 0.5$$

Note that because of the skewed nature of the distribution function, the median value of $\chi^2_\nu$ will be somewhat less than the mean (or average) value of $\chi^2_\nu$, which, as we have noted, is equal to $\nu$. For example, for $\nu = 10$ degrees of freedom, $\chi^2_{10,0.5} \approx 9.34$, slightly less than 10.

Put another way, we expect that a single measured value of $\chi^2$ will have a probability of 0.5 of being greater than $\chi^2_{\nu,0.5}$.

We can generalize from the above discussion, to say that we expect a single measured value of $\chi^2$ will have a probability $\alpha$ ("alpha") of being greater than $\chi^2_{\nu,\alpha}$, where $\chi^2_{\nu,\alpha}$ is defined by

$$\int_{\chi^2_{\nu,\alpha}}^{\infty} f(\chi^2)\,d\chi^2 = \alpha$$

This definition is illustrated by the inset in the accompanying figure.

Here is how the $\chi^2$ test works:

(a) We hypothesize that our data are appropriately described by our chosen function, or set of $\mu_i$. This is the hypothesis we are going to test.

(b) From our data sample we calculate a sample value of $\chi^2$ (chi-square), along with $\nu$ (the number of degrees of freedom), and so determine $\chi^2/\nu$ (the normalized chi-square, or the chi-square per degree of freedom) for our data sample.

(c) We choose a value of the significance level $\alpha$ (a common value is 0.05, or 5 per cent), and from an appropriate table or graph, determine the corresponding value of $\chi^2_{\nu,\alpha}/\nu$. We then compare this with our sample value of $\chi^2/\nu$.

(d) If we find that $\chi^2/\nu > \chi^2_{\nu,\alpha}/\nu$, we may conclude that either (i) the model represented by the $\mu_i$ is a valid one but that a statistically improbable excursion of $\chi^2$ has occurred, or (ii) that our model is so poorly chosen that an unacceptably large value of $\chi^2$ has resulted.

(i) will happen with a probability $\alpha$, so if we are satisfied that (i) and (ii) are the only possibilities, (ii) will happen with a probability $1 - \alpha$. Thus if we find that $\chi^2/\nu > \chi^2_{\nu,\alpha}/\nu$, we are $100 \cdot (1 - \alpha)$ per cent confident in rejecting our model.

This reasoning fails, however, if there is a possibility (iii), for example if the data are not normally distributed. The chi-square test relies on the assumption that chi-square is the sum of the squares of random normal deviates, that is, that each $x_i$ is normally distributed about its mean value $\mu_i$. However, for some experiments, there may be occasional non-normal data points that are too far from the mean to be genuine. A truck passing by, or a glitch in the electrical power, could be responsible. Such points, sometimes called outliers, can unexpectedly increase the sample value of $\chi^2$. It is appropriate to discard data points that are clearly outliers.

(e) If we find that $\chi^2$ is too small, that is, if $\chi^2/\nu < \chi^2_{\nu,1-\alpha}/\nu$, we may conclude only that either (i) our model is valid but that a statistically improbable excursion of $\chi^2$ has occurred, or (ii) we have, too conservatively, over-estimated the values of $\sigma_i$, or (iii) someone has given us fraudulent data, that is, data "too good to be true". A too-small value of $\chi^2$ cannot be indicative of a poor model. A poor model can only increase $\chi^2$.

Generally speaking, we should be pleased to find a sample value of $\chi^2/\nu$ that is near 1, its mean value for a good fit.

In the final analysis, we must be guided by our own judgment. The chi-square test, being of a statistical nature, serves only as an indicator, and cannot be conclusive.

The field of particle physics provides numerous situations where the $\chi^2$ test can be applied. A particularly simple example involves measurements of the mass $M_Z$ of the $Z^0$ boson by experimental groups at CERN. (This example is provided by Pat) The results of measurements of $M_Z$ made by four different detectors (L3, OPAL, Aleph and Delphi) are as follows:

Detector    Mass in GeV/c^2
L3          91.161 ± 0.013
OPAL        91.174 ± 0.011
Aleph       91.186 ± 0.013
Delphi      91.188 ± 0.013

The listed uncertainties are estimates of the $\sigma_i$, the standard deviations for each of the measurements.

[Figure: Measurements of $M_Z$, plotted on a horizontal mass scale and vertically displaced for clarity.]

Can these data be well described by a single number, namely an estimate of $M_Z$ made by determining the weighted mean of the four measurements?

We find the weighted mean $\bar{M}_Z$, and its standard deviation $\sigma_{\bar{M}_Z}$, like this:

$$\bar{M}_Z = \frac{\sum_i M_i/\sigma_i^2}{\sum_i 1/\sigma_i^2} \qquad \text{and} \qquad \sigma_{\bar{M}_Z}^2 = \frac{1}{\sum_i 1/\sigma_i^2}$$

to find

$$\bar{M}_Z \pm \sigma_{\bar{M}_Z} = 91.177 \pm 0.006$$

Then we form $\chi^2$:

$$\chi^2 = \sum_{i=1}^{4} \frac{(M_i - \bar{M}_Z)^2}{\sigma_i^2} \approx 2.78$$

We expect this value of $\chi^2$ to be drawn from a chi-square distribution with 3 degrees of freedom. The number is 3 (not 4) because we have used the mean of the four measurements to estimate the value of $\mu$, the true mass of the $Z^0$ boson, and this uses up one degree of freedom. $\chi^2/\nu$ = 2.
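
The arithmetic of this example can be reproduced in a few lines. The sketch below uses the four measurements quoted in the table; the variable names are assumptions made for illustration.

```python
import math

# Measurements quoted in the text: (detector, mass, uncertainty) in GeV/c^2.
measurements = [
    ("L3", 91.161, 0.013),
    ("OPAL", 91.174, 0.011),
    ("Aleph", 91.186, 0.013),
    ("Delphi", 91.188, 0.013),
]

# Weighted mean with weights 1/sigma_i^2, and its standard deviation.
weights = [1.0 / s ** 2 for _, _, s in measurements]
mean = sum(w * m for w, (_, m, _) in zip(weights, measurements)) / sum(weights)
sigma_mean = math.sqrt(1.0 / sum(weights))

# Chi-square of the four measurements about the weighted mean.
chi2 = sum(((m - mean) / s) ** 2 for _, m, s in measurements)
nu = len(measurements) - 1  # one constraint: the mean was estimated from the data

print(f"weighted mean = {mean:.3f} +/- {sigma_mean:.3f} GeV/c^2")
# prints: weighted mean = 91.177 +/- 0.006 GeV/c^2
print(f"chi2 = {chi2:.2f}, nu = {nu}, chi2/nu = {chi2 / nu:.2f}")
# prints: chi2 = 2.78, nu = 3, chi2/nu = 0.93
```

The normalized chi-square comes out near 1, which is what the text describes as the pleasing outcome for a good fit.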

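Steps (a) to (e) of the test refer to a table or graph for the values $\chi^2_{\nu,\alpha}$. As a rough sketch, the same values can be computed directly. The approach below (a power series for the incomplete gamma function, followed by bisection) and all function names are assumptions chosen to keep the example within the Python standard library; they are not the chapter's own method.

```python
import math


def chi2_sf(chi2, nu):
    """Probability that a chi-square variable with nu degrees of freedom exceeds chi2.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, x) with a = nu/2 and x = chi2/2; adequate for moderate chi2.
    """
    if chi2 <= 0:
        return 1.0
    a, x = nu / 2.0, chi2 / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_critical(nu, alpha, hi=200.0):
    """Value chi2_{nu,alpha} whose upper-tail probability equals alpha (bisection)."""
    lo = 0.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, nu) > alpha:
            lo = mid   # tail still too heavy: the critical value lies higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_test(chi2, nu, alpha=0.05):
    """Compare chi2/nu with the upper and lower critical values, as in steps (d) and (e)."""
    reduced = chi2 / nu
    if reduced > chi2_critical(nu, alpha) / nu:
        return "too large"
    if reduced < chi2_critical(nu, 1.0 - alpha) / nu:
        return "too small"
    return "consistent"


print(round(chi2_critical(10, 0.5), 2))   # median for nu = 10, quoted in the text as about 9.34
print(chi2_test(2.78, 3))                 # the Z boson example: chi2 = 2.78 with nu = 3
```

For the Z boson example the sample value lies between the two critical values at $\alpha = 0.05$, so neither step (d) nor step (e) gives grounds for rejecting the single-mass hypothesis.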
