
Detecting Outliers - Brendan Gregg's Homepage



Rg/blogs/Brendan/2013/07/01/Detecting-outliers/

In computer performance, we're especially concerned about latency outliers: very slow database queries, application requests, disk I/O, etc. The term "outlier" is subjective: there is no rigid mathematical definition. From [Grubbs 69]:

"An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs."

Outliers are commonly detected by comparing the maximum value in a data set to a custom threshold, such as 50 or 100 ms for disk I/O. This requires the metric to be well understood beforehand, as is usually the case for application latency and other key metrics. However, we are also often faced with a large number of unfamiliar metrics, where we don't know the thresholds in advance.

There are a number of proposed tests for outliers which don't rely on thresholds.

If such a test works, outliers can be detected from any performance metric.

I'll explain outliers using a visualization, and propose a simple test for their detection. I'll then use it on synthetic and then real world distributions. The results are surprising.

This is disk I/O latency from a production cloud server as a frequency trail, showing 10,000 I/O latency measurements from the block device interface level:

Outliers can be seen as distant points on the right.

Problem

Now consider the following 25 synthetic random distributions, which are shown as filled frequency trails. These have been colored different shades of yellow to help differentiate overlaps.

The purpose is to compare the distance from the bulk of the data to the outliers, which look like grains of sand.

Many of these appear to have outliers: values that deviate markedly from other members of the sample. Which ones?

Six Sigma Test

This identifies the presence of outliers based on their distance from the bulk of the data, and should be relatively easy to understand and implement. First, calculate the max sigma:

maxσ = (max(x) − μ) / σ

This is how far the max is above the mean, μ, in units of standard deviation, σ (sigma).

The six sigma test is then:

outliers = (maxσ >= 6)

If any measurement exceeds six standard deviations, we can say that the sample contains outliers.

Using the earlier disk I/O data set:

Click the image to see 6σ and the mean, standard deviation, and 99th percentile for comparison.

Visualizing Sigma

Here are the earlier distributions with their max sigma values on the right:

You can use this to understand how max sigma scales, and what 6σ will and won't identify.
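The max sigma calculation and the six sigma test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample data are hypothetical, and the population standard deviation is used because the text does not say whether the population or sample form is intended.

```python
import statistics


def max_sigma(samples):
    """Return how far the maximum is above the mean, in standard deviations."""
    mean = statistics.mean(samples)
    # Assumption: population standard deviation (divide by n, not n - 1).
    stddev = statistics.pstdev(samples)
    if stddev == 0:
        return 0.0  # every value is identical, so nothing deviates
    return (max(samples) - mean) / stddev


def has_outliers(samples, threshold=6.0):
    """Six sigma test: true if the max is at least `threshold` sigma above the mean."""
    return max_sigma(samples) >= threshold


# Hypothetical latencies in ms: 99 fast I/O and one slow one.
latencies = [1.0] * 99 + [100.0]
print(round(max_sigma(latencies), 2))  # 9.95
print(has_outliers(latencies))         # True
```

The threshold is a parameter so that a different sigma value can be tried without changing the calculation.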

There is also a version with 100 distributions, and non-colored white and black versions.

Here is another set, which has different distribution types and numbers of modes.

The six sigma test appears to work well for these synthetic distributions. If you wish to use a different sigma value, you can use these plots to help guide your choice.

Disk I/O Latency Outliers

Now for real data. The following are 35 disk I/O latency distributions, each with 50,000 I/O, sorted on max sigma, and with the x-axis scaled for each frequency trail:

One characteristic that may stand out is that many of these distributions aren't normal: they are combinations of bimodal and log-normal.

This is expected: the lower latency mode is for disk cache hits, and the higher latency mode is for disk cache misses, which also has queueing creating a tail. The presence of two modes and a tail increases the standard deviation, and thus, lowers max sigma.

All of these distributions still have outliers according to the six sigma test. And this is just the top 35: see the full 200 disk I/O distributions (white, black), from 200 random production servers.

100% of these servers have latency outliers

I've tackled many disk I/O latency outlier issues in the past, but haven't had a good sense for how common outliers really are. For my datacenter, disks, environment, and during a 50,000 I/O span, this visualization shows that latency outliers are very common.

MySQL Latency Outliers

Here are 35 MySQL command latency distributions, from about 5,000 measurements each:

This is from 100 random MySQL production servers, where 96% have 6σ outliers.

node.js Latency Outliers

Here are 35 node.js HTTP server response time distributions, from about 5,000 measurements each:

This is from 100 random node.js production servers, where 98% have 6σ outliers.

The Implications of Outliers

The presence of outliers in a dataset has some important implications:

1. There may be much greater values (outliers) than the average and standard deviation suggest. Use a way to examine them, such as a visualization or listing them beyond a threshold (eg, 6σ). At the very least, examine the maximum value.
2. You can't trust the average or standard deviation to reflect the bulk of the data, as they may be slightly influenced by outliers.

For the bulk of the data, you can try using robust statistics such as the median and the median absolute deviation (MAD).

In a recent severe case, the mean application response time was over 3 ms. However, explaining this value alone was futile. Upon studying the distribution, I saw that most requests were around 1 ms, as was the median, but there were outliers taking up to 30 seconds!

While outliers can be a performance problem, they aren't necessarily so. Here are the same 200 disk I/O distributions, numbered and sorted based on their max latency in milliseconds (white, black). Only 80% of these have latency outliers based on a 50 ms threshold.
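To make the earlier point about robust statistics concrete, here is a small Python sketch comparing the mean and standard deviation with the median and MAD on a sample that contains one extreme value. The data is synthetic and hypothetical (it is not the application data from the case described above), and the helper name is my own.

```python
import statistics


def median_abs_deviation(samples):
    """Median of the absolute differences between each value and the median."""
    med = statistics.median(samples)
    return statistics.median([abs(x - med) for x in samples])


# Hypothetical response times in ms: 100 requests near 1 ms, plus one at 30 s.
latencies = [0.8, 0.9, 1.0, 1.1, 1.2] * 20 + [30000.0]

print("mean   ", statistics.mean(latencies))      # pulled far above 1 ms
print("stddev ", statistics.pstdev(latencies))    # inflated by the single outlier
print("median ", statistics.median(latencies))    # stays with the bulk of the data
print("MAD    ", median_abs_deviation(latencies)) # stays with the bulk of the data
```

In this sample the median and MAD describe the 100 ordinary requests, while the mean and standard deviation are dominated by the single slow one.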

For some distributions, 1 ms exceeds 6σ, as the bulk of the I/O were much faster.

Next Steps

After identifying the presence of outliers, you can examine them visually using a histogram, frequency trail, scatter plot, or heat map. For all of these, a labeled axis can be included to show the value range, indicating the maximum value.

Their values can also be studied individually, by only listing those beyond 6σ in the sample. Extra information can then be collected, which would have been too much detail for the entire data set.

What Causes Outliers?

Outliers, depending on their type, may have many causes. To give you an idea for latency outliers:

Network or host packet drops, and TCP timeout-based retransmits.

DNS timeouts.
Paging or swapping.
Lock contention.
Application software scalability issues.
Errors and retries.
CPU caps, and scheduler latency.
Preemption by higher priority work (kernel/interrupts).
Some guy shouting at your disks.

I'd love to analyze and show what the previously shown outliers were caused by, but I'll have to save that for later posts (this is long enough).

Implementing Sigma Tests

The latency measurements used here were traced using DTrace, and then post-processed using R.

There are a number of ways to implement this in real-time. By use of cumulative statistics, the mean and standard deviation can be known for the entire population since collection began.

The max can then be compared to these cumulative metrics when each event (I/O or request) completes, and then the max sigma can be calculated and maintained in a counter.

For example: the disk I/O statistics reported by iostat(1), which instrument the block device layer, are maintained in the kernel as a group of statistics which are the totals since boot. For the Linux kernel, these are the eleven /proc/diskstats as documented in Documentation/iostats.txt, and maintained in the kernel as struct disk_stats. A member to support calculating the standard deviation can be added, which has the cumulative square of the difference to the mean, as well as max sigma and maximum members.
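As a rough user-space illustration of the cumulative approach described above, the following Python sketch keeps a count, a running mean, the cumulative squared difference from the mean, the maximum, and max sigma, updating them as each event completes. It is not kernel code and not the author's implementation: the class and field names are hypothetical, the update rule is Welford's online algorithm (my choice for keeping the squared difference numerically stable), and the population standard deviation is assumed.

```python
import math


class RunningSigma:
    """Cumulative statistics updated once per completed event (hypothetical names)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0                  # cumulative squared difference from the mean
        self.maximum = float("-inf")
        self.max_sigma = 0.0

    def update(self, value):
        # Welford's online update of the mean and squared difference.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.maximum = max(self.maximum, value)

        # Assumption: population standard deviation over all events so far.
        stddev = math.sqrt(self.m2 / self.count)
        if stddev > 0:
            self.max_sigma = (self.maximum - self.mean) / stddev

    def has_outliers(self, threshold=6.0):
        return self.max_sigma >= threshold


# Hypothetical event stream: 99 fast I/O (1 ms) followed by one slow I/O (100 ms).
stats = RunningSigma()
for latency_ms in [1.0] * 99 + [100.0]:
    stats.update(latency_ms)

print(round(stats.max_sigma, 2), stats.has_outliers())  # 9.95 True
```

Only a handful of values are stored regardless of how many events are seen, which is what makes this style of bookkeeping suitable for always-on counters.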

