Transcription of Statistics and Data Analysis in Proficiency Testing
Statistics and Data Analysis in Proficiency Testing
Michael Thompson
School of Biological and Chemical Sciences
Birkbeck College (University of London)
Malet Street
London WC1E [...]

[...] of a proficiency test
- Harmonised Protocol. Pure Appl [...], 78, [...]

[...] do we use statistics in proficiency testing?
- Finding a consensus and its uncertainty to use as an assigned value
- Assessing participants' results
- Assessing the efficacy of the PT scheme
- Testing for sufficient homogeneity and stability of the distributed test material
- Others

Criteria for an ideal scoring method
- Adds value to raw results.
- Easily understandable, based on the properties of the normal distribution.
- Has no arbitrary scaling transformation.
- Is transferable between different concentrations, analytes, matrices, and measurement [...]

[...] can we construct a score?
- An obvious idea is to utilise the properties of the normal distribution to interpret the results of a proficiency [...]
- [...] do not make any assumptions about the [...]

[...] dataset A: determination of protein nitrogen in a [...]

[...] weak scoring method
- On average, slightly more than 95% of laboratories receive a z-score within the range ±2.

$$z = (x - \bar{x})/s$$

where $\bar{x}$ is the mean and $s$ the standard deviation.

Robust mean and standard deviation
- Robust statistics is applicable to datasets that look like normally distributed samples contaminated with outliers and stragglers (i.e., unimodal and roughly symmetric).
- The method downweights the otherwise large influence of outliers and stragglers on the estimates.
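- It models the central reliable part of the [...]

[...] can I use robust estimates?
[Figure: distributions along the measurement axis; panels: skewed, bimodal, heavy-tailed]

Huber's H15
Data: $x_1, x_2, \ldots, x_n$.
Set $p = 0$, $k$ between 1 and 2, $\hat{\mu}_0 = \mathrm{median}$, $\hat{\sigma}_0$ [...]

$$\tilde{x}_i = \begin{cases} \hat{\mu}_p - k\hat{\sigma}_p & \text{if } x_i < \hat{\mu}_p - k\hat{\sigma}_p \\ \hat{\mu}_p + k\hat{\sigma}_p & \text{if } x_i > \hat{\mu}_p + k\hat{\sigma}_p \\ x_i & \text{otherwise} \end{cases}$$

$$\hat{\mu}_{p+1} = \mathrm{mean}(\tilde{x}_i), \qquad \hat{\sigma}^2_{p+1} = f(k)\,\mathrm{var}(\tilde{x}_i)$$

If not converged, set $p = p + 1$ and repeat.

References: robust statistics
- Analytical Methods Committee, Analyst, 1989, 114, 1489
- AMC Technical Brief No 6, 2001 (download from [...])
- P J Rousseeuw, J. Chemomet, 1991, 5, [...]

[...] that enough?
- On average, slightly less than 95% of laboratories receive a z-score between ±2.

$$z = (x - \hat{\mu}_{rob})/\hat{\sigma}_{rob}$$

What more do we need?
- We need a method that evaluates the data in relation to its intended use, rather than merely describing it.
- This adds value to the data rather than simply summarising it.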
- The method is based on fitness for purpose.

- Fitness for purpose occurs when the uncertainty of the result, $u_f$, gives best value for money.
- If the uncertainty is smaller than $u_f$, the analysis may be too expensive.
- If the uncertainty is larger than $u_f$, the cost and the probability of a mistaken decision will [...]

[...] for purpose
- The value of $u_f$ can sometimes be estimated objectively by decision theoretic methods, but is most often simply agreed between the laboratory and the customer by professional judgement.
- In the proficiency test context, $u_f$ should be determined by the scheme [...]
- [...]: T Fearn, S A Fisher, M Thompson, and S L R Ellison, Analyst, 2002, 127, 818-824.

If we now define a z-score thus:

$$z = (x - \hat{\mu}_{rob})/\sigma_p, \quad \text{where } \sigma_p = u_f,$$

we have a z-score that is both robustified against extreme values and tells us something about fitness for purpose.
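The robust estimate and the fitness-for-purpose z-score described above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's own implementation. The function names `huber_h15` and `fitness_for_purpose_z` are hypothetical, and the results are invented. The starting scale of 1.483 x MAD and the factor 1.134 on the winsorised standard deviation (so that f(k) = 1.134 squared) are assumptions: they are the constants usually quoted for k = 1.5, for example in ISO 13528 Algorithm A, and are not stated on the slides. Taking sigma_p as 1% of the robust mean is likewise an assumed reading of "an RSD of 1%".

```python
import statistics


def huber_h15(values, k=1.5, sd_factor=1.134, tol=1e-8, max_iter=500):
    """Robust mean and SD by iterated winsorisation (Huber-type estimate).

    Assumptions not given on the slides: the starting scale 1.483 * MAD
    and sd_factor = 1.134, the constants usually quoted for k = 1.5.
    """
    values = list(values)
    mu = statistics.median(values)
    mad = statistics.median([abs(v - mu) for v in values])
    sigma = 1.483 * mad
    if sigma == 0:
        return mu, 0.0  # at least half of the results are identical
    for _ in range(max_iter):
        low, high = mu - k * sigma, mu + k * sigma
        # Pull results outside mu +/- k*sigma in to the nearer limit.
        winsorised = [min(max(v, low), high) for v in values]
        new_mu = statistics.fmean(winsorised)
        new_sigma = sd_factor * statistics.stdev(winsorised)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def fitness_for_purpose_z(values, sigma_p):
    """z = (x - robust mean) / sigma_p, with sigma_p set equal to u_f."""
    mu_rob, _ = huber_h15(values)
    return [(x - mu_rob) / sigma_p for x in values]


# Hypothetical results (not dataset A from the talk); the last is an outlier.
results = [10.0, 10.1, 9.9, 10.2, 9.8, 10.05, 9.95, 10.1, 9.9, 14.0]
mu_rob, sigma_rob = huber_h15(results)
sigma_p = 0.01 * mu_rob  # assumed reading of "an RSD of 1%"
z_scores = fitness_for_purpose_z(results, sigma_p)
print(f"robust mean {mu_rob:.3f}, robust SD {sigma_rob:.3f}")
print([round(z, 1) for z in z_scores])
```

Because sigma_p is fixed in advance by the fitness-for-purpose criterion, the spread of the participants' results affects the scores only through the robust mean, so a scheme with poor overall performance does not make individual scores look better.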
- In an exactly compliant laboratory, scores of 2 < |z| < 3 will be encountered occasionally, and scores of |z| > 3 rarely.
- Better performers will receive fewer of these extreme [...]

[...] score that meets all of the criteria

$$z = (x - \hat{\mu}_{rob})/\sigma_p, \quad \text{where } \sigma_p = u_f$$

Example data A again
- Suppose that the fitness for purpose criterion set for the analysis is an RSD of 1%. This gives $\sigma_p$ [...]

Finding a consensus from participants' results
- The consensus is not theoretically the best option for the assigned value but is usually the only practicable value.
- The consensus is not necessarily identical with the true value. PT providers have to be alert to this [...]

[...] is a consensus?
Mean? - easy to calculate, but affected by outliers and asymmetry.
Robust mean?
- fairly easy to calculate, handles outliers but affected by asymmetry.
Median? - easy to calculate, more robust for asymmetric distributions, but larger standard error than robust mean.
Mode? - intuitively good, difficult to define, difficult to calculate.

The robust mean as consensus
- The robust mean provides a useful consensus in the great majority of instances, where the underlying distribution is roughly symmetric and there are 0-10% outliers.
- The uncertainty of this consensus can be safely taken as $u(x_a) = \hat{\sigma}_{rob}/\sqrt{n}$.

When can I use robust estimates?
[Figure: distributions along the measurement axis; panels: skewed, bimodal, heavy-tailed]

Skewed distributions
- Skews can arise when the participants' results come from two or more inconsistent methods.
- They can also arise as an artefact at low concentrations of analyte as a result of data recording practice.
- Rarely, skews can arise when the distribution is truly [...]

[...] use of a trimmed dataset?

Can I use the mode?
How many modes? Where are they?

The normal kernel density for identifying a mode

$$y = \frac{1}{nh}\sum_{i=1}^{n}\Phi\!\left(\frac{x - x_i}{h}\right)$$

where $\Phi$ is the standard normal density,

$$\Phi(a) = \frac{\exp(-a^2/2)}{\sqrt{2\pi}}$$

AMC Technical Brief No. 4

[Figures: a normal kernel; a kernel density; another kernel density; graphical representation of sample data; kernel density of the aflatoxin data]

Uncertainty of the mode
- The uncertainty of the consensus can be estimated as the standard error of the mode by applying the bootstrap to the procedure.
- The bootstrap is a general procedure based on resampling for estimating standard errors of complex statistics.
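The kernel density formula and the bootstrap estimate of the mode's standard error can be illustrated as follows. This is a sketch under stated assumptions, not the procedure used in the talk: the function names are hypothetical, the results are invented, the bandwidth h = 0.3 is chosen arbitrarily, and the mode is located by a simple grid search that returns only the highest mode.

```python
import math
import random
import statistics


def kernel_density(x, data, h):
    """Normal kernel density y = (1/(n*h)) * sum(Phi((x - x_i)/h))."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def kernel_mode(data, h, grid_points=200):
    """Location of the highest point of the density on an even grid."""
    low, high = min(data), max(data)
    step = (high - low) / (grid_points - 1)
    grid = [low + i * step for i in range(grid_points)]
    return max(grid, key=lambda x: kernel_density(x, data, h))


def bootstrap_se_of_mode(data, h, n_boot=200, seed=1):
    """Standard error of the mode: SD of the modes of resampled datasets."""
    rng = random.Random(seed)
    modes = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        modes.append(kernel_mode(resample, h))
    return statistics.stdev(modes)


# Hypothetical skewed results: a main group near 10 with a high tail.
results = [9.7, 9.8, 9.9, 9.9, 10.0, 10.0, 10.0, 10.1, 10.1, 10.2,
           10.3, 11.0, 11.5, 12.0, 12.8]
h = 0.3  # assumed bandwidth; the choice of h controls how many modes appear
mode = kernel_mode(results, h)
se = bootstrap_se_of_mode(results, h)
print(f"mode {mode:.2f}, bootstrap standard error {se:.3f}")
```

The same bandwidth is reused for every resample so that the bootstrap reflects sampling variation in the results only. A larger h smooths minor modes away, while a smaller h can create spurious ones.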
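Reference: Bump-hunting for the proficiency tester searching for [...] J Lowthian and M Thompson, Analyst, 2002, 127, [...]

[...] normal mixture model
AMC Technical Brief No 23, and AMC [...], Acc Qual Assur, 2006, 10, [...]

$$f(y) = \sum_{j=1}^{m} p_j f_j(y), \qquad \sum_{j=1}^{m} p_j = 1$$

$$f_j(y) = \frac{\exp\left(-(y - \mu_j)^2 / 2\sigma^2\right)}{\sigma\sqrt{2\pi}}$$

Mixture models found by the maximum likelihood method (the EM algorithm)

The M-step

$$\hat{p}_j = \sum_{i=1}^{n} P(j \mid y_i) \,/\, n$$

$$\hat{\mu}_j = \frac{\sum_{i=1}^{n} y_i\, P(j \mid y_i)}{\sum_{i=1}^{n} P(j \mid y_i)}$$

$$\hat{\sigma}^2 = \frac{\sum_{j=1}^{m}\sum_{i=1}^{n} (y_i - \hat{\mu}_j)^2\, P(j \mid y_i)}{\sum_{j=1}^{m}\sum_{i=1}^{n} P(j \mid y_i)}$$

The E-step

$$P(j \mid y_i) = \frac{\hat{p}_j f_j(y_i)}{\sum_{j=1}^{m} \hat{p}_j f_j(y_i)}$$

[Figures: kernel density and fit of 2-component normal mixture model; kernel density and variance-inflated mixture model]

Useful references
Mixture models: M [...] 2006, 10, [...] Technical Brief No. 23, [...]
Kernel densities: B W Silverman, Density estimation for statistics and data analysis. Chapman and Hall, London, [...] Technical Brief, no.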
4, 2001
The bootstrap: B Efron and R J Tibshirani, An introduction to [...] and Hall, London, 1993. AMC Technical Brief, No. 8, 2001

Conclusions: scoring
- Use z-scores based on fitness for purpose.
- Estimate the consensus as the robust mean and its uncertainty as $\hat{\sigma}_{rob}/\sqrt{n}$ if the dataset is roughly symmetric.
- If the dataset is skewed and plausibly composite, use kernel density methods or mixture models.

Homogeneity testing
- Comminute and mix bulk material.
- Split into distribution units.
- Select m > 10 distribution units at random.
- Homogenise each one.
- Analyse 2 test portions from each in random order, with high precision, and conduct one-way analysis of variance.

$$s_{an} = \sqrt{MSW}, \qquad s_{sam} = \sqrt{(MSB - MSW)/2}$$

Problems with simple ANOVA based on testing
- Analytical precision too low: method cannot detect consequential degree of heterogeneity.
- Analytical precision too high: method finds significant degree of heterogeneity that may not be consequential. (Everything is heterogeneous!)

Hypothesis tested: $H_0: \sigma_{sam} = 0$

Sufficient homogeneity: original definition
- Material passes homogeneity test if $s_{sam} < \sigma_L$ [...] $\sigma_p$
- Problems are: $s_{sam}$ may not be well estimated; too big a probability of rejecting satisfactory test material.

Fearn test
- Test $H: \sigma_{sam} \le \sigma_L$ by rejecting when

$$s_{sam}^2 > \frac{\chi^2_{m-1}}{m-1}\,\sigma_L^2 + \frac{F_{m-1,m} - 1}{2}\,s_{an}^2$$

- Ref: Analyst, 2001, 127, [...]

Problems with homogeneity data
- Problems with data are [...], no proper randomisation, insufficient precision, biases, trends, steps, insufficient significant figures recorded, outliers.
- Laboratories need detailed instructions.
- Data need careful scrutiny before statistics.
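The duplicate-analysis ANOVA and the Fearn test from the homogeneity slides can be sketched as below. This is an illustration under assumptions, not the scheme's prescribed calculation. The function names and the data are hypothetical, a negative estimate of the between-unit variance is set to zero, and the chi-squared and F quantiles are passed in by the caller from tables. The values shown are approximate 95% table values for m = 12, and the 95% level is itself an assumption because the slides do not state it.

```python
import math
import statistics


def duplicate_anova(pairs):
    """One-way ANOVA for duplicate results, one (a, b) pair per unit.

    Returns (s_an, s_sam) with s_an = sqrt(MSW) and
    s_sam = sqrt((MSB - MSW) / 2).
    """
    m = len(pairs)
    msw = sum((a - b) ** 2 for a, b in pairs) / (2 * m)
    unit_means = [(a + b) / 2 for a, b in pairs]
    msb = 2 * statistics.variance(unit_means)
    s_sam_sq = max(0.0, (msb - msw) / 2)  # assumption: negative set to zero
    return math.sqrt(msw), math.sqrt(s_sam_sq)


def fearn_test(s_an, s_sam, sigma_l, m, chi2_q, f_q):
    """Return (passes, critical value) for H: sigma_sam <= sigma_L.

    chi2_q is the chi-squared quantile with m - 1 degrees of freedom and
    f_q the F quantile with (m - 1, m) degrees of freedom, from tables.
    """
    critical = (chi2_q / (m - 1)) * sigma_l ** 2 + ((f_q - 1) / 2) * s_an ** 2
    return s_sam ** 2 <= critical, critical


# Hypothetical duplicate results for m = 12 distribution units.
pairs = [(10.1, 10.2), (9.9, 10.0), (10.3, 10.2), (10.0, 10.1),
         (10.2, 10.2), (9.8, 9.9), (10.1, 10.0), (10.4, 10.3),
         (10.0, 9.9), (10.2, 10.3), (9.9, 10.0), (10.1, 10.1)]
s_an, s_sam = duplicate_anova(pairs)
sigma_l = 0.1  # hypothetical allowable between-unit standard deviation
# Approximate 95% table values for m = 12: chi-squared (11 df), F (11, 12 df).
passes, critical = fearn_test(s_an, s_sam, sigma_l, len(pairs), 19.675, 2.72)
print(f"s_an {s_an:.3f}, s_sam {s_sam:.3f}, critical {critical:.4f}, pass {passes}")
```

Compared with the original criterion, the critical value makes allowance for the uncertainty in the estimate of s_sam, so a material whose estimated s_sam slightly exceeds sigma_L is not automatically rejected.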