
An introduction to ROC analysis - Stanford University

An introduction to ROC analysis
Tom Fawcett
Institute for the Study of Learning and Expertise, 2164 Staunton Court, Palo Alto, CA 94306, USA
Available online 19 December 2005

Abstract

Receiver operating characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
© 2005 Elsevier. All rights reserved.

Keywords: ROC analysis; Classifier evaluation; Evaluation metrics

1. Introduction

A receiver operating characteristics (ROC) graph is a technique for visualizing, organizing and selecting classifiers based on their performance.




ROC graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates of classifiers (Egan, 1975; Swets et al., 2000). ROC analysis has been extended for use in visualizing and analyzing the behavior of diagnostic systems (Swets, 1988). The medical decision making community has an extensive literature on the use of ROC graphs for diagnostic testing (Zou, 2002). Swets et al. (2000) brought ROC curves to the attention of the wider public with their Scientific American article. One of the earliest adopters of ROC graphs in machine learning was Spackman (1989), who demonstrated the value of ROC curves in evaluating and comparing algorithms. Recent years have seen an increase in the use of ROC graphs in the machine learning community, due in part to the realization that simple classification accuracy is often a poor metric for measuring performance (Provost and Fawcett, 1997; Provost et al., 1998). In addition to being a generally useful performance graphing method, they have properties that make them especially useful for domains with skewed class distribution and unequal classification error costs. These characteristics have become increasingly important as research continues into the areas of cost-sensitive learning and learning in the presence of unbalanced classes.

ROC graphs are conceptually simple, but there are some non-obvious complexities that arise when they are used in research. There are also common misconceptions and pitfalls when using them in practice. This article attempts to serve as a basic introduction to ROC graphs and as a guide for using them in research. The goal of this article is to advance general knowledge about ROC graphs so as to promote better evaluation practices in the field.

2. Classifier performance

We begin by considering classification problems using only two classes.

Formally, each instance I is mapped to one element of the set {p, n} of positive and negative class labels. A classification model (or classifier) is a mapping from instances to predicted classes. Some classification models produce a continuous output (e.g., an estimate of an instance's class membership probability) to which different thresholds may be applied to predict class membership. Other models produce a discrete class label indicating only the predicted class of the instance. To distinguish between the actual class and the predicted class we use the labels {Y, N} for the class predictions produced by a classifier.

0167-8655/$ - see front matter © 2005 Elsevier. All rights reserved. Pattern Recognition Letters 27 (2006) 861-874

Given a classifier and an instance, there are four possible outcomes. If the instance is positive and it is classified as positive, it is counted as a true positive; if it is classified as negative, it is counted as a false negative.
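The four outcomes can be made concrete with a short Python sketch; the helper name `confusion_counts` and the toy label lists are illustrative, not part of the paper:

```python
# Count the four outcomes for a discrete classifier.
# Labels follow the paper's convention: actual classes are {p, n},
# predictions are {Y, N}.

def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) counts from paired actual/predicted labels."""
    tp = fp = tn = fn = 0
    for a, y in zip(actual, predicted):
        if a == "p" and y == "Y":
            tp += 1          # positive instance classified positive
        elif a == "p" and y == "N":
            fn += 1          # positive instance classified negative
        elif a == "n" and y == "N":
            tn += 1          # negative instance classified negative
        elif a == "n" and y == "Y":
            fp += 1          # negative instance classified positive
    return tp, fp, tn, fn

# Invented example data: 4 positives, 4 negatives.
actual    = ["p", "p", "p", "n", "n", "n", "n", "p"]
predicted = ["Y", "Y", "N", "N", "N", "Y", "N", "Y"]
print(confusion_counts(actual, predicted))  # -> (3, 1, 3, 1)
```

Summing over a whole test set this way yields the confusion matrix described next.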

If the instance is negative and it is classified as negative, it is counted as a true negative; if it is classified as positive, it is counted as a false positive. Given a classifier and a set of instances (the test set), a two-by-two confusion matrix (also called a contingency table) can be constructed representing the dispositions of the set of instances. This matrix forms the basis for many common metrics.

Fig. 1 shows a confusion matrix and equations of several common metrics that can be calculated from it. The numbers along the major diagonal represent the correct decisions made, and the numbers off this diagonal represent the errors, the confusion, between the various classes.

The true positive rate (also called hit rate and recall) of a classifier is estimated as

    tp rate ≈ Positives correctly classified / Total positives

The false positive rate (also called false alarm rate) of the classifier is

    fp rate ≈ Negatives incorrectly classified / Total negatives

Additional terms associated with ROC curves are

    sensitivity = recall
    specificity = True negatives / (False positives + True negatives) = 1 - fp rate
    positive predictive value = precision
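The rate definitions above translate directly into code. A minimal sketch (the function name and the example counts are invented for illustration):

```python
def rates(tp, fp, tn, fn):
    """Compute the ROC-related rates from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)       # sensitivity / recall / hit rate
    fp_rate = fp / (fp + tn)       # false alarm rate
    specificity = tn / (fp + tn)   # equals 1 - fp_rate
    precision = tp / (tp + fp)     # positive predictive value
    return tp_rate, fp_rate, specificity, precision

# Hypothetical test set: 100 positives (63 caught), 100 negatives (28 misfired).
tpr, fpr, spec, prec = rates(tp=63, fp=28, tn=72, fn=37)
print(round(tpr, 2), round(fpr, 2))  # -> 0.63 0.28
```

The pair (fp rate, tp rate) computed here is exactly the ROC-space point discussed in the next section.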

3. ROC space

ROC graphs are two-dimensional graphs in which tp rate is plotted on the Y axis and fp rate is plotted on the X axis. An ROC graph depicts relative tradeoffs between benefits (true positives) and costs (false positives). Fig. 2 shows an ROC graph with five classifiers labeled A through E.

A discrete classifier is one that outputs only a class label. Each discrete classifier produces an (fp rate, tp rate) pair corresponding to a single point in ROC space. The classifiers in Fig. 2 are all discrete classifiers.

Several points in ROC space are important to note. The lower left point (0,0) represents the strategy of never issuing a positive classification; such a classifier commits no false positive errors but also gains no true positives. The opposite strategy, of unconditionally issuing positive classifications, is represented by the upper right point (1,1).

The point (0,1) represents perfect classification. D's performance is perfect as shown.

Informally, one point in ROC space is better than another if it is to the northwest (tp rate is higher, fp rate is lower, or both) of the first.

Fig. 1. Confusion matrix and common performance metrics calculated from it:

                            True class
                            p                  n
    Hypothesized     Y      True Positives     False Positives
    class            N      False Negatives    True Negatives
    Column totals:          P                  N

(For clarity, counts such as TP and FP will be denoted with upper-case letters and rates such as tp rate will be denoted with lower-case.)

Fig. 2. A basic ROC graph showing five discrete classifiers.

Classifiers appearing on the left-hand side of an ROC graph, near the X axis, may be thought of as "conservative": they make positive classifications only with strong evidence so they make few false positive errors, but they often have low true positive rates as well.
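The informal "northwest" comparison can be stated as a simple dominance check. A sketch, assuming ROC points are represented as (fp rate, tp rate) tuples (the function name is ours, not the paper's):

```python
def dominates(a, b):
    """True if ROC point a = (fp_rate, tp_rate) is to the 'northwest' of b:
    no worse on either rate, and strictly better on at least one."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

# A point with fewer false positives and more true positives dominates.
print(dominates((0.1, 0.8), (0.3, 0.6)))  # -> True
print(dominates((0.3, 0.6), (0.1, 0.8)))  # -> False
```

Note this is a partial order: two points can be incomparable (one higher on both rates), which is why ROC analysis compares whole curves rather than single points.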

Classifiers on the upper right-hand side of an ROC graph may be thought of as "liberal": they make positive classifications with weak evidence so they classify nearly all positives correctly, but they often have high false positive rates. In Fig. 2, A is more conservative than B. Many real world domains are dominated by large numbers of negative instances, so performance in the far left-hand side of the ROC graph becomes more interesting.

3.1. Random performance

The diagonal line y = x represents the strategy of randomly guessing a class. For example, if a classifier randomly guesses the positive class half the time, it can be expected to get half the positives and half the negatives correct; this yields the point (0.5, 0.5) in ROC space. If it guesses the positive class 90% of the time, it can be expected to get 90% of the positives correct but its false positive rate will increase to 90% as well, yielding (0.9, 0.9) in ROC space.
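This behavior of a random guesser is easy to verify by simulation. A sketch with invented helper names; the point it produces lands near the y = x diagonal:

```python
import random

def random_classifier_point(labels, p_guess, seed=0):
    """ROC point (fp_rate, tp_rate) of a classifier that ignores its input
    and guesses 'positive' with probability p_guess."""
    rng = random.Random(seed)
    tp = fn = fp = tn = 0
    for a in labels:
        y = rng.random() < p_guess   # guess positive with probability p_guess
        if a == "p":
            tp += y
            fn += not y
        else:
            fp += y
            tn += not y
    return fp / (fp + tn), tp / (tp + fn)

labels = ["p"] * 5000 + ["n"] * 5000
fpr, tpr = random_classifier_point(labels, p_guess=0.9)
# Both rates come out near 0.9, i.e. the point sits on the diagonal.
```

Varying p_guess from 0 to 1 slides the point along the diagonal from (0,0) to (1,1), matching the argument above.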

Thus a random classifier will produce an ROC point that slides back and forth on the diagonal based on the frequency with which it guesses the positive class. In order to get away from this diagonal into the upper triangular region, the classifier must exploit some information in the data. In Fig. 2, C's performance is virtually random. At (0.7, 0.7), C may be said to be guessing the positive class 70% of the time.

Any classifier that appears in the lower right triangle performs worse than random guessing. This triangle is therefore usually empty in ROC graphs. If we negate a classifier, that is, reverse its classification decisions on every instance, its true positive classifications become false negative mistakes, and its false positives become true negatives. Therefore, any classifier that produces a point in the lower right triangle can be negated to produce a point in the upper left triangle.
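The effect of negation on an ROC point follows directly from swapping the outcome counts: TP↔FN and FP↔TN, so the rates each flip to their complements. A one-line sketch (the function name is ours):

```python
def negate_point(fp_rate, tp_rate):
    """ROC point of the negated classifier: every Y becomes N and vice versa,
    so true positives become false negatives and false positives become
    true negatives, flipping each rate to its complement."""
    return 1.0 - fp_rate, 1.0 - tp_rate

# A below-diagonal (worse-than-random) point reflects into the upper triangle.
print(tuple(round(v, 1) for v in negate_point(0.8, 0.3)))  # -> (0.2, 0.7)
```

Geometrically this is a point reflection through the diagonal's midpoint (0.5, 0.5), which is why the lower right triangle maps onto the upper left one.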

In Fig. 2, E performs much worse than random, and is in fact the negation of B. Any classifier on the diagonal may be said to have no information about the class. A classifier below the diagonal may be said to have useful information, but it is applying the information incorrectly (Flach and Wu, 2003).

Given an ROC graph in which a classifier's performance appears to be slightly better than random, it is natural to ask: is this classifier's performance truly significant or is it only better than random by chance? There is no conclusive test for this, but Forman (2002) has shown a methodology that addresses this question with ROC curves.

4. Curves in ROC space

Many classifiers, such as decision trees or rule sets, are designed to produce only a class decision, i.e., a Y or N on each instance. When such a discrete classifier is applied to a test set, it yields a single confusion matrix, which in turn corresponds to one ROC point.

