Facing Imbalanced Data - University of Pittsburgh

Facing Imbalanced Data: Recommendations for the Use of Performance Metrics

László A. Jeni¹, Jeffrey F. Cohn¹,², and Fernando De La Torre¹
¹ Carnegie Mellon University, Pittsburgh
² University of Pittsburgh, Pittsburgh

Abstract. Recognizing facial action units (AUs) is important for situation analysis and automated video annotation. Previous work has emphasized face tracking and registration and the choice of feature classifiers. Relatively neglected is the effect of imbalanced data for action unit detection. While the machine learning community has become aware of the problem of skewed data for training classifiers, little attention has been paid to how skew may bias performance metrics.

To address this question, we conducted experiments using both simulated classifiers and three major databases that differ in size, type of FACS coding, and degree of skew. We evaluated the influence of skew on both threshold metrics (Accuracy, F-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the receiver operating characteristic (ROC) curve and precision-recall curve). With the exception of area under the ROC curve, all were attenuated by skewed distributions, in many cases dramatically so. While ROC was unaffected by skew, precision-recall curves suggest that ROC may mask poor performance.

Our findings suggest that skew is a critical factor in evaluating performance metrics. To avoid or minimize skew-biased estimates of performance, we recommend reporting skew-normalized scores along with the obtained scores.

I. INTRODUCTION

Our everyday communication is highly influenced by the emotional information available to us from other people. Recognizing facial expression is important for situation analysis and automated video annotation. In the last decade, many approaches have been proposed for automatic facial expression recognition [7], [29]. Although previous work has emphasized face tracking and registration and the choice of feature classifiers, the effect of imbalanced data when evaluating action unit detection has been relatively neglected.

In the case of facial expression data, the samples can be annotated using either emotion-specified labels (e.g., happy or sad) or action units, as defined by the Facial Action Coding System (FACS) [10].

Action units are anatomically defined facial actions that singly or in combinations can describe nearly all possible facial expressions. Action unit (AU) detection, as well as expression detection of which AU detection is a subset, is a typical binary classification problem where the vast majority of examples are from one class, but the practitioner is typically interested in the minority (positive) class.

The problem of learning from imbalanced data sets is twofold. First of all, from the perspective of classifier training, imbalance in the training data distribution often causes learning algorithms to perform poorly on the minority class. This issue has been well addressed in the machine learning literature [4], [15], [27], [26], [8]. A common solution is to sample the data prior to training to re-balance the class distribution [2], [27].
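The re-balancing idea mentioned above can be sketched in a few lines. This is a minimal illustration of random oversampling, one common sampling strategy, and not necessarily the procedure used in the cited works; the function name `random_oversample`, the seed, and the toy data are assumptions made for this example.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Resample the minority class with replacement until both classes have equal counts.

    Hypothetical helper for illustration; labels are assumed to be 0 (negative) or 1 (positive).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    if len(pos_idx) == 0 or len(neg_idx) == 0:
        raise ValueError("both classes must be present")
    # Decide which class is the minority.
    if len(pos_idx) < len(neg_idx):
        minority, majority = pos_idx, neg_idx
    else:
        minority, majority = neg_idx, pos_idx
    # Draw extra minority samples (with replacement) to close the gap.
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy data: 10 samples, only 2 of them positive.
X = np.arange(20).reshape(10, 2)
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
X_bal, y_bal = random_oversample(X, y)
print(len(y_bal), int(y_bal.sum()))
```

Oversampling of this kind should be applied to the training split only; resampling test data changes the very distribution whose effect on metrics is under study here.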

An alternative to sampling is to use cost-sensitive learning. This approach targets the problem of skew by applying different cost matrices that describe the costs for misclassifying any particular data point [26], [8]. For a more detailed survey on the problem see [16] and the references therein.

Little attention has been paid to how skew may spoil performance metrics. Facial expression data is typically highly skewed. Imbalance in the test data distribution might produce misleading conclusions with certain metrics. Percentage agreement, referred to as accuracy, is especially vulnerable to bias from skew. When the base rate is low, high accuracy can result even when alternative methods rarely if ever agree [12], [14].

Agreement in that case is about the very large number of negative cases rather than the very few positive ones. Alternative metrics have been proposed to address this issue [24], [15]. Ferri et al. studied the relationship between different performance metrics and addressed the problem of rank correlations between them [12].

How does skewed data influence performance metrics for action unit detection? To address this question, we conducted experiments using both simulated classifiers and three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding [9], [10], and degree of skew.
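The earlier point about accuracy under a low base rate can be made concrete with a small, hypothetical example. In the sketch below, two methods each mark 2% of 1000 frames as positive but never on the same frames; the frame counts and the helper `agreement_stats` are assumptions chosen for illustration, and the formulas are the standard definitions of accuracy, Cohen's kappa, and F1.

```python
def agreement_stats(a, b):
    """Return (accuracy, Cohen's kappa, F1) for binary labels b scored against a."""
    n = len(a)
    tp = sum(1 for x, z in zip(a, b) if x == 1 and z == 1)
    tn = sum(1 for x, z in zip(a, b) if x == 0 and z == 0)
    fp = sum(1 for x, z in zip(a, b) if x == 0 and z == 1)
    fn = sum(1 for x, z in zip(a, b) if x == 1 and z == 0)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - p_exp) / (1 - p_exp) if p_exp < 1 else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return accuracy, kappa, f1

n_frames = 1000
# Method A marks frames 0-19 as positive, method B marks frames 20-39.
method_a = [1 if i < 20 else 0 for i in range(n_frames)]
method_b = [1 if 20 <= i < 40 else 0 for i in range(n_frames)]

accuracy, kappa, f1 = agreement_stats(method_a, method_b)
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f} f1={f1:.3f}")
```

Here the two methods never agree on a single positive frame, yet accuracy is 96% because 960 frames are negative for both. Kappa and F1, which do not reward agreement on the abundant negative class in the same way, stay at or below zero.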

The databases were Cohn-Kanade [21], RU-FACS [13], and UNBC-McMaster Pain Archive [22].

We included a broad range of metrics that included both threshold metrics (Accuracy, F1-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the ROC curve [11] and precision-recall curve). With the exception of area under the ROC curve, all were attenuated by skewed distributions; in many cases, dramatically so. Alpha and kappa were affected by skew in either direction, whereas F1-score was affected by skew only in one direction. While ROC was unaffected by skew, precision-recall curves can reveal differences between classifiers, because of the different visual representation of the curves.
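As a rough illustration of why threshold metrics move with skew while ROC does not, the sketch below fixes a classifier's true positive rate and false positive rate (the two quantities that define a point in ROC space) and only changes the ratio of negative to positive examples. The rates of 0.8 and 0.1, the skew values, and the helper `metrics_at_skew` are assumptions for this example, and skew is taken here to mean the number of negatives divided by the number of positives.

```python
def metrics_at_skew(tpr, fpr, n_pos, n_neg):
    """Expected threshold metrics for a classifier with fixed TPR/FPR on a given class mix."""
    tp = tpr * n_pos
    fn = n_pos - tp
    fp = fpr * n_neg
    tn = n_neg - fp
    total = n_pos + n_neg
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn > 0 else 0.0
    # Cohen's kappa: observed agreement corrected by chance agreement.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_exp) / (1 - p_exp) if p_exp < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1, "kappa": kappa}

TPR, FPR, N_POS = 0.8, 0.1, 100  # assumed operating point and positive count
results = {}
for skew in (0.1, 1, 10, 100):  # skew = negatives / positives (assumed definition)
    results[skew] = metrics_at_skew(TPR, FPR, N_POS, int(N_POS * skew))
    m = results[skew]
    print(f"skew={skew:>5}: acc={m['accuracy']:.3f} prec={m['precision']:.3f} "
          f"f1={m['f1']:.3f} kappa={m['kappa']:.3f}")
```

In this simulation the ROC operating point is identical on every row, while F1 falls steadily as negatives come to dominate and kappa drops on both sides of the balanced case, which is consistent with the pattern described above.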

Very different precision-recall curves can be associated with the same ROC curve.

Our findings suggest that skew is a critical factor in evaluating performance metrics. Metrics of classifier performance may reveal more about skew than they do about actual performance. Databases that are otherwise identical with respect to intensity of action units, head pose, and so on may give rise to very different metric values depending only on differences in skew. This finding has implications for testing classifiers so as to avoid or minimize confounds and for meta-analyses of classifier performance. Sensitivity of the threshold metrics to skewed distributions could be reduced by balancing the distribution of the classes.

The paper is organized as follows.

Datasets and their properties are reviewed in Section 2. Theoretical components are described in Section 3. Experimental results on the effect of imbalanced data on performance metrics and AU classification are detailed in Section 4. Discussion and a summary conclude the paper (Section 5).

II. DATASETS

First, we describe the datasets (Section ). We then report findings with respect to skew for each AU ( ).

In our simulations we used three major databases that include both posed and spontaneous facial expression and differ in database size, type of FACS coding, and degree of skew. The databases were Cohn-Kanade, RU-FACS, and UNBC-McMaster Pain Archive.

Cohn-Kanade Extended

The Cohn-Kanade Extended Facial Expression (CK+) Database [21] is an extension of the original Cohn-Kanade Database [18].

Cohn-Kanade has been widely used to compare the performance of different methods of automated facial expression analysis. CK+ includes 593 frontal image sequences of directed facial action tasks (posed AU and AU combinations) performed by 123 different participants. Facial landmarks (68-point mesh) were tracked using person-specific active appearance models [28]. Twenty-seven action units were manually coded for presence or absence by certified FACS coders. For a subset of 118 sequences, the seven universal emotion expressions (anger, contempt, disgust, fear, happy, sad, and surprise) plus neutral were labeled.
