
Anchors: High-Precision Model-Agnostic Explanations




Transcription of Anchors: High-Precision Model-Agnostic Explanations

1 Anchors: High-Precision Model-Agnostic Explanations. Marco Tulio Ribeiro (University of Washington), Sameer Singh (University of California, Irvine), Carlos Guestrin (University of Washington). We introduce a novel model-agnostic system that explains the behavior of complex models with high-precision rules called anchors, representing local, sufficient conditions for predictions. We propose an algorithm to efficiently compute these explanations for any black-box model with high-probability guarantees. We demonstrate the flexibility of anchors by explaining a myriad of different models for different domains and tasks.

2 In a user study, we show that anchors enable users to predict how a model would behave on unseen instances with less effort and higher precision, as compared to existing linear explanations or no explanations. Machine learning models such as deep neural networks have been shown to be highly accurate for many applications, even though their complexity virtually makes them black boxes. As a consequence of the need for users to understand the behavior of these models, interpretable machine learning has seen a resurgence in recent years, ranging from the design of novel globally-interpretable machine learning models (Lakkaraju, Bach, and Leskovec 2016; Ustun and Rudin 2015; Wang and Rudin 2015) to local explanations (for individual predictions) that can be computed for any classifier (Baehrens et al. 2010; Ribeiro, Singh, and Guestrin 2016b; Strumbelj and Kononenko 2010).

3 A question at the core of interpretability is whether humans understand a model well enough to make accurate predictions about its behavior on unseen instances. For instances where humans can confidently predict the behavior of a model, let (human) precision be the fraction in which they are correct (note that this is human precision, not model precision). High human precision is paramount for real interpretability - one can hardly say they understand a model if they consistently think they know what it will do, but are often mistaken. Most local approaches provide explanations that describe the local behavior of the model using a linearly weighted combination of the input features (Baehrens et al. 2010; Ribeiro, Singh, and Guestrin 2016b; Strumbelj and Kononenko 2010).
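As a small illustration of the (human) precision defined above, the snippet below computes it from a set of simulated user judgments; the judgment values are made up purely for illustration and are not from the paper's user study.

```python
# Human precision as defined above: among the instances where the user
# confidently predicted the model's behavior, the fraction they got right.
# The judgment tuples below are made-up illustrative data.
judgments = [
    # (user_was_confident, user_prediction_matched_model)
    (True, True), (True, True), (True, False), (False, False), (True, True),
]

confident = [correct for was_confident, correct in judgments if was_confident]
human_precision = sum(confident) / len(confident)
print(f"human precision: {human_precision:.2f}")  # 3 of 4 confident calls correct -> 0.75
```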

4 Linear functions can capture the relative importance of features in an easy-to-understand manner. However, since these linear explanations are in some way local, it is not clear whether they apply to an unseen instance. In other words, their coverage (the region where the explanation applies) is unclear. Unclear coverage can lead to low human precision, as users may think an insight from an explanation applies to unseen instances even when it does not.

Figure 1: Sentiment predictions, LSTM. (a) Instances; (b) LIME explanations; (c) Anchor explanations.

5 When combined with the arithmetic involved in computing the contribution of the features in linear explanations, the human effort required can be quite high. Take, for example, the LIME (Ribeiro, Singh, and Guestrin 2016b) explanations for two sentiment predictions made by an LSTM in Figure 1. Although both explanations are computed to be locally accurate, if one took the explanation on the left and tried to apply it to the sentence on the right, one might be tempted to think that the word "not" would have a positive influence, which it does not.
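To make the "linearly weighted combination of input features" concrete, here is a minimal LIME-style sketch (ours, not the official LIME implementation): it perturbs word presence around one sentence, weights perturbed samples by proximity, and fits a weighted linear surrogate whose coefficients act as per-word contributions. The stand-in black-box model, the example sentence, and the kernel width are assumptions for illustration only.

```python
# Minimal sketch of a locally weighted linear explanation around one instance.
import numpy as np
from sklearn.linear_model import Ridge

def predict_proba(sentences):
    # Placeholder black box: "positive" only if the phrase "not bad" appears.
    return np.array([1.0 if "not bad" in s else 0.0 for s in sentences])

sentence = "This movie is not bad".split()
rng = np.random.default_rng(0)

# Binary interpretable representation: 1 = keep the word, 0 = drop it.
masks = rng.integers(0, 2, size=(500, len(sentence)))
texts = [" ".join(w for w, keep in zip(sentence, m) if keep) for m in masks]
labels = predict_proba(texts)

# Weight perturbed samples by similarity to the original (all words kept).
distances = (len(sentence) - masks.sum(axis=1)) / len(sentence)
weights = np.exp(-(distances ** 2) / 0.25)

surrogate = Ridge(alpha=1.0).fit(masks, labels, sample_weight=weights)
for word, coef in zip(sentence, surrogate.coef_):
    print(f"{word:>6s}: {coef:+.3f}")
```

Reading off such coefficients is exactly the kind of per-instance arithmetic the paragraph above refers to, and nothing in the output says how far from the original sentence the coefficients remain valid.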

6 While such explanations provide insight into the model, their coverage is not clear, e.g. when does "not" have a positive influence on sentiment? In this paper, we introduce novel model-agnostic explanations based on if-then rules, which we call anchors. An anchor explanation is a rule that sufficiently "anchors" the prediction locally such that changes to the rest of the feature values of the instance do not matter. In other words, for instances on which the anchor holds, the prediction is (almost) always the same. For example, the anchors in Figure 1c state that the presence of the words "not bad" virtually guarantees a prediction of positive sentiment (and "not good" of negative sentiment).

Figure 2: A concrete example of D and D(·|A) in (a), and intuition via two toy visualizations in (b).

7 Anchors are intuitive, easy to comprehend, and have extremely clear coverage: they only apply when all the conditions in the rule are met, and if they apply, the precision is high (by design). We demonstrate the usefulness of anchors by applying them to a variety of machine learning tasks (classification, structured prediction, text generation) on a diverse set of domains (tabular, text, and images). We also run a user study, where we observe that anchors enable users to predict how a model would behave on unseen instances with much less effort and higher precision as compared to existing techniques for model-agnostic explanation, or no explanations.

Anchors as High-Precision Explanations

Given a black-box model f : X → Y and an instance x ∈ X, the goal of local model-agnostic interpretability (Ribeiro, Singh, and Guestrin 2016a; 2016b;

8 Strumbelj and Kononenko 2010) is to explain the behavior of f(x) to a user, where f(x) is the individual prediction for instance x. The assumption is that while the model is globally too complex to be explained succinctly, zooming in on individual predictions makes the explanation task feasible. Most model-agnostic methods work by perturbing the instance x according to some perturbation distribution D_x (for simplicity, D from now on). In Ribeiro, Singh, and Guestrin (2016b), we emphasize that the perturbations D (and explanations) must use an interpretable representation (i.e. one that makes sense to humans), even if the model uses an alternative representation of the input. Let A be a rule (set of predicates) acting on such an interpretable representation, such that A(x) returns 1 if all its feature predicates are true for instance x.
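To make the rule formalism concrete, here is a small sketch (ours, not the authors' code) of a rule A as a set of word-presence predicates over an interpretable representation, where A(x) returns 1 only when every predicate holds for the instance.

```python
# Sketch of a rule A as a set of predicates on an interpretable
# representation (here: which words are present in a tokenized sentence).
# A(x) = 1 iff every predicate in the rule is true for instance x.
from dataclasses import dataclass

@dataclass(frozen=True)
class WordPresent:
    word: str
    def __call__(self, tokens):
        return self.word in tokens

@dataclass(frozen=True)
class Rule:
    predicates: tuple
    def __call__(self, tokens):
        return int(all(p(tokens) for p in self.predicates))

A = Rule((WordPresent("not"), WordPresent("bad")))
x = "This movie is not bad".split()
print(A(x))  # 1: both predicates hold for this instance
```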

9 For example, in Figure 2a (top), x = "This movie is not bad.", f(x) = Positive, and A(x) = 1 where A = {"not", "bad"}. Let D(·|A) denote the conditional distribution when the rule A applies (e.g. similar texts where "not" and "bad" are present, Figure 2a bottom). A is an anchor if A(x) = 1 and A is a sufficient condition for f(x) with high probability: in our running example, if a sample z from D(z|A) is likely predicted as Positive (i.e. f(x) = f(z)). Formally, A is an anchor if

E_{D(z|A)}[1_{f(x) = f(z)}] ≥ τ,   A(x) = 1.   (1)

Figure 2b shows two zoomed-in regions of a complex model, with particular instances (+ and -) being explained. LIME (Ribeiro, Singh, and Guestrin 2016b) explanations work by learning the lines that best approximate the model under D, with some local weighting.
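A minimal Monte Carlo sketch of checking condition (1): sample perturbations z from a stand-in D(z|A) that keeps the anchor words fixed and perturbs the rest, then estimate the fraction on which the model's prediction matches f(x). The replace-with-"UNK" distribution, the toy model, and the threshold tau = 0.95 are illustrative assumptions, not the paper's sampling scheme or its anchor-search algorithm.

```python
# Estimate prec(A) = E_{D(z|A)}[1[f(x) = f(z)]] by sampling, then check the
# anchor condition prec(A) >= tau from Eq. (1).
import random

def f(tokens):
    # Stand-in black-box sentiment model.
    return "Positive" if "not" in tokens and "bad" in tokens else "Negative"

def sample_conditional(tokens, anchor_words, n=1000, seed=0):
    # Toy D(z|A): keep the anchor words, replace every other token with "UNK"
    # independently with probability 0.5 (a crude surrogate distribution).
    rng = random.Random(seed)
    return [[w if w in anchor_words or rng.random() < 0.5 else "UNK"
             for w in tokens]
            for _ in range(n)]

x = "This movie is not bad".split()
anchor = {"not", "bad"}
tau = 0.95

samples = sample_conditional(x, anchor)
precision = sum(f(z) == f(x) for z in samples) / len(samples)
print(f"estimated precision: {precision:.3f}, is anchor: {precision >= tau}")
```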

10 The resulting LIME explanations give no indication of how faithful they are (the explanation on the right is a much better local approximation of the black-box model than the one on the left), or what their local region is. In contrast, even though they use the same D, anchors are by construction faithful, adapting their coverage to the model's behavior (the anchor on the right of Figure 2b is broader) and making their boundaries clear.

Instance | If | Predict
"I want to play(V) ..." | previous word is PARTICLE | play is VERB
"... went to a play(N) ..." | previous word is DETERMINER | play is NOUN
"... play(V) ball ..." | previous word is PRONOUN | play is VERB
Table 1: Anchors for the Part-of-Speech tag of the word "play".

Leaving the discussion of how to compute anchors for later, we now demonstrate their usefulness and flexibility via concrete examples in a variety of domains and models.

Text Classification.

