Transcription of SAMEK ET AL. EVALUATING THE VISUALIZATION …
SAMEK ET AL.: EVALUATING THE VISUALIZATION OF WHAT A DEEP NEURAL NETWORK HAS LEARNED

Evaluating the Visualization of What a Deep Neural Network Has Learned

Wojciech Samek, Member, IEEE, Alexander Binder, Grégoire Montavon, Sebastian Bach, and Klaus-Robert Müller, Member, IEEE

Abstract: Deep Neural Networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multi-layer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision given a new unseen data sample.
Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the "importance" of individual pixels with respect to the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps.
We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012 and MIT Places data sets. Our main result is that the recently proposed Layer-wise Relevance Propagation (LRP) algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of neural network performance.

Index Terms: Convolutional Neural Networks, Explanation, Heatmapping, Relevance Models, Image Classification.

This work was supported by the Brain Korea 21 Plus Program through the National Research Foundation of Korea funded by the Ministry of Education. This work was also supported by the grant DFG (MU 987/17-1) and by the German Ministry for Education and Research as Berlin Big Data Center BBDC (01IS14013A). This publication only reflects the authors' views. Funding agencies are not liable for any use that may be made of the information contained herein. Asterisks indicate corresponding author. W. Samek is with Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany. A. Binder is with the ISTD Pillar, Singapore University of Technology and Design (SUTD), Singapore, and with the Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany. G. Montavon is with the Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany. S. Bach is with Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany. K.-R. Müller is with the Berlin Institute of Technology (TU Berlin), 10587 Berlin, Germany, and also with the Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, Korea. WS and AB contributed equally.

INTRODUCTION

Deep Neural Networks (DNNs) are powerful methods for solving large-scale real-world problems such as automated image classification [1], [2], [3], [4], natural language processing [5], [6], human action recognition [7], [8], or physics [9]. See also [10].
Since DNN training methodologies (unsupervised pretraining, dropout, parallelization, GPUs, etc.) have improved [11], DNNs are now able to harvest extremely large amounts of training data and can thus achieve record performance in many research fields. At the same time, DNNs are generally conceived as black-box methods, and users might consider this lack of transparency a drawback in practice. Namely, it is difficult to intuitively and quantitatively understand the result of DNN inference, i.e., for an individual novel input data point, what made the trained DNN model arrive at a particular response.
Note that this aspect differs from feature selection [12], where the question is: which features are, on average, salient for the ensemble of training data? Recently, the transparency problem has been receiving more attention for general nonlinear estimators [13], [14], [15]. Several methods have been developed to understand what a DNN has learned [16], [17], [18]. While a large body of DNN work is dedicated to visualizing particular neurons or neuron layers [1], [19], [20], [21], [22], [23], [24], we focus here on methods that visualize the impact of particular regions of a given, fixed single image on the prediction for this image. Zeiler and Fergus [19] proposed a network propagation technique to identify patterns in a given input image that are linked to a particular DNN prediction.
This method runs a backward algorithm that reuses the weights at each layer to propagate the prediction from the output down to the input layer, leading to the creation of meaningful patterns in input space. This approach was designed for a particular type of neural network, namely convolutional nets with max-pooling and rectified linear units. A limitation of the deconvolution method is the absence of a particular theoretical criterion that would directly connect the predicted output to the produced pattern in a quantifiable way. Furthermore, the usage of image-specific information for generating the backprojections in this method is limited to max-pooling layers alone.
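The last point can be illustrated on a toy network. The sketch below is a hypothetical one-dimensional stand-in, not Zeiler and Fergus's implementation. A dense weight matrix `W` plays the role of the convolution, there are no biases, and the layer is followed by ReLU and max-pooling over pairs. A deconvnet-style backward pass and a plain gradient both reuse the max-pooling "switches" recorded in the forward pass, and those switches are the image-specific information. They differ at the ReLU. The deconvnet-style pass rectifies the backward signal itself and never consults the forward pre-activations. The gradient instead gates the signal with the sign of the forward pre-activation. All names (`forward`, `deconvnet_backward`, `gradient_backward`, `score`) and the numbers are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, t) for t in v]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def matvec_t(W: Matrix, r: Sequence[float]) -> List[float]:
    # Multiply by the transposed weights: the same weights are reused going backward.
    return [sum(W[i][j] * r[i] for i in range(len(W))) for j in range(len(W[0]))]


def forward(W: Matrix, x: Sequence[float]) -> Tuple[List[float], List[float], List[int]]:
    h = matvec(W, x)  # pre-activations
    a = relu(h)
    pooled, switches = [], []
    for k in range(0, len(a), 2):  # max-pooling over non-overlapping pairs
        j = k if a[k] >= a[k + 1] else k + 1
        pooled.append(a[j])
        switches.append(j)  # which unit won: image-specific information
    return h, pooled, switches


def score(W: Matrix, x: Sequence[float], v: Sequence[float]) -> float:
    # Hypothetical output unit: weighted sum of the pooled activations.
    _, pooled, _ = forward(W, x)
    return sum(vj * pj for vj, pj in zip(v, pooled))


def unpool(signal: Sequence[float], switches: Sequence[int], size: int) -> List[float]:
    out = [0.0] * size
    for s, j in zip(signal, switches):
        out[j] = s  # route the signal back to the winning unit only
    return out


def deconvnet_backward(W: Matrix, x: Sequence[float], v: Sequence[float]) -> List[float]:
    h, _, switches = forward(W, x)
    r = relu(unpool(v, switches, len(h)))  # rectify the backward signal; h is ignored
    return matvec_t(W, r)


def gradient_backward(W: Matrix, x: Sequence[float], v: Sequence[float]) -> List[float]:
    h, _, switches = forward(W, x)
    g = unpool(v, switches, len(h))
    g = [gi if hi > 0 else 0.0 for gi, hi in zip(g, h)]  # ReLU derivative at forward h
    return matvec_t(W, g)


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, -1.0]]
x = [1.0, 2.0]
v = [-1.0, 1.0]
print(deconvnet_backward(W, x, v))  # [-1.0, 0.0]
print(gradient_backward(W, x, v))   # [0.0, -1.0]
```

Both passes route the signal through the same switches, yet they attribute the output to different inputs. The gradient result agrees with finite differences of `score`. The deconvnet-style result does not, because its rectification step uses no information about this particular input.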
Further previous work has focused on understanding nonlinear learning methods such as DNNs or kernel methods [14], [25], [26], essentially by sensitivity analysis, in the sense of scores based on partial derivatives at the given sample. Partial derivatives look at local sensitivities detached from the decision boundary of the classifier. Simonyan et al. [26] applied partial derivatives to visualize input sensitivities in images classified by a deep neural network. Note that although [26] describes a Taylor series, it relies on partial derivatives at the given image for the computation of results.
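The following is a minimal sketch of sensitivity analysis. It uses a hypothetical scalar scoring function `toy_score` in place of a trained DNN. The partial derivatives are approximated by central finite differences, and their absolute values are read as a heatmap. A real implementation would obtain exact gradients by backpropagation. Finite differences are used here only to keep the example dependency-free. The names `sensitivity_heatmap`, `toy_score`, and `w` are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence


def sensitivity_heatmap(f: Callable[[List[float]], float],
                        x: Sequence[float],
                        eps: float = 1e-5) -> List[float]:
    """Approximate |df/dx_i| at x with central finite differences (illustration only)."""
    heat = []
    for i in range(len(x)):
        plus, minus = list(x), list(x)
        plus[i] += eps   # nudge input i up ...
        minus[i] -= eps  # ... and down, keeping all other inputs fixed
        heat.append(abs(f(plus) - f(minus)) / (2 * eps))
    return heat


# Hypothetical toy "classifier": tanh of a weighted sum of the inputs.
w = [2.0, -1.0, 0.5]


def toy_score(v: List[float]) -> float:
    return math.tanh(sum(wi * vi for wi, vi in zip(w, v)))


heat = sensitivity_heatmap(toy_score, [0.3, 0.1, -0.4])
heat_zero = sensitivity_heatmap(toy_score, [0.0, 0.0, 0.0])
```

At the all-zero input, `heat_zero` is approximately `[2.0, 1.0, 0.5]`, i.e., `|w|`, even though no input contributes anything to the score there. The map reports how strongly the score would react to a change in each input. It does not report how much each input actually supports the current score.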
arXiv preprint, 21 Sep 2015

In a strict sense, partial derivatives do not explain a classifier's decision ("what speaks for the presence of a car in the image?"). Instead, they tell us what change would make the image belong more or less to the category car. As shown later, these two types of explanations lead to very different results in practice. An approach called Layer-wise Relevance Propagation (LRP), which is applicable to arbitrary types of neural unit activities (even if they are non-continuous) and to general DNN architectures, has been proposed by Bach et al.