Machine learning algorithms, especially deep learning models, achieve strong performance on tasks such as classification, object detection, and segmentation. However, making decisions based on their outputs carries responsibility and needs to be supported by evidence.
Here I try to organize the most important ideas in the field of interpretable machine learning:
- On the interpretation of weight vectors of linear models in multivariate neuroimaging. [paper]
- Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition (2015). [paper]
- On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation (2015). [paper]
- Towards Explaining Anomalies: A Deep Taylor Decomposition of One-Class Models (2018). [paper]
- Class Activation Map (CAM): Learning Deep Features for Discriminative Localization. [paper]
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. [paper]
- Grad-CAM++: Building on Grad-CAM, it provides better visual explanations of CNN model predictions, with improved object localization and the ability to explain multiple occurrences of an object in a single image. [paper]
- SmoothGrad: Removing noise by adding noise. [paper] [code] [more details]
- Integrated Gradients: Axiomatic Attribution for Deep Networks. It uses two fundamental axioms, Sensitivity and Implementation Invariance, to guide the design of a new attribution method. [paper] [code]
- Vanilla Gradients (paper, paper)
- Network Dissection: It quantifies the interpretability of individual units in a deep CNN. It works by measuring the alignment between unit responses and a set of concepts drawn from a broad and dense segmentation dataset called Broden. [original paper, extended paper, presentation]
- Visualizing and Understanding Convolutional Networks (2014). [paper]
- Striving for Simplicity: The All Convolutional Net (2014). [paper]
- Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps (2013). [paper]
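To make the Integrated Gradients entry above concrete, here is a minimal numpy sketch. The model `f` and its gradient are toy stand-ins (any differentiable model works); the method itself is the Riemann-sum approximation of the path integral from a baseline to the input, as described in the paper.

```python
import numpy as np

# Toy differentiable "model": f(x) = sum(w * x^2), with gradient 2 * w * x.
# These are illustrative stand-ins for a real network and its input gradient.
w = np.array([1.0, 2.0, 3.0])
f = lambda x: np.sum(w * x**2)
grad = lambda x: 2.0 * w * x

def integrated_gradients(x, baseline, steps=100):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 grad_i(x' + a(x - x')) da."""
    alphas = (np.arange(steps) + 0.5) / steps          # midpoint rule in [0, 1]
    grads = np.array([grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)         # Riemann-sum average

x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
attr = integrated_gradients(x, baseline)
```

The completeness axiom from the paper is easy to check on this toy model: the attributions sum to `f(x) - f(baseline)`.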
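The SmoothGrad entry above ("removing noise by adding noise") can also be sketched in a few lines: average the gradient over several noisy copies of the input instead of taking a single raw gradient. The 1-D model gradient below is a made-up oscillating function standing in for a noisy saliency map.

```python
import numpy as np

# Toy gradient of a 1-D model: 1 + 5*cos(50x) oscillates wildly around 1,
# mimicking the visually noisy raw saliency maps SmoothGrad targets.
grad = lambda x: 1.0 + 5.0 * np.cos(50.0 * x)

def smoothgrad(x, sigma=0.15, n_samples=2000, seed=0):
    """Average the gradient over Gaussian-perturbed copies of the input."""
    rng = np.random.default_rng(seed)
    noisy_inputs = x + rng.normal(0.0, sigma, size=n_samples)
    return grad(noisy_inputs).mean()

raw = grad(0.0)        # raw gradient: dominated by the oscillating term
sg = smoothgrad(0.0)   # smoothed gradient: close to the underlying trend (~1)
```

The averaging washes out the high-frequency term, so the smoothed estimate tracks the underlying trend rather than the oscillation at the single input point.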