New Captum version features more ways to build AI responsibly

September 21, 2021

We are excited to release a new version of Captum (“comprehension” in Latin), a powerful, easy-to-use model interpretability library for PyTorch. Captum 0.4 adds a variety of new functionality for model understanding.

Concept-based interpretability tools such as Captum make it easier for AI researchers and engineers to design, develop, and debug advanced AI models. They also help people understand how their AI models work, so they can assess whether those models reflect their values and whether they deliver accurate predictions that serve their businesses’ or organizations’ needs.

Captum also offers robustness tools that help model developers uncover vulnerabilities using robustness metrics and adversarial attacks. Version 0.4 adds tooling for evaluating model robustness, new attribution methods, and improvements to existing attribution methods.

Concept-based interpretability helps remove statistical biases

Captum can help AI researchers understand how a particular computer vision model interprets complex images.

Deep learning models can be difficult to understand. For example, an image classifier that runs on photos operates on low-level features, such as pixel values, lines, dots, and other minor details of the image. Concept activation vectors (CAVs) explain a neural network’s internal state by associating model predictions with concepts (such as “apron” or “cafe”) that people can easily understand.

Captum 0.4 adds testing with concept activation vectors (TCAV), allowing researchers and engineers to assess how different user-defined concepts affect a model’s prediction. TCAV can also be used for fairness analysis to check for algorithmic and label bias. Researchers have found that some networks can inadvertently embed biases that are difficult to detect.

TCAV expands beyond currently available attribution methods, which enable researchers and engineers to quantify the importance of various inputs, by also allowing them to quantify the impact of concepts such as gender or race on a model’s prediction. In Captum 0.4, TCAV has been implemented in a generic manner, allowing users to define custom concepts with example inputs for different modalities, including vision and text. In one of our experiments, for example, we estimated the importance of “positive adjectives” for the prediction of positive sentiment.
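To make the idea of a TCAV score concrete, here is a minimal sketch in plain Python. It is not Captum's implementation. It assumes the layer activations and the gradients of the class score have already been extracted from a model (the numbers below are made up), and it swaps in a simple difference of mean activations as the concept direction, whereas a CAV is normally obtained from a linear classifier trained to separate concept examples from random examples. The score is the fraction of inputs whose class score increases along the concept direction.

```python
# Toy TCAV-style score (illustrative only; not Captum's implementation).

def mean_vector(rows):
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_direction(concept_acts, random_acts):
    """Stand-in for a CAV: mean concept activation minus mean random activation.
    (Assumption: a real CAV comes from a trained linear classifier.)"""
    mu_c = mean_vector(concept_acts)
    mu_r = mean_vector(random_acts)
    return [c - r for c, r in zip(mu_c, mu_r)]

def tcav_score(layer_grads, cav):
    """Fraction of inputs whose directional derivative along the CAV is positive."""
    positive = sum(
        1 for g in layer_grads
        if sum(gi * vi for gi, vi in zip(g, cav)) > 0
    )
    return positive / len(layer_grads)

# Hypothetical layer activations for concept examples and for random examples.
concept_acts = [[2.0, 0.1], [1.8, -0.1], [2.2, 0.0]]
random_acts = [[0.1, 0.0], [-0.2, 0.2], [0.1, -0.2]]
cav = concept_direction(concept_acts, random_acts)

# Hypothetical gradients of the target class score with respect to the same
# layer, one row per input of the class being explained.
layer_grads = [[0.5, 0.1], [0.3, -0.4], [-0.2, 0.9], [0.7, 0.0]]
score = tcav_score(layer_grads, cav)
print(score)
```

A score near 1 suggests the concept pushes most inputs toward the class, a score near 0 suggests it pushes them away, and comparing against several random concept sets helps judge whether the difference is meaningful.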

In the graphs below, we visualize the distributions of TCAV scores for a sentiment analysis model introduced in one of Captum’s tutorials. As a data set, we used a list of movie ratings with positive sentiment. The graphs show TCAV scores for the positive adjectives concept alongside five different sets of neutral terms concepts. The positive adjectives concept is significantly more important in both convolutional layers across all five neutral concept sets, which indicates the importance of positive adjectives in predicting positive sentiment. (More details about this case study and computer vision experiments can be found in our tutorials.)

The distribution of TCAV scores for positive adjectives vs. neutral terms concepts for two different convolutional layers of our sentiment-analysis model.

Building more robust AI models

Deep learning models can be vulnerable to a variety of adversarial inputs whose perturbations may fool an AI model while remaining imperceptible to humans. Captum 0.4 includes robustness tooling to help users better understand a model’s limitations and vulnerabilities. A robust AI system should consistently produce safe and reliable results under predefined conditions. It should also respond to unforeseen issues and make the changes needed to avoid harming or otherwise negatively affecting people.

The library also includes new tools to understand model robustness, including implementations of two adversarial attacks, the fast gradient sign method (FGSM) and projected gradient descent (PGD), and robustness metrics to evaluate the impact of different attacks or perturbations on a model.
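As a rough illustration of the fast gradient sign method, the sketch below applies it to a hand-written logistic regression in plain Python rather than to a PyTorch model through Captum. The weights, input, and epsilon are hypothetical values chosen for the example. The attack moves every input feature by epsilon in the direction that increases the loss, using only the sign of the gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(weights, bias, x, label, epsilon):
    """One FGSM step: x + epsilon * sign(d loss / d x).
    For binary cross-entropy on logistic regression, d loss / d x = (p - label) * w."""
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and input (not taken from any Captum tutorial).
weights, bias = [2.0, -1.0], 0.0
x, label = [0.3, 0.2], 1

x_adv = fgsm(weights, bias, x, label, epsilon=0.25)
p_clean = predict(weights, bias, x)
p_adv = predict(weights, bias, x_adv)
print(p_clean > 0.5, p_adv > 0.5)
```

In this toy setup the clean input is classified as class 1 while the perturbed input falls on the other side of the 0.5 threshold, even though no feature moved by more than epsilon. Projected gradient descent repeats steps of this kind and projects the result back into the allowed perturbation range after each one.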

Robustness metrics in this release include:

Attack Comparator, which allows users to quantify the impact of any input perturbation (such as torchvision transforms, text augmentation, etc.) or adversarial attack on a model and compare the impact of different attacks.
Minimal Perturbation, which identifies the minimum perturbation needed to cause a model to misclassify the perturbed input.

This robustness tooling enables model developers to better understand potential model vulnerabilities and to analyze counterfactual examples that reveal a model’s decision boundary.
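The idea behind a minimal-perturbation metric can be sketched as a simple search: increase the strength of a perturbation step by step and report the first strength at which the prediction changes. The plain-Python example below is an illustration under stated assumptions, not Captum's implementation. The classifier, the `darken` perturbation, and the grid of strengths are all hypothetical.

```python
def classify(x):
    """Hypothetical classifier: class 1 when the weighted sum is positive."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0

def darken(x, strength):
    """Hypothetical perturbation: lower every feature by `strength`."""
    return [xi - strength for xi in x]

def minimal_perturbation(classify, perturb, x, label, strengths):
    """Return the first strength (in the given order) that causes a
    misclassification, or None if the model is correct for all of them."""
    for strength in strengths:
        if classify(perturb(x, strength)) != label:
            return strength
    return None

x, label = [0.3, 0.23], 1
strengths = [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00

result = minimal_perturbation(classify, darken, x, label, strengths)
print(result)
```

The perturbed input found this way is a counterfactual example: it sits just across the decision boundary from the original input. A linear scan is the simplest choice; a binary search over the strength is a common alternative when the effect of the perturbation is monotonic.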

Layer-wise relevance propagation and attribution improvements

In collaboration with Technische Universität Berlin, we have implemented a new attribution algorithm, layer-wise relevance propagation (LRP), which offers a new perspective for explaining model predictions.

Captum 0.4 adds both LRP and a layer-attribution variant, layer LRP. Layer-wise relevance propagation is based on a backward propagation mechanism applied sequentially to all layers of the model. The model output score represents the initial relevance, which is decomposed into values for each neuron of the underlying layers.
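The backward decomposition described above can be sketched for a tiny two-layer ReLU network in plain Python. This is an illustration, not Captum's implementation. It assumes bias-free linear layers and uses the epsilon propagation rule, one of several LRP rules, and the weights and input are made-up values. Each layer hands its relevance back to its inputs in proportion to their contribution to the pre-activation, so the relevance summed over the input features stays close to the output score.

```python
def linear(inputs, weights):
    """Bias-free linear layer; weights[j][i] connects input i to output j."""
    return [sum(w * a for w, a in zip(row, inputs)) for row in weights]

def relu(values):
    return [max(0.0, v) for v in values]

def lrp_linear(inputs, weights, relevance_out, eps=1e-9):
    """Redistribute relevance through one bias-free linear layer (epsilon rule)."""
    relevance_in = [0.0] * len(inputs)
    for row, r_j in zip(weights, relevance_out):
        z_j = sum(w * a for w, a in zip(row, inputs))
        denom = z_j + (eps if z_j >= 0 else -eps)  # stabilizer avoids division by zero
        for i, (w, a) in enumerate(zip(row, inputs)):
            relevance_in[i] += a * w / denom * r_j
    return relevance_in

# Hypothetical network: 3 inputs -> 2 hidden ReLU units -> 1 output score.
W1 = [[0.5, -0.25, 1.0], [1.0, 0.5, 0.5]]
W2 = [[2.0, 1.0]]
x = [1.0, 2.0, -1.0]

# Forward pass, keeping the activations needed for the backward pass.
hidden = relu(linear(x, W1))
score = linear(hidden, W2)[0]

# Backward pass: the output score is the initial relevance.
relevance_hidden = lrp_linear(hidden, W2, [score])
relevance_input = lrp_linear(x, W1, relevance_hidden)
print(relevance_input, sum(relevance_input), score)
```

Inactive ReLU units have zero activation and therefore receive zero relevance, which is why relevance can be passed through the ReLU unchanged in this sketch. Negative input relevance marks a feature that pulled the output score down.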

Finally, Captum 0.4 has added multiple new tutorials, a variety of improvements, and bug fixes to existing attribution methods. More information regarding these improvements can be found in the official release notes. Captum is interoperable with the Fiddler platform for explainable AI, which enables engineers and developers to gather actionable insights and to analyze the decision-making behavior behind AI models.

Helping the AI community build models that are more reliable, more predictable, and better able to resist adversarial attacks is an important long-term project. We look forward to sharing more updates on our work.