diff --git a/CHANGELOG.md b/CHANGELOG.md index d3cee218..bf0fc4a3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,15 +1,38 @@ # Change Log -## [1.1.0 Unreleased] +## [1.2.0 Unreleased] ### Summary -* Support OpenVINO IR (.xml) / ONNX (.onnx) model file for `Explainer` model -* Enable AISE: Adaptive Input Sampling for Explanation of Black-box Models -* Upgrade OpenVINO to 2024.3.0 +* + +### What's Changed + +* + +### Known Issues + +* Runtime error from ONNX / OpenVINO IR models while conversion or inference for XAI in https://github.com/openvinotoolkit/openvino_xai/issues/29 +* Models not supported by white box XAI methods in https://github.com/openvinotoolkit/openvino_xai/issues/30 + +### New Contributors + +* N/A + +--- + +## [1.1.0] + +### Summary + +* Support PyTorch models with `insert_xai()` API for saliency map generation on PyTorch / ONNX runtime +* Support OpenVINO IR (.xml) / ONNX (.onnx) model files for `Explainer` +* Enable AISE method: Adaptive Input Sampling for Explanation of Black-box Models +* Add Pointing Game, Insertion-Deletion AUC and ADCC quality metrics for saliency maps +* Upgrade OpenVINO to 2024.4.0 * Add saliency map visualization with explanation.plot() * Enable flexible naming for saved saliency maps and include confidence scores -* Add Pointing Game, Insertion-Deletion AUC and ADCC quality metrics for saliency maps +* Add XAI method documentation ### What's Changed @@ -26,6 +49,11 @@ * Add [Insertion-Deletion AUC](https://arxiv.org/abs/1806.07421) saliency map quality metric by @GalyaZalesskaya in https://github.com/openvinotoolkit/openvino_xai/pull/56 * Add [ADCC](https://arxiv.org/abs/2104.10252) saliency map quality metric by @GalyaZalesskaya in https://github.com/openvinotoolkit/openvino_xai/pull/57 * Enable AISE for detection: Adaptive Input Sampling for Explanation of Black-box Models by @negvet in https://github.com/openvinotoolkit/openvino_xai/pull/55 +* Enable prediction attribute for methods that is used for enhancing 
saliency map overlay by @negvet in https://github.com/openvinotoolkit/openvino_xai/pull/62 +* Add documentation per-method, including summary and usage guide by @negvet in https://github.com/openvinotoolkit/openvino_xai/pull/63 +* Support Pytorch models for `insert_xai` API by @goodsong81 in https://github.com/openvinotoolkit/openvino_xai/pull/61 +* Auto-detect feature layer for Pytorch models by @goodsong81 in https://github.com/openvinotoolkit/openvino_xai/pull/64 +* Upgrade OpenVINO to 2024.4.0 by @goodsong81 in https://github.com/openvinotoolkit/openvino_xai/pull/72 ### Known Issues diff --git a/README.md b/README.md index 2805da9d..e242ed42 100644 --- a/README.md +++ b/README.md @@ -8,10 +8,10 @@ [Install](#installation) • [Quick Start](#quick-start) • [License](#license) • -[Documentation](https://openvinotoolkit.github.io/openvino_xai/releases/1.0.0) +[Documentation](https://openvinotoolkit.github.io/openvino_xai/releases/1.1.0) ![Python](https://img.shields.io/badge/python-3.10%2B-green) -[![OpenVINO](https://img.shields.io/badge/openvino-2024.2-purple)](https://pypi.org/project/openvino/) +[![OpenVINO](https://img.shields.io/badge/openvino-2024.4-purple)](https://pypi.org/project/openvino/) [![codecov](https://codecov.io/gh/openvinotoolkit/openvino_xai/graph/badge.svg?token=NR0Z0CWDK9)](https://codecov.io/gh/openvinotoolkit/openvino_xai) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![PyPI](https://img.shields.io/pypi/v/openvino_xai)](https://pypi.org/project/openvino_xai) @@ -24,9 +24,9 @@ ![OpenVINO XAI Concept](docs/source/_static/ovxai-concept.svg) **OpenVINO™ Explainable AI (XAI) Toolkit** provides a suite of XAI algorithms for visual explanation of -[**OpenVINO™**](https://github.com/openvinotoolkit/openvino) Intermediate Representation (IR) models. 
+[**OpenVINO™**](https://github.com/openvinotoolkit/openvino) as well as [**PyTorch**](https://pytorch.org) and [**ONNX**](https://onnx.ai/) models. -Given **OpenVINO** models and input images, **OpenVINO XAI** generates **saliency maps** +Given AI models and input images, **OpenVINO XAI** generates **saliency maps** which highlights regions of the interest in the inputs from the models' perspective to help users understand the reason why the complex AI models output such responses. @@ -51,14 +51,15 @@ for i, image in enumerate(images): ## Features -### What's new in v1.0.0 +### What's new in v1.1.0 -* Support generation of classification and detection per-class and per-image saliency maps -* Enable White-Box ([ReciproCAM](https://arxiv.org/abs/2209.14074)) and Black-Box ([RISE](https://arxiv.org/abs/1806.07421v3)) eXplainable AI algorithms -* Support CNNs and Transformer-based architectures (validation on diverse set of [timm](https://github.com/huggingface/pytorch-image-models) models) -* Enable `Explainer` (stateful object) as the main interface for XAI algorithms -* Support `AUTO` mode by default to detect the best XAI method for given models -* Expose `insert_xai` functional API to support XAI head insertion for OpenVINO IR models +* Support PyTorch models with `insert_xai()` API for saliency map generation on PyTorch / ONNX runtime +* Support OpenVINO IR (.xml) / ONNX (.onnx) model files for `Explainer` +* Enable AISE method: Adaptive Input Sampling for Explanation of Black-box Models +* Add Pointing Game, Insertion-Deletion AUC and ADCC quality metrics for saliency maps +* Upgrade OpenVINO to 2024.4.0 +* Add saliency map visualization with explanation.plot() +* Enable flexible naming for saved saliency maps and include confidence scores Please refer to the [change logs](CHANGELOG.md) for the full release history. @@ -67,15 +68,17 @@ Please refer to the [change logs](CHANGELOG.md) for the full release history. 
At the moment, *Image Classification* and *Object Detection* tasks are supported for the *Computer Vision* domain. *Black-Box* (model agnostic but slow) methods and *White-Box* (model specific but fast) methods are supported: -| Domain | Task | Type | Algorithm | Links | -|-----------------|----------------------|-----------|---------------------|-------| -| Computer Vision | Image Classification | White-Box | ReciproCAM | [arxiv](https://arxiv.org/abs/2209.14074) / [src](openvino_xai/methods/white_box/recipro_cam.py) | -| | | | VITReciproCAM | [arxiv](https://arxiv.org/abs/2310.02588) / [src](openvino_xai/methods/white_box/recipro_cam.py) | -| | | | ActivationMap | experimental / [src](openvino_xai/methods/white_box/activation_map.py) | -| | | Black-Box | AISEClassification | [src](openvino_xai/methods/black_box/aise.py) | -| | | | RISE | [arxiv](https://arxiv.org/abs/1806.07421v3) / [src](openvino_xai/methods/black_box/rise.py) | -| | Object Detection | White-Box | ClassProbabilityMap | experimental / [src](openvino_xai/methods/white_box/det_class_probability_map.py) | -| | | Black-Box | AISEDetection | [src](openvino_xai/methods/black_box/aise.py) | +| Domain | Task | Type | Algorithm | Links | +|-----------------|----------------------|-----------|------------------------|-------| +| Computer Vision | Image Classification | White-Box | ReciproCAM | [paper](https://openaccess.thecvf.com/content/CVPR2024W/XAI4CV/papers/Byun_ReciproCAM_Lightweight_Gradient-free_Class_Activation_Map_for_Post-hoc_Explanations_CVPRW_2024_paper.pdf) / [src](openvino_xai/methods/white_box/recipro_cam.py) | +| | | | VITReciproCAM | [paper](https://arxiv.org/abs/2310.02588) / [src](openvino_xai/methods/white_box/recipro_cam.py) | +| | | | ActivationMap | experimental / [src](openvino_xai/methods/white_box/activation_map.py) | +| | | Black-Box | AISEClassification | [src](openvino_xai/methods/black_box/aise/classification.py) | +| | | | RISE | [paper](https://arxiv.org/abs/1806.07421v3) 
/ [src](openvino_xai/methods/black_box/rise.py) | +| | Object Detection | White-Box | DetClassProbabilityMap | experimental / [src](openvino_xai/methods/white_box/det_class_probability_map.py) | +| | | Black-Box | AISEDetection | [src](openvino_xai/methods/black_box/aise/detection.py) | + +See the [User Guide](docs/source/user-guide.md) for a more detailed method comparison. ### Supported explainable models @@ -86,10 +89,6 @@ Please refer to the following known issues for unsupported models and reasons. * [Runtime error from ONNX / OpenVINO IR models while conversion or inference for XAI (#29)](https://github.com/openvinotoolkit/openvino_xai/issues/29) * [Models not supported by white box XAI methods (#30)](https://github.com/openvinotoolkit/openvino_xai/issues/30) -> **_WARNING:_** OpenVINO XAI is fully validated on OpenVINO 2024.2.0. Following issue might be observed if older version of OpenVINO is used. -> * [OpenVINO IR branch insertion not working for models converted directly from torch models with OVC (#26)](https://github.com/openvinotoolkit/openvino_xai/issues/26) -> A simple workaround is to convert Torch models to ONNX models and then convert to OpenVINO models to feed to OpenVINO XAI. Please refer to [the code example](openvino_xai/utils/model_export.py). - > **_NOTE:_** GenAI / LLMs would be also supported incrementally in the upcoming releases. --- @@ -133,6 +132,18 @@ pip install -e .[dev] ``` +
+(Optional) Enable PyTorch support + +The PyTorch XAI features are available when PyTorch is installed alongside OpenVINO XAI. + +```bash +# Install PyTorch (CPU version as an example) +pip3 install torch --index-url https://download.pytorch.org/whl/cpu +``` +Please refer to the [PyTorch Installation Guide](https://pytorch.org/get-started/locally/) for other options. +
+
Verify installation @@ -151,7 +162,7 @@ pre-commit run --all-files ### Hello, OpenVINO XAI -Let's imagine the case that our OpenVINO IR model is up and running on a inference pipeline. +Let's imagine the case that our OpenVINO model is up and running on an inference pipeline. While watching the outputs, we may want to analyze the model's behavior for debugging or understanding purposes. By using the **OpenVINO XAI** `Explainer`, we can visualize why the model gives such responses. @@ -194,6 +205,28 @@ Original image | Explained image We can see that model is focusing on the body or skin area of the animals to tell if this image contains actual cheetahs. +### Insert XAI head to your models + +Using the `insert_xai` API, we can insert an XAI head into existing OpenVINO or PyTorch models directly and get an additional "saliency_map" output in the same inference pipeline. + +```python +import torch +import timm + +# Get a PyTorch model from TIMM +torch_model: torch.nn.Module = timm.create_model("resnet18.a1_in1k", in_chans=3, pretrained=True) + +# Insert XAI head +model_xai: torch.nn.Module = xai.insert_xai(torch_model, xai.Task.CLASSIFICATION) + +# Torch XAI model inference +model_xai.eval() +with torch.no_grad(): + outputs = model_xai(torch.from_numpy(image_norm)) + logits = outputs["prediction"] # BxC + saliency_maps = outputs["saliency_map"] # BxCxHxW: per-class saliency map +``` + ### More advanced use-cases Users could tweak the basic use-case according to their purpose, which include but not limited to: @@ -203,17 +236,19 @@ Users could tweak the basic use-case according to their purpose, which include b * Customize output image visualization options * Explain multiple class targets, passing them as label indices or as actual label names * Call explainer multiple times to explain multiple images or to use different targets -* Using `insert_xai` API, insert XAI head to your OpenVINO IR model and get additional saliency map output in the same inference pipeline +* Insert 
XAI head to your PyTorch models and export to ONNX format to generate saliency maps on ONNX Runtime + (Refer to the [full example script](./examples/run_torch_onnx.py)) Please find more options and scenarios in the following links: * [OpenVINO XAI User Guide](docs/source/user-guide.md) * [OpenVINO Notebook - XAI Basic](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/explainable-ai-1-basic/explainable-ai-1-basic.ipynb) * [OpenVINO Notebook - XAI Deep Dive](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/explainable-ai-2-deep-dive/explainable-ai-2-deep-dive.ipynb) +* [OpenVINO Notebook - Saliency Map Interpretation](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/explainable-ai-3-map-interpretation/explainable-ai-3-map-interpretation.ipynb) ### Playing with the examples -Please look around the runnable [example scripts](./examples) and play with them to get used to the `Explainer` APIs. +Please look around the runnable [example scripts](./examples) and play with them to get used to the `Explainer` and `insert_xai` APIs. 
```bash # Prepare models by running tests (need "pip install openvino_xai[dev]" extra option) @@ -224,6 +259,9 @@ pytest tests/test_classification.py # All outputs will be stored in the corresponding output directory python examples/run_classification.py .data/otx_models/mlc_mobilenetv3_large_voc.xml \ tests/assets/cheetah_person.jpg --output output + +# Run PyTorch and ONNX support example +python examples/run_torch_onnx.py ``` --- diff --git a/docs/source/_static/map_samples/AISEDetection.jpg b/docs/source/_static/map_samples/AISEDetection.jpg new file mode 100644 index 00000000..5496ffc4 Binary files /dev/null and b/docs/source/_static/map_samples/AISEDetection.jpg differ diff --git a/docs/source/_static/map_samples/AISE_resnet18.a1_in1k_293.jpg b/docs/source/_static/map_samples/AISE_resnet18.a1_in1k_293.jpg new file mode 100644 index 00000000..5b2a6d7b Binary files /dev/null and b/docs/source/_static/map_samples/AISE_resnet18.a1_in1k_293.jpg differ diff --git a/docs/source/_static/map_samples/ActivationMap_resnet18.a1_in1k_activation_map.jpg b/docs/source/_static/map_samples/ActivationMap_resnet18.a1_in1k_activation_map.jpg new file mode 100644 index 00000000..50c55694 Binary files /dev/null and b/docs/source/_static/map_samples/ActivationMap_resnet18.a1_in1k_activation_map.jpg differ diff --git a/docs/source/_static/map_samples/DetClassProbabilityMap.jpg b/docs/source/_static/map_samples/DetClassProbabilityMap.jpg new file mode 100644 index 00000000..4cc2b9ac Binary files /dev/null and b/docs/source/_static/map_samples/DetClassProbabilityMap.jpg differ diff --git a/docs/source/_static/map_samples/RISE_resnet18.a1_in1k_293.jpg b/docs/source/_static/map_samples/RISE_resnet18.a1_in1k_293.jpg new file mode 100644 index 00000000..8fb5840f Binary files /dev/null and b/docs/source/_static/map_samples/RISE_resnet18.a1_in1k_293.jpg differ diff --git a/docs/source/_static/map_samples/ReciproCAM_resnet18.a1_in1k_293.jpg 
b/docs/source/_static/map_samples/ReciproCAM_resnet18.a1_in1k_293.jpg new file mode 100644 index 00000000..2c67bc4d Binary files /dev/null and b/docs/source/_static/map_samples/ReciproCAM_resnet18.a1_in1k_293.jpg differ diff --git a/docs/source/_static/ovxai-architecture.svg b/docs/source/_static/ovxai-architecture.svg index c0d2e7e3..73317cab 100644 --- a/docs/source/_static/ovxai-architecture.svg +++ b/docs/source/_static/ovxai-architecture.svg @@ -1 +1 @@ -OpenVINO RuntimeBlack-Box XAI MethodsIR InsertionWhite-Box XAI MethodsWhite-Box XAI Method BaseBlack-Box XAI Method BaseXAI Method InterfaceExplainer (Stateful Class)API (Easy-to-use Stateless Function)CLI (TBD)Black-Box XAI MethodsWhite-Box XAI MethodsBlack-Box XAI MethodsWhite-Box XAI MethodsBlack-Box XAI MethodsWhite-Box XAI MethodsBlack-Box XAI MethodsWhite-Box XAI MethodsMetrics \ No newline at end of file +OpenVINO RuntimeBlack-Box XAI MethodsXAI Branch InsertionWhite-Box XAI MethodsWhite-Box XAI Method BaseBlack-Box XAI Method BaseXAI Method Interface & FactoryExplainer (Stateful Class)API (Easy-to-use Stateless Function)CLI (TBD)Black-Box XAI MethodsWhite-Box XAI MethodsBlack-Box XAI MethodsWhite-Box XAI MethodsBlack-Box XAI MethodsWhite-Box XAI MethodsBlack-Box XAI MethodsWhite-Box XAI MethodsMetricsPytorchONNX \ No newline at end of file diff --git a/docs/source/conf.py b/docs/source/conf.py index 7b1dc5d9..0f982c73 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -18,7 +18,7 @@ project = "OpenVINO™ XAI" copyright = "2024, Intel(R) Corporation" author = "Intel(R) Corporation" -release = "1.0.0" +release = "1.2.0" # -- General configuration --------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration diff --git a/docs/source/user-guide.md b/docs/source/user-guide.md index 284cf7aa..58fb2e00 100644 --- a/docs/source/user-guide.md +++ b/docs/source/user-guide.md @@ -22,8 +22,14 @@ Content: - [White-Box 
mode](#white-box-mode) - [Black-Box mode](#black-box-mode) - [XAI insertion (white-box usage)](#xai-insertion-white-box-usage) + - [XAI methods](#xai-methods) + - [Overview](#overview) + - [Methods performance-accuracy comparison](#methods-performance-accuracy-comparison) + - [White-box methods](#white-box-methods) + - [Black-box methods](#black-box-methods) - [Plot saliency maps](#plot-saliency-maps) - [Saving saliency maps](#saving-saliency-maps) + - [Measure quality metrics of saliency maps](#measure-quality-metrics-of-saliency-maps) - [Example scripts](#example-scripts) @@ -132,7 +138,7 @@ preprocessed_image = np.expand_dims(preprocessed_image, 0) # Run explanation explanation = explainer( preprocessed_image, - target_explain_labels=[11, 14], # indices or string labels to explain + targets=[11, 14], # indices or string labels to explain overlay=True, # False by default original_input_image=image, # to apply overlay on the original image instead of the preprocessed one that was used for the explainer ) @@ -181,7 +187,7 @@ image = cv2.imread("path/to/image.jpg") # Run explanation explanation = explainer( image, - target_explain_labels=[11, 14], # indices or string labels to explain + targets=[11, 14], # indices or string labels to explain ) # Save saliency maps @@ -220,6 +226,7 @@ explainer = xai.Explainer( model, task=xai.Task.CLASSIFICATION, preprocess_fn=preprocess_fn, + explain_mode=ExplainMode.WHITEBOX, ) # Generate and process saliency maps (as many as required, sequentially) @@ -233,12 +240,11 @@ voc_labels = [ # Run explanation explanation = explainer( image, - explain_mode=ExplainMode.WHITEBOX, # target_layer="last_conv_node_name", # target_layer - node after which the XAI branch will be inserted, usually the last convolutional layer in the backbone embed_scaling=True, # True by default. 
If set to True, the saliency map scale (0 ~ 255) operation is embedded in the model explain_method=xai.Method.RECIPROCAM, # ReciproCAM is the default XAI method for CNNs label_names=voc_labels, - target_explain_labels=[11, 14], # target classes to explain, also ['dog', 'person'] is a valid input, since label_names are provided + targets=[11, 14], # target classes to explain, also ['dog', 'person'] is a valid input, since label_names are provided overlay=True, # False by default ) @@ -284,6 +290,7 @@ explainer = xai.Explainer( model, task=xai.Task.CLASSIFICATION, preprocess_fn=preprocess_fn, + explain_mode=ExplainMode.BLACKBOX, ) # Generate and process saliency maps (as many as required, sequentially) @@ -292,9 +299,8 @@ image = cv2.imread("path/to/image.jpg") # Run explanation explanation = explainer( image, - explain_mode=ExplainMode.BLACKBOX, - target_explain_labels=[11, 14], # target classes to explain - # target_explain_labels=-1, # explain all classes + targets=[11, 14], # target classes to explain + # targets=-1, # explain all classes overlay=True, # False by default ) @@ -313,25 +319,221 @@ As mentioned above, saliency map generation requires model inference. In the abo **Note**: The original model outputs are not affected, and the model should be inferable by the original inference pipeline. 
```python +import cv2 import openvino.runtime as ov import openvino_xai as xai +from openvino_xai.explainer.visualizer import colormap, overlay # Create an ov.Model -model = ov.Core().read_model("path/to/model.xml") # type: ov.Model +model: ov.Model = ov.Core().read_model("path/to/model.xml") + +# Get and preprocess image +image = cv2.imread("path/to/image.jpg") +image_norm = preprocess_fn(image) -# Insert XAI branch into the model graph -model_xai = xai.insert_xai( +# Insert XAI branch into the OpenVINO model graph (IR) +model_xai: ov.Model = xai.insert_xai( model=model, task=xai.Task.CLASSIFICATION, - # target_layer="last_conv_node_name", # target_layer - the node after which the XAI branch will be inserted, usually the last convolutional layer in the backbone + # target_layer="last_conv_node_name", # target_layer - the node after which the XAI branch will be inserted, usually the last convolutional layer in the backbone. Defaults to None, by which the target layer is automatically detected embed_scaling=True, # True by default. If set to True, the saliency map scale (0 ~ 255) operation is embedded in the model explain_method=xai.Method.RECIPROCAM, # ReciproCAM is the default XAI method for CNNs -) # type: ov.Model +) -# ***** Downstream task: user's code that infers model_xai and picks 'saliency_map' output ***** +# Insert XAI branch into the PyTorch model +# The XAI head is inserted using the module hook mechanism internally +# so that users can get an additional saliency map without major changes to the original inference pipeline. 
+import torch +model: torch.nn.Module + +# Insert XAI head +model_xai: torch.nn.Module = xai.insert_xai(model=model, task=xai.Task.CLASSIFICATION) + +# Torch XAI model inference +model_xai.eval() +with torch.no_grad(): + outputs = model_xai(torch.from_numpy(image_norm)) + logits = outputs["prediction"] # BxC: original model prediction + saliency_maps = outputs["saliency_map"] # BxCxhxw: additional per-class saliency map + probs = torch.softmax(logits, dim=-1) + label = probs.argmax(dim=-1)[0] + +# Torch XAI model saliency map +saliency_maps = saliency_maps.numpy(force=True).squeeze(0) # Cxhxw +saliency_map = saliency_maps[label] # hxw saliency_map for the label +saliency_map = cv2.resize(saliency_map, dsize=image.shape[1::-1]) # HxW (dsize expects (width, height)) +saliency_map = colormap(saliency_map[None, :]) # 1xHxWx3 +result_image = overlay(saliency_map, image)[0] # HxWx3 ``` +## XAI methods + +### Overview + +At the moment, the following XAI methods are supported: + +| Method | Using model internals | Per-target support | Single-shot | #Model inferences | +|:-----------------------|:---------------------:|:------------------:|:-----------:|:-----------------:| +| White-Box | | | | | +| Activation Map | Yes | No | Yes | 1 | +| Recipro-CAM | Yes | Yes (class) | Yes* | 1* | +| ViT Recipro-CAM | Yes | Yes (class) | Yes* | 1* | +| DetClassProbabilityMap | Yes** | Yes (class) | Yes | 1 | +| Black-Box | | | | | +| RISE | No | Yes (class) | No | 1000-10000 | +| AISEClassification | No | Yes (class) | No | 120-500 | +| AISEDetection | No | Yes (bbox) | No | 60-250 | + +\* Recipro-CAM re-infers part of the graph (usually neck + head or last transformer block) H*W times, where HxW is the feature map size of the target layer. + +** DetClassProbabilityMap requires explicit definition of the target layers. +The rest of the white-box methods support automatic detection of the target layer. 
+ +Target layer is the part of the model graph where the XAI branch will be inserted (applicable for white-box methods). 
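The Recipro-CAM footnote above (the head is re-inferred H*W times) can be illustrated with a toy, pure-Python sketch. This is not the OpenVINO XAI implementation: `toy_head` and `recipro_cam_sketch` are hypothetical names, the head is a made-up global-average-pool plus softmax that treats each channel as one class, and the feature map values are invented for illustration.

```python
import math
from typing import Callable, List, Tuple

FeatureMap = List[List[List[float]]]  # indexed as [channel][y][x]


def toy_head(feature_map: FeatureMap) -> List[float]:
    """Hypothetical stand-in for neck + head: global average pooling, then softmax.

    Each channel is treated as one class purely to keep the toy small.
    """
    pooled = [sum(map(sum, channel)) / (len(channel) * len(channel[0])) for channel in feature_map]
    peak = max(pooled)
    exps = [math.exp(value - peak) for value in pooled]
    total = sum(exps)
    return [e / total for e in exps]


def recipro_cam_sketch(
    feature_map: FeatureMap, head: Callable[[FeatureMap], List[float]]
) -> Tuple[FeatureMap, int]:
    """Return (saliency maps indexed [class][y][x], number of head calls)."""
    channels, height, width = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    saliency: FeatureMap = []
    head_calls = 0
    for y in range(height):
        for x in range(width):
            # Keep one spatial location and zero out the rest (same mask for every channel).
            masked = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
            for c in range(channels):
                masked[c][y][x] = feature_map[c][y][x]
            scores = head(masked)
            head_calls += 1
            if not saliency:
                saliency = [[[0.0] * width for _ in range(height)] for _ in scores]
            # The class scores of the masked input become the saliency values at (y, x).
            for cls, score in enumerate(scores):
                saliency[cls][y][x] = score
    return saliency, head_calls


# Toy 2 x 3 x 3 feature map: channel 0 peaks at the center, channel 1 at the top-left corner.
feature_map = [
    [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]],
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.1], [0.1, 0.1, 0.1]],
]
saliency_maps, head_calls = recipro_cam_sketch(feature_map, toy_head)
print(head_calls)  # 9: one head call per spatial location (H * W)
```

The backbone runs once to produce the feature map, and only the cheap head is repeated per location, which is why the table lists these methods as roughly single-shot. Resizing and 0 ~ 255 scaling of the resulting maps are omitted here.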
+ +All supported methods are gradient-free, which suits deployment framework settings (e.g. OpenVINO™), where the model is in an optimized or compiled representation. + +### Methods performance-accuracy comparison + +The table below compares accuracy and performance of different models and explain methods (learn more about [Quality Metrics](#measure-quality-metrics-of-saliency-maps)). + +Metrics were measured on a 10% random subset of the [ILSVRC 2012](https://www.image-net.org/challenges/LSVRC/index.php) validation dataset (5000 images, seed 42). + +| Model | Explain mode | Explain method | Explain time
#Model inferences | | Pointing game | | Insertion | Deletion | | ADCC | Coherency | Complexity | Average Drop | +|:---------------------------:|:------------:|:--------------:|:---------------------------------:|---|:-------------:|---|:---------:|:--------:|---|:--------:|:---------:|:----------:|:------------:| +| deit - tiny (transformer) | White box | VIT ReciproCAM | 1* | | **89.9** | | 22.4 | **4.5** | | 70.4 | 88.9 | **38.1** | 34.3 | +| | | Activation map | 1 | | 56.6 | | 7.8 | 7.0 | | 46.9 | 74.0 | 53.7 | 65.4 | +| | Black Box** | AISE | 60 | | 73.9 | | 15.9 | 8.9 | | 66.6 | 73.9 | 44.3 | 26.0 | +| | | RISE | 2000 | | 85.5 | | **23.2** | 5.8 | | **74.8** | **92.5** | 42.3 | **16.6** | +| | | | | | | | | | | | | | | +| resnet18 | White box | ReciproCAM | 1* | | **89.5** | | 33.9 | **5.9** | | **77.3** | 91.1 | 30.2 | 25.9 | +| | | Activation map | 1 | | 87.0 | | **36.3** | 10.5 | | 74.4 | **97.9** | **25.2** | 40.2 | +| | Black Box** | AISE | 60 | | 72.0 | | 22.5 | 12.4 | | 67.4 | 69.3 | 44.5 | 16.9 | +| | | RISE | 2000 | | 87.0 | | 34.6 | 7.1 | | 77.1 | 93.0 | 42.0 | **8.3** | + +\* Recipro-CAM re-infers part of the graph (usually neck + head or last transformer block) H*W times, where HxW is the feature map size of the target layer. + +\*\* For Black Box Methods preset = `SPEED` + + +### White-Box methods + +When to use? +- When model architecture follows standard CNN-based or ViT-based design (OV-XAI [supports](../../README.md#supported-explainable-models) 1000+ CNN and ViT models). +- When speed matters. White-box methods are fast: generating a saliency map takes roughly one model inference. +- When it is required to obtain a saliency map together with the model prediction in the inference server environment. White-box methods update the model graph so that the XAI branch and the saliency map output are added to the model. Therefore, with a minor compute overhead, it is possible to generate both model predictions and saliency maps. 
+ +All white-box methods require access to the model's internal state. To generate a saliency map, supported white-box methods potentially change and process internal model activations in a way that fosters compute efficiency. + +#### Activation Map + +Suitable for: +- Binary classification problems (e.g. inspecting model reasoning when predicting a positive class). +- Visualization of the global (class-agnostic) model attention (e.g. inspecting which input pixels are the most salient for the model). + +Activation Map is the most basic and naive approach. It takes the outputs of the model’s feature extractor (backbone) and averages them over the channel dimension. The results highly rely on the backbone and ignore neck and head computations. It gives a relatively good and fast result that highlights the most activated features from the backbone perspective. + +The saliency map below was obtained for [ResNet-18](https://huggingface.co/timm/resnet18.a1_in1k) from [timm](https://huggingface.co/timm): + +![OpenVINO XAI Architecture](_static/map_samples/ActivationMap_resnet18.a1_in1k_activation_map.jpg) + +#### Recipro-CAM (ViT Recipro-CAM for ViT models) + +Suitable for: +- Almost all CNN-based architectures. +- Many ViT-based architectures. + +[Recipro-CAM](../../openvino_xai/methods/white_box/recipro_cam.py) involves spatial masking of the extracted feature maps to exploit the correlation between activation maps and model predictions for target classes. It is a perturbation-based method which perturbs internal model activations. + +Assume a 7x7 feature map extracted by the CNN backbone. One location of the feature map is preserved (e.g. at index [0, 0]), while the rest of the feature map values are masked out, e.g. with zeros (the perturbation is the same across the channel dimension). The perturbed feature map is inferred through the model head. The model prediction scores are used as saliency scores for index [0, 0]. This is repeated for all 49 spatial locations. 
The final saliency map is obtained after resizing and scaling. See [paper](https://openaccess.thecvf.com/content/CVPR2024W/XAI4CV/papers/Byun_ReciproCAM_Lightweight_Gradient-free_Class_Activation_Map_for_Post-hoc_Explanations_CVPRW_2024_paper.pdf) for more details. + +`Recipro-CAM` is an efficient XAI method. +The main weak point is that saliency for each pixel in the feature map space is estimated in isolation, without taking into account the joint contribution of different pixels/features. + +`Recipro-CAM` is the default method for the classification task. [ViT Recipro-CAM](../../openvino_xai/methods/white_box/recipro_cam.py) is a modification of Recipro-CAM for ViT-based models. + +The saliency map below was obtained for [ResNet-18](https://huggingface.co/timm/resnet18.a1_in1k) from [timm](https://huggingface.co/timm) and "cheetah" class: + +![OpenVINO XAI Architecture](_static/map_samples/ReciproCAM_resnet18.a1_in1k_293.jpg) + +#### DetClassProbabilityMap + +Suitable for: +- Single-stage object detection models. +- When it is enough to estimate saliency maps per-class. + +[DetClassProbabilityMap](../../openvino_xai/methods/white_box/det_class_probability_map.py) takes the raw classification head output and uses class probability maps to calculate regions of interest for each class. As a result, it creates a different saliency map for each class. This algorithm is implemented for single-stage detectors only and requires an explicit list of target layers. + +The main limitation of this method is that, due to the training loss design of most single-stage detectors, activation values drift towards the center of the object while propagating through the network. Many object detectors, while being designed to precisely estimate the location of the objects, might distort the spatial location of object features in the latent space. 
+ +The saliency map below was obtained for `YOLOX` trained in-house on the PASCAL VOC dataset: + +![OpenVINO XAI Architecture](_static/map_samples/DetClassProbabilityMap.jpg) + +### Black-Box methods + +When to use? +- When custom models are used and/or white-box methods fail (e.g. Swin-based transformers). +- When more advanced model explanation is required. See more details below (e.g. in the RISE overview). +- When the spatial location of the features is distorted in the latent space (e.g. some single-stage object detection models). + +All black-box methods are perturbation-based - they perturb the input and register the change in the output. +Usually, a high-quality saliency map requires hundreds or thousands of model inferences, which makes these methods compute-inefficient. On the other hand, black box methods are model-agnostic. + +Given that the quality of the saliency maps usually correlates with the number of available inferences, we propose the following presets for the black-box methods: `Preset.SPEED`, `Preset.BALANCE`, `Preset.QUALITY` (`Preset.BALANCE` is used by default). +Apart from that, method parameters can be defined directly via Explainer or Method API. + +#### RISE + +Suitable for: +- All classification models which can generate per-class prediction scores. +- More flexibility and more advanced use cases (e.g. control of granularity of the saliency map). + +[RISE](../../openvino_xai/methods/black_box/rise.py) probes a model by sub-sampling the input image via random masks and records its response to each of them. +RISE creates random masks from down-scaled space (e.g. 7×7 grid) and adds random translation shifts for the pixel-level explanation with further up-sampling. The weighted sum of all sampled masks is used to generate the fine-grained saliency map. +Since it approximates the true saliency map with Monte Carlo sampling, it requires thousands of forward passes to generate a fine-grained saliency map. 
RISE is a non-deterministic method. +See [paper](https://arxiv.org/abs/1806.07421v3) for more details. + +`RISE` generates saliency maps for all classes at once, although indices of target classes can be provided (which might bring some performance boost). + +`RISE` has two main hyperparameters: `num_cells` (defines the resolution of the grid for the masks) and `num_masks` (defines the number of inferences). +The number of cells defines the granularity of the saliency map (usually, the higher the granularity, the better). A higher number of cells requires more masks to converge. Going from `Preset.SPEED` to `Preset.QUALITY`, the number of masks (compute budget) increases. + +The saliency map below was obtained for [ResNet-18](https://huggingface.co/timm/resnet18.a1_in1k) from [timm](https://huggingface.co/timm) and "cheetah" class (default parameters). + +![OpenVINO XAI Architecture](_static/map_samples/RISE_resnet18.a1_in1k_293.jpg) + +The map shows that some grass-related pixels from the left cheetah also contribute to the cheetah prediction, which might indicate that the model learned cheetah features in combination with grass (which makes sense). + +#### AISEClassification + +Suitable for: +- All classification models which can generate per-class prediction scores. +- Cases when speed matters. + +`AISE` formulates saliency map generation as a kernel density estimation (KDE) problem, and adaptively samples input masks using a derivative-free optimizer to maximize mask saliency score. KDE requires a proper kernel width, which is not known. A set of pre-defined kernel widths is used simultaneously, and the results are then aggregated. This adaptive sampling mechanism improves the efficiency of input mask generation and thus increases convergence speed. AISE is designed to be task-agnostic and can be applied to a wide range of classification and object detection architectures. +`AISE` is optimized for generating a saliency map for a specific class (or a few classes). 
To specify target classes, use the `targets` argument.
+
+[AISEClassification](../../openvino_xai/methods/black_box/aise/classification.py) is designed for classification models.
+
+The saliency map below was obtained for [ResNet-18](https://huggingface.co/timm/resnet18.a1_in1k) from [timm](https://huggingface.co/timm) and the "cheetah" class:
+
+![OpenVINO XAI Architecture](_static/map_samples/AISE_resnet18.a1_in1k_293.jpg)
+
+#### AISEDetection
+
+Suitable for:
+- All detection models which can generate bounding boxes, labels and scores.
+- When speed matters.
+- When a per-box saliency map is required.
+
+[AISEDetection](../../openvino_xai/methods/black_box/aise/detection.py) is designed for detection models and supports per-bounding-box saliency maps.
+
+The saliency map below was obtained for `YOLOX` trained in-house on the PASCAL VOC dataset (with default parameters, `Preset.BALANCE`):
+
+![OpenVINO XAI Architecture](_static/map_samples/AISEDetection.jpg)
+
 ## Plot saliency maps
 
 To visualize saliency maps, use the `explanation.plot` function.
@@ -363,6 +565,7 @@ explainer = xai.Explainer( model, task=xai.Task.CLASSIFICATION, preprocess_fn=preprocess_fn, + explain_mode=ExplainMode.WHITEBOX, ) voc_labels = [ @@ -376,9 +579,8 @@ image = cv2.imread("path/to/image.jpg") # Run explanation explanation = explainer( image, - explain_mode=ExplainMode.WHITEBOX, label_names=voc_labels, - target_explain_labels=[7, 11], # ['cat', 'dog'] also possible as target classes to explain + targets=[7, 11], # ['cat', 'dog'] also possible as target classes to explain ) # Use matplotlib (recommended for Jupyter) - default backend @@ -444,6 +646,7 @@ explainer = xai.Explainer( model, task=xai.Task.CLASSIFICATION, preprocess_fn=preprocess_fn, + explain_mode=ExplainMode.WHITEBOX, ) voc_labels = [ @@ -466,9 +669,8 @@ scores_dict = {i: score for i, score in zip(result_idxs, result_scores)} # Run explanation explanation = explainer( image, - explain_mode=ExplainMode.WHITEBOX, label_names=voc_labels, - target_explain_labels=result_idxs, # target classes to explain + targets=result_idxs, # target classes to explain ) # Save saliency maps flexibly @@ -485,6 +687,90 @@ explanation.save( ) # image_name_aeroplane_conf_0.85.jpg ``` +## Measure quality metrics of saliency maps + +To compare different saliency maps, you can use the implemented quality metrics: Pointing Game, Insertion-Deletion AUC, and ADCC. + +- **ADCC (Average Drop-Coherence-Complexity)** ([paper](https://arxiv.org/abs/2104.10252)/[impl](https://github.com/aimagelab/ADCC/)) - averages three submetrics: + - **Average Drop** - The percentage drop in confidence when the model sees only the explanation map (image masked with the saliency map) instead of the full image. + - **Coherence** - The coherency between the saliency map on the input image and saliency map on the explanation map (image masked with the saliency map). Requires generating an extra explanation (can be time-consuming for black box methods). 
+  - **Complexity** - Measures the L1 norm of the saliency map (average value per pixel). Fewer important pixels -> less complexity -> better saliency map.
+
+- **Insertion-Deletion AUC** ([paper](https://arxiv.org/abs/1806.07421)) - Measures the AUC of the curve of model confidence when important pixels are sequentially inserted or deleted. Time-consuming: requires 60 model inferences, 30 steps for insertion and 30 steps for deletion (the number of steps is configurable).
+
+- **Pointing Game** ([paper](https://arxiv.org/abs/1608.00507)/[impl](https://github.com/understandable-machine-intelligence-lab/Quantus/blob/main/quantus/metrics/localisation/pointing_game.py)) - Returns True if the most important saliency map pixel falls into the object ground truth bounding box. Requires ground truth annotation, so it is convenient to use on public datasets (COCO, VOC, ILSVRC) rather than individual images (check [accuracy_tests](../../tests/perf/test_accuracy.py) for examples).
+
+Here is a comparison of the computational cost (measured in model inferences) of the different quality metrics. The explanation cost (also in model inferences) is included for a fuller picture.
+
+| Explain mode | Explain method      | Explain time** | Pointing Game | Insertion/Deletion AUC | ADCC |
+|:------------:|:-------------------:|:--------------:|:-------------:|:----------------------:|:----:|
+| White Box    | Activation map      | 1              | 0             | 30 insertion + 30 deletion + 1 to define predicted class and check difference in its score | 2 + 1 explain (1*) |
+|              | ReciproCAM          | 1*             | 0             | 30 insertion + 30 deletion | 2 + 1 explain (1*) |
+|              | ViT ReciproCAM      | 1*             | 0             | 30 insertion + 30 deletion | 2 + 1 explain (1*) |
+| Black Box    | AISE-classification | 120-500        | 0             | 30 insertion + 30 deletion | 2 + 1 explain (120-150) |
+|              | RISE                | 1000-10000     | 0             | 30 insertion + 30 deletion | 2 + 1 explain (1000-10000) |
+
+\* Recipro-CAM re-infers part of the graph (usually neck + head or last transformer block) H*W times, where HxW is the feature map size of the target layer.
+
+\*\* All time measurements are in number of model inferences.
+
+```python
+import cv2
+import numpy as np
+import openvino.runtime as ov
+from typing import Mapping
+
+import openvino_xai as xai
+from openvino_xai.explainer import ExplainMode
+from openvino_xai.metrics import ADCC, InsertionDeletionAUC
+
+
+def preprocess_fn(image: np.ndarray) -> np.ndarray:
+    """Preprocess the input image."""
+    x = cv2.resize(src=image, dsize=(224, 224))
+    x = x.transpose((2, 0, 1))
+    processed_image = np.expand_dims(x, 0)
+    return processed_image
+
+def postprocess_fn(output: Mapping):
+    """Postprocess the model output."""
+    return softmax(output["logits"])
+
+def softmax(x: np.ndarray) -> np.ndarray:
+    """Compute softmax values of x."""
+    e_x = np.exp(x - np.max(x))
+    return e_x / e_x.sum()
+
+IMAGE_PATH = "path/to/image.jpg"
+MODEL_PATH = "path/to/model.xml"
+
+image = cv2.imread(IMAGE_PATH)
+model = ov.Core().read_model(MODEL_PATH)
+
+explainer = xai.Explainer(
+    model,
+    task=xai.Task.CLASSIFICATION,
+    preprocess_fn=preprocess_fn,
+    explain_mode=ExplainMode.WHITEBOX,
+    explain_method=xai.Method.RECIPROCAM  # Also VITRECIPROCAM, AISE, RISE, ACTIVATIONMAP are supported
+)
+
+# Generate explanation (if several targets are passed, metrics for all saliency maps will be aggregated)
+explanation = explainer(image, targets=14, colormap=False, overlay=False, resize=True)
+
+# Calculate InsertionDeletionAUC metric over the list of explanations and input images
+auc = InsertionDeletionAUC(model, preprocess_fn, postprocess_fn)
+auc_score = auc.evaluate([explanation], [image], steps=30)  # {'insertion': 0.43, 'deletion': 0.09, 'delta': 0.34}
+insertion, deletion, delta = auc_score.values()
+print(f"Insertion {insertion:.2f}, Deletion {deletion:.2f}, Delta {delta:.2f}")
+
+# Calculate ADCC metric over the list of explanations and input images
+adcc = ADCC(model, preprocess_fn, postprocess_fn, explainer)
+adcc_score = adcc.evaluate([explanation], [image])  # {'adcc': 0.95, 'coherency': 0.99, 'complexity': 0.13, 'average_drop': 0.0}
+adcc, coherency,
complexity, average_drop = adcc_score.values() +print(f"ADCC {adcc:.2f}, Coherency {coherency:.2f}, Complexity {complexity:.2f}, Average drop {average_drop:.2f}") +``` + ## Example scripts More usage scenarios that can be used with your own models and images as arguments are available in [examples](../../examples). diff --git a/examples/run_detection.py b/examples/run_detection.py index b5e90510..d773ec2a 100644 --- a/examples/run_detection.py +++ b/examples/run_detection.py @@ -89,8 +89,9 @@ def explain_white_box(args): # Save saliency maps for visual inspection if args.output is not None: - output = Path(args.output) / "detection" - explanation.save(output, Path(args.image_path).stem) + output = Path(args.output) / "detection_white_box" + ori_image_name = Path(args.image_path).stem + explanation.save(output, f"{ori_image_name}_") def explain_black_box(args): @@ -131,7 +132,8 @@ def explain_black_box(args): # Save saliency maps for visual inspection if args.output is not None: output = Path(args.output) / "detection_black_box" - explanation.save(output, f"{Path(args.image_path).stem}_") + ori_image_name = Path(args.image_path).stem + explanation.save(output, f"{ori_image_name}_") def main(argv): diff --git a/examples/run_torch_onnx.py b/examples/run_torch_onnx.py new file mode 100644 index 00000000..29c226ea --- /dev/null +++ b/examples/run_torch_onnx.py @@ -0,0 +1,232 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import argparse +import importlib +import sys +from pathlib import Path + +import cv2 +import numpy as np +import openvino as ov + +from openvino_xai import Task, insert_xai +from openvino_xai.common.utils import logger, softmax +from openvino_xai.explainer.visualizer import colormap, overlay + +try: + torch = importlib.import_module("torch") + timm = importlib.import_module("timm") +except ImportError: + logger.error("Please install torch and timm package to run this example.") + exit(-1) + + +def 
get_argument_parser(): + parser = argparse.ArgumentParser() + parser.add_argument("--model_name", default="resnet18.a1_in1k", type=str) + parser.add_argument("--image_path", default="tests/assets/cheetah_person.jpg", type=str) + parser.add_argument("--output_dir", default=".data/example", type=str) + return parser + + +def run_insert_xai_torch(args: list[str]): + """Insert XAI head into PyTorch model and run inference on PyTorch Runtime to get saliency map.""" + + # Load Torch model from timm + try: + model = timm.create_model(args.model_name, in_chans=3, pretrained=True) + logger.info(f"Model config: {model.default_cfg}") + logger.info(f"Model layers: {model}") + except Exception as e: + logger.error(e) + logger.info(f"Please choose from {timm.list_models()}") + return + input_size = model.default_cfg["input_size"][1:] # (H, W) + input_mean = np.array(model.default_cfg["mean"]) + input_std = np.array(model.default_cfg["std"]) + + # Load image + image = cv2.imread("tests/assets/cheetah_person.jpg") + image = cv2.resize(image, dsize=input_size) + image = cv2.cvtColor(image, code=cv2.COLOR_BGR2RGB) + image_norm = ((image/255.0 - input_mean)/input_std).astype(np.float32) + image_norm = image_norm.transpose((2, 0, 1)) # HxWxC -> CxHxW + image_norm = image_norm[None, :] # CxHxW -> 1xCxHxW + + # Torch model inference + model.eval() + with torch.no_grad(): + logits = model(torch.from_numpy(image_norm)) + probs = torch.softmax(logits, dim=-1) # BxC + label = probs.argmax(dim=-1)[0] + logger.info(f"Torch model prediction: classes ({probs.shape[-1]}) -> label ({label}) -> prob ({probs[0, label]})") + + # Insert XAI head + model_xai: torch.nn.Module = insert_xai(model, Task.CLASSIFICATION, input_size=input_size) # Optional input size arg to help insertion + + # Torch XAI model inference + model_xai.eval() + with torch.no_grad(): + outputs = model_xai(torch.from_numpy(image_norm)) + logits = outputs["prediction"] # BxC + saliency_maps = outputs["saliency_map"] # BxCxhxw + 
probs = torch.softmax(logits, dim=-1) + label = probs.argmax(dim=-1)[0] + logger.info(f"Torch XAI model prediction: classes ({probs.shape[-1]}) -> label ({label}) -> prob ({probs[0, label]})") + + # Torch XAI model saliency map + saliency_maps = saliency_maps.numpy(force=True).squeeze(0) # Cxhxw + saliency_map = saliency_maps[label] # hxw saliency_map for the label + saliency_map = colormap(saliency_map[None, :]) # 1xhxw + saliency_map = cv2.resize(saliency_map.squeeze(0), dsize=input_size) # HxW + result_image = overlay(saliency_map, image) + result_image = cv2.cvtColor(result_image, code=cv2.COLOR_RGB2BGR) + result_image_path = Path(args.output_dir) / "xai-torch.png" + result_image_path.parent.mkdir(parents=True, exist_ok=True) + cv2.imwrite(result_image_path, result_image) + logger.info(f"Torch XAI model saliency map: {result_image_path}") + + +def run_insert_xai_torch_to_onnx(args: list[str]): + """Insert XAI head into PyTorch model, then convert to ONNX format and run inference on ONNX Runtime to get saliency map.""" + + # ONNX import + try: + importlib.import_module("onnx") + onnxruntime = importlib.import_module("onnxruntime") + except ImportError: + logger.info("Please install onnx and onnxruntime package to run ONNX XAI example.") + return + + # Load Torch model from timm + try: + model = timm.create_model(args.model_name, in_chans=3, pretrained=True) + logger.info(f"Model config: {model.default_cfg}") + logger.info(f"Model layers: {model}") + except Exception as e: + logger.error(e) + logger.info(f"Please choose from {timm.list_models()}") + return + input_size = model.default_cfg["input_size"][1:] # (H, W) + input_mean = np.array(model.default_cfg["mean"]) + input_std = np.array(model.default_cfg["std"]) + + # Load image + image = cv2.imread("tests/assets/cheetah_person.jpg") + image = cv2.resize(image, dsize=input_size) + image = cv2.cvtColor(image, code=cv2.COLOR_BGR2RGB) + image_norm = ((image/255.0 - input_mean)/input_std).astype(np.float32) +
image_norm = image_norm.transpose((2, 0, 1)) # HxWxC -> CxHxW + image_norm = image_norm[None, :] # CxHxW -> 1xCxHxW + + # Insert XAI head + model_xai: torch.nn.Module = insert_xai(model, Task.CLASSIFICATION, input_size=input_size) + + # ONNX model conversion + model_path = Path(args.output_dir) / "model.onnx" + model_path.parent.mkdir(parents=True, exist_ok=True) + torch.onnx.export( + model_xai, + torch.from_numpy(image_norm), + model_path, + input_names=["input"], + output_names=["prediction", "saliency_map"], + ) + logger.info(f"ONNX XAI model: {model_path}") + + # ONNX model inference + session = onnxruntime.InferenceSession(model_path) + outputs = session.run( + output_names=["prediction", "saliency_map"], + input_feed={"input": image_norm.astype(np.float32)}, + ) + logits, saliency_maps = outputs # NOTE: dict keys are removed in Torch->ONNX conversion + probs = softmax(logits) + label = probs.argmax(axis=-1)[0] + logger.info(f"ONNX XAI model prediction: classes ({probs.shape[-1]}) -> label ({label}) -> prob ({probs[0, label]})") + + # ONNX model saliency map + saliency_maps = saliency_maps.squeeze(0) # Cxhxw + saliency_map = saliency_maps[label] # hxw saliency_map for the label + saliency_map = colormap(saliency_map[None, :]) # 1xhxw + saliency_map = cv2.resize(saliency_map.squeeze(0), dsize=input_size) # HxW + result_image = overlay(saliency_map, image) + result_image = cv2.cvtColor(result_image, code=cv2.COLOR_RGB2BGR) + result_image_path = Path(args.output_dir) / "xai-onnx.png" + result_image_path.parent.mkdir(parents=True, exist_ok=True) + cv2.imwrite(result_image_path, result_image) + logger.info(f"ONNX XAI model saliency map: {result_image_path}") + + +def run_insert_xai_torch_to_openvino(args: list[str]): + """Insert XAI head into PyTorch model, then convert to OpenVINO format and run inference on OpenVINO Runtime to get saliency map.""" + + # Load Torch model from timm + try: + model = timm.create_model(args.model_name, in_chans=3, pretrained=True) + 
logger.info(f"Model config: {model.default_cfg}") + logger.info(f"Model layers: {model}") + except Exception as e: + logger.error(e) + logger.info(f"Please choose from {timm.list_models()}") + return + input_size = model.default_cfg["input_size"][1:] # (H, W) + input_mean = np.array(model.default_cfg["mean"]) + input_std = np.array(model.default_cfg["std"]) + + # Load image + image = cv2.imread("tests/assets/cheetah_person.jpg") + image = cv2.resize(image, dsize=input_size) + image = cv2.cvtColor(image, code=cv2.COLOR_BGR2RGB) + image_norm = ((image/255.0 - input_mean)/input_std).astype(np.float32) + image_norm = image_norm.transpose((2, 0, 1)) # HxWxC -> CxHxW + image_norm = image_norm[None, :] # CxHxW -> 1xCxHxW + + # Insert XAI head + model_xai: torch.nn.Module = insert_xai(model, Task.CLASSIFICATION, input_size=input_size) + + # OpenVINO model conversion + ov_model = ov.convert_model( + model_xai, + example_input=torch.from_numpy(image_norm), + input=(ov.PartialShape([-1, *image_norm.shape[1:]],)) + ) + model_path = Path(args.output_dir) / "model.xml" + model_path.parent.mkdir(parents=True, exist_ok=True) + ov.save_model(ov_model, model_path) + logger.info(f"OpenVINO XAI model: {model_path}") + + # OpenVINO XAI model inference + ov_model = ov.Core().compile_model(ov_model, device_name="CPU") + outputs = ov_model(image_norm) + logits = outputs["prediction"] # BxC + saliency_maps = outputs["saliency_map"] # BxCxhxw + probs = softmax(logits) + label = probs.argmax(axis=-1)[0] + logger.info(f"OpenVINO XAI model prediction: classes ({probs.shape[-1]}) -> label ({label}) -> prob ({probs[0, label]})") + + # OpenVINO XAI model saliency map + saliency_maps = saliency_maps.squeeze(0) # Cxhxw + saliency_map = saliency_maps[label] # hxw saliency_map for the label + saliency_map = colormap(saliency_map[None, :]) # 1xhxw + saliency_map = cv2.resize(saliency_map.squeeze(0), dsize=input_size) # HxW + result_image = overlay(saliency_map, image) + result_image = 
cv2.cvtColor(result_image, code=cv2.COLOR_RGB2BGR) + result_image_path = Path(args.output_dir) / "xai-openvino.png" + result_image_path.parent.mkdir(parents=True, exist_ok=True) + cv2.imwrite(result_image_path, result_image) + logger.info(f"OpenVINO XAI model saliency map: {result_image_path}") + + +def main(argv: list[str]): + parser = get_argument_parser() + args = parser.parse_args(argv) + + run_insert_xai_torch(args) + run_insert_xai_torch_to_onnx(args) + run_insert_xai_torch_to_openvino(args) + + +if __name__ == "__main__": + main(sys.argv[1:]) diff --git a/openvino_xai/api/api.py b/openvino_xai/api/api.py index 8b28ec4b..2cb3b780 100644 --- a/openvino_xai/api/api.py +++ b/openvino_xai/api/api.py @@ -4,11 +4,11 @@ from typing import List, TypeVar import openvino as ov -import torch from openvino_xai.common.parameters import Method, Task from openvino_xai.common.utils import IdentityPreprocessFN, has_xai, logger from openvino_xai.methods.factory import WhiteBoxMethodFactory +from openvino_xai.utils.torch import torch Model = TypeVar("Model", ov.Model, torch.nn.Module) @@ -50,7 +50,6 @@ def insert_xai( explain_method=explain_method, target_layer=target_layer, embed_scaling=embed_scaling, - prepare_model=False, **kwargs, ) diff --git a/openvino_xai/common/utils.py b/openvino_xai/common/utils.py index 95441048..b8c1b763 100644 --- a/openvino_xai/common/utils.py +++ b/openvino_xai/common/utils.py @@ -11,7 +11,8 @@ import numpy as np import openvino as ov -import torch + +from openvino_xai.utils.torch import torch logger = logging.getLogger("openvino_xai") handler = logging.StreamHandler() @@ -96,10 +97,10 @@ def sigmoid(x: np.ndarray) -> np.ndarray: return 1 / (1 + np.exp(-x)) -def softmax(x: np.ndarray) -> np.ndarray: +def softmax(x: np.ndarray, axis=-1) -> np.ndarray: """Compute softmax values of x.""" - e_x = np.exp(x - np.max(x)) - return e_x / e_x.sum() + e_x = np.exp(x - np.max(x, axis=axis, keepdims=True)) # keepdims so batched (BxC) inputs broadcast correctly + return e_x / e_x.sum(axis=axis, keepdims=True) class IdentityPreprocessFN: diff 
--git a/openvino_xai/explainer/explainer.py b/openvino_xai/explainer/explainer.py index a27fc74b..2eb2f56e 100644 --- a/openvino_xai/explainer/explainer.py +++ b/openvino_xai/explainer/explainer.py @@ -34,6 +34,7 @@ class ExplainMode(Enum): Contains the following values: WHITEBOX - The model is explained in white box mode, i.e. XAI branch is getting inserted into the model graph. BLACKBOX - The model is explained in black box model. + AUTO - The model is explained in the white-box mode first, if fails - black-box mode will run. """ WHITEBOX = "whitebox" @@ -148,6 +149,7 @@ def __call__( colormap: bool = True, overlay: bool = False, overlay_weight: float = 0.5, + overlay_prediction: bool = True, **kwargs, ) -> Explanation: return self.explain( @@ -161,6 +163,7 @@ def __call__( colormap, overlay, overlay_weight, + overlay_prediction, **kwargs, ) @@ -176,6 +179,7 @@ def explain( colormap: bool = True, overlay: bool = False, overlay_weight: float = 0.5, + overlay_prediction: bool = True, **kwargs, ) -> Explanation: """ @@ -202,6 +206,8 @@ def explain( :type overlay: bool :parameter overlay_weight: Weight of the saliency map when overlaying the input data with the saliency map. :type overlay_weight: float + :parameter overlay_prediction: If True, plot model prediction over the overlay. 
+ :type overlay_prediction: bool """ targets = convert_targets_to_numpy(targets) @@ -235,6 +241,7 @@ def explain( colormap, overlay, overlay_weight, + overlay_prediction, ) def model_forward(self, x: np.ndarray, preprocess: bool = True) -> Mapping: @@ -280,6 +287,7 @@ def _visualize( colormap: bool, overlay: bool, overlay_weight: float, + overlay_prediction: bool, ) -> Explanation: if output_size is None: reference_image = data if original_input_image is None else original_input_image @@ -294,5 +302,6 @@ def _visualize( colormap=colormap, overlay=overlay, overlay_weight=overlay_weight, + overlay_prediction=overlay_prediction, ) return explanation diff --git a/openvino_xai/explainer/explanation.py b/openvino_xai/explainer/explanation.py index 28c9bc42..a7ce2dac 100644 --- a/openvino_xai/explainer/explanation.py +++ b/openvino_xai/explainer/explanation.py @@ -177,7 +177,6 @@ def save( :type postfix: str :param confidence_scores: Dict with confidence scores for each class index. Default is None. 
:type confidence_scores: Dict[int, float] | None - """ os.makedirs(dir_path, exist_ok=True) @@ -188,7 +187,7 @@ def save( map_to_save = cv2.cvtColor(map_to_save, code=cv2.COLOR_RGB2BGR) if isinstance(target_idx, str): target_name = "activation_map" - elif self.label_names and isinstance(target_idx, np.int64) and self.task != Task.DETECTION: + elif self.label_names and isinstance(target_idx, (int, np.int64)) and self.task != Task.DETECTION: target_name = self.label_names[target_idx] else: target_name = str(target_idx) @@ -261,7 +260,12 @@ def _plot_matplotlib(self, checked_targets: list[int | str], num_cols: int) -> N map_to_plot = self.saliency_map[target_index] - axes[i].imshow(map_to_plot) + if map_to_plot.ndim == 3: + axes[i].imshow(map_to_plot) + elif map_to_plot.ndim == 2: + axes[i].imshow(map_to_plot, cmap="gray") + else: + raise ValueError(f"Saliency map expected to be 3 or 2-dimensional, but got {map_to_plot.ndim}.") axes[i].axis("off") # Hide the axis axes[i].set_title(f"Class {label_name}") diff --git a/openvino_xai/explainer/visualizer.py b/openvino_xai/explainer/visualizer.py index 32c5b3d4..825dfc50 100644 --- a/openvino_xai/explainer/visualizer.py +++ b/openvino_xai/explainer/visualizer.py @@ -174,14 +174,14 @@ def visualize( # Convert back to dict return self._update_explanation_with_processed_sal_map(explanation, saliency_map_np, indices_to_return) - @staticmethod def _put_classification_info( + self, saliency_map_np: np.ndarray, indices: List[int], label_names: List[str] | None, predictions: Dict[int, Prediction] | None, ) -> None: - corner_location = 3, 17 + offset = 3 for smap, target_index in zip(range(len(saliency_map_np)), indices): label = label_names[target_index] if label_names else str(target_index) if predictions and target_index in predictions: @@ -189,18 +189,19 @@ def _put_classification_info( if score: label = f"{label}|{score:.2f}" + font_scale, text_height = self._fit_text_to_image(label, offset, saliency_map_np[smap].shape[1]) 
cv2.putText( saliency_map_np[smap], label, - org=corner_location, - fontFace=1, - fontScale=1.3, + org=(offset, text_height + offset), + fontFace=2, + fontScale=font_scale, color=(255, 0, 0), - thickness=2, + thickness=1, ) - @staticmethod def _put_detection_info( + self, saliency_map_np: np.ndarray, indices: List[int], label_names: List[str] | None, @@ -209,6 +210,7 @@ def _put_detection_info( if not predictions: return + offset = 7 for smap, target_index in zip(range(len(saliency_map_np)), indices): saliency_map = saliency_map_np[smap] label_index = predictions[target_index].label @@ -220,17 +222,40 @@ def _put_detection_info( label = label_names[label_index] if label_names else label_index label_score = f"{label}|{score:.2f}" - box_location = int(x1), int(y1 - 5) + + font_scale, _ = self._fit_text_to_image(label_score, x1, saliency_map.shape[1]) + box_location = x1, y1 - offset cv2.putText( saliency_map, label_score, org=box_location, - fontFace=1, - fontScale=1.3, + fontFace=2, + fontScale=font_scale, color=(255, 0, 0), - thickness=2, + thickness=1, ) + @staticmethod + def _fit_text_to_image( + text: str, + x_start: int, + image_width: int, + font_scale: float = 1.0, + thickness: int = 1, + ) -> Tuple[float, int]: + font_face = 2 + max_width = image_width - 5 + while True: + text_size, _ = cv2.getTextSize(text, font_face, font_scale, thickness) + text_width, text_height = text_size + + if x_start + text_width <= max_width: + return font_scale, text_height + + font_scale -= 0.1 + if abs(font_scale - 0.1) < 0.001: + return font_scale, text_height + @staticmethod def _apply_scaling(explanation: Explanation, saliency_map_np: np.ndarray) -> np.ndarray: if explanation.layout not in GRAY_LAYOUTS: diff --git a/openvino_xai/methods/base.py b/openvino_xai/methods/base.py index cb385ac0..26090d96 100644 --- a/openvino_xai/methods/base.py +++ b/openvino_xai/methods/base.py @@ -7,9 +7,9 @@ import numpy as np import openvino as ov -import torch from openvino_xai.common.utils 
import IdentityPreprocessFN +from openvino_xai.utils.torch import torch Model = TypeVar("Model", ov.Model, torch.nn.Module) CompiledModel = TypeVar("CompiledModel", ov.CompiledModel, torch.nn.Module) diff --git a/openvino_xai/methods/black_box/aise/base.py b/openvino_xai/methods/black_box/aise/base.py index d384077d..2aa5b526 100644 --- a/openvino_xai/methods/black_box/aise/base.py +++ b/openvino_xai/methods/black_box/aise/base.py @@ -10,7 +10,7 @@ import openvino.runtime as ov from scipy.optimize import direct -from openvino_xai.common.utils import IdentityPreprocessFN +from openvino_xai.common.utils import IdentityPreprocessFN, is_bhwc_layout from openvino_xai.methods.black_box.base import BlackBoxXAIMethod @@ -92,6 +92,8 @@ def _objective_function(self, args) -> float: kernel_mask = self._mask_generator.generate_kernel_mask(kernel_params) kernel_mask = np.clip(kernel_mask, 0, 1) + if is_bhwc_layout(self.data_preprocessed): + kernel_mask = np.expand_dims(kernel_mask, 2) pred_loss_preserve = 0.0 if self.preservation: diff --git a/openvino_xai/methods/black_box/aise/classification.py b/openvino_xai/methods/black_box/aise/classification.py index f9b38f2e..7c1947d1 100644 --- a/openvino_xai/methods/black_box/aise/classification.py +++ b/openvino_xai/methods/black_box/aise/classification.py @@ -57,6 +57,8 @@ def __init__( prepare_model=prepare_model, ) self.bounds = Bounds([0.0, 0.0], [1.0, 1.0]) + self.num_iterations_per_kernel: int | None = None + self.kernel_widths: List[float] | np.ndarray | None = None def generate_saliency_map( # type: ignore self, @@ -135,14 +137,14 @@ def _preset_parameters( kernel_widths: List[float] | np.ndarray | None, ) -> Tuple[int, np.ndarray]: if preset == Preset.SPEED: - iterations = 25 + iterations = 20 widths = np.linspace(0.1, 0.25, 3) elif preset == Preset.BALANCE: iterations = 50 widths = np.linspace(0.1, 0.25, 3) elif preset == Preset.QUALITY: - iterations = 85 - widths = np.linspace(0.075, 0.25, 4) + iterations = 50 + widths = 
np.linspace(0.075, 0.25, 5) else: raise ValueError(f"Preset {preset} is not supported.") diff --git a/openvino_xai/methods/black_box/aise/detection.py b/openvino_xai/methods/black_box/aise/detection.py index 7c8599bf..32c1f5ed 100644 --- a/openvino_xai/methods/black_box/aise/detection.py +++ b/openvino_xai/methods/black_box/aise/detection.py @@ -57,6 +57,8 @@ def __init__( ) self.deletion = False self.predictions = {} + self.num_iterations_per_kernel: int | None = None + self.divisors: List[float] | np.ndarray | None = None def generate_saliency_map( # type: ignore self, @@ -148,13 +150,13 @@ def _preset_parameters( divisors: List[float] | np.ndarray | None, ) -> Tuple[int, np.ndarray]: if preset == Preset.SPEED: - iterations = 50 + iterations = 20 divs = np.linspace(7, 1, 3) elif preset == Preset.BALANCE: - iterations = 100 + iterations = 50 divs = np.linspace(7, 1, 3) elif preset == Preset.QUALITY: - iterations = 150 + iterations = 50 divs = np.linspace(8, 1, 5) else: raise ValueError(f"Preset {preset} is not supported.") diff --git a/openvino_xai/methods/black_box/rise.py b/openvino_xai/methods/black_box/rise.py index e9a31024..57b33bc2 100644 --- a/openvino_xai/methods/black_box/rise.py +++ b/openvino_xai/methods/black_box/rise.py @@ -45,6 +45,8 @@ def __init__( super().__init__( model=model, postprocess_fn=postprocess_fn, preprocess_fn=preprocess_fn, device_name=device_name ) + self.num_masks: int | None = None + self.num_cells: int | None = None if prepare_model: self.prepare_model() @@ -55,7 +57,7 @@ def generate_saliency_map( target_indices: List[int] | None = None, preset: Preset = Preset.BALANCE, num_masks: int | None = None, - num_cells: int = 8, + num_cells: int | None = None, prob: float = 0.5, seed: int = 0, scale_output: bool = True, @@ -84,13 +86,11 @@ def generate_saliency_map( """ data_preprocessed = self.preprocess_fn(data) - num_masks = self._preset_parameters(preset, num_masks) + self.num_masks, self.num_cells = self._preset_parameters(preset, 
num_masks, num_cells) saliency_maps = self._run_synchronous_explanation( data_preprocessed, target_indices, - num_masks, - num_cells, prob, seed, ) @@ -109,26 +109,31 @@ def generate_saliency_map( def _preset_parameters( preset: Preset, num_masks: int | None = None, - ) -> int: - # TODO (negvet): preset num_cells - if num_masks is not None: - return num_masks - + num_cells: int | None = None, + ) -> Tuple[int, int]: if preset == Preset.SPEED: - return 2000 + num_masks_ = 1000 + num_cells_ = 4 elif preset == Preset.BALANCE: - return 5000 + num_masks_ = 5000 + num_cells_ = 8 elif preset == Preset.QUALITY: - return 8000 + num_masks_ = 10000 + num_cells_ = 12 else: raise ValueError(f"Preset {preset} is not supported.") + if num_masks is None: + num_masks = num_masks_ + if num_cells is None: + num_cells = num_cells_ + + return num_masks, num_cells + def _run_synchronous_explanation( self, data_preprocessed: np.ndarray, target_classes: List[int] | None, - num_masks: int, - num_cells: int, prob: float, seed: int, ) -> np.ndarray: @@ -145,8 +150,8 @@ def _run_synchronous_explanation( rand_generator = np.random.default_rng(seed=seed) saliency_maps = np.zeros((num_targets, input_size[0], input_size[1])) - for _ in tqdm(range(0, num_masks), desc="Explaining in synchronous mode"): - mask = self._generate_mask(input_size, num_cells, prob, rand_generator) + for _ in tqdm(range(0, self.num_masks), desc="Explaining in synchronous mode"): + mask = self._generate_mask(input_size, self.num_cells, prob, rand_generator) # Add channel dimensions for masks if is_bhwc_layout(data_preprocessed): masked = np.expand_dims(mask, 2) * data_preprocessed diff --git a/openvino_xai/methods/factory.py b/openvino_xai/methods/factory.py index 36229ffd..d6946e53 100644 --- a/openvino_xai/methods/factory.py +++ b/openvino_xai/methods/factory.py @@ -6,7 +6,6 @@ import numpy as np import openvino as ov -import torch from openvino_xai.common.parameters import Method, Task from openvino_xai.common.utils 
import IdentityPreprocessFN, logger @@ -21,6 +20,7 @@ DetClassProbabilityMap, ) from openvino_xai.methods.white_box.recipro_cam import ReciproCAM, ViTReciproCAM +from openvino_xai.utils.torch import torch class MethodFactory(ABC): diff --git a/openvino_xai/methods/white_box/activation_map.py b/openvino_xai/methods/white_box/activation_map.py index a45fea60..bc28477a 100644 --- a/openvino_xai/methods/white_box/activation_map.py +++ b/openvino_xai/methods/white_box/activation_map.py @@ -5,12 +5,12 @@ import numpy as np import openvino.runtime as ov -import torch from openvino.runtime import opset10 as opset from openvino_xai.common.utils import IdentityPreprocessFN from openvino_xai.inserter.model_parser import IRParserCls, ModelType from openvino_xai.methods.white_box.base import WhiteBoxMethod +from openvino_xai.utils.torch import torch class ActivationMap(WhiteBoxMethod): diff --git a/openvino_xai/methods/white_box/base.py b/openvino_xai/methods/white_box/base.py index e7a8e10e..c5d36ad3 100644 --- a/openvino_xai/methods/white_box/base.py +++ b/openvino_xai/methods/white_box/base.py @@ -4,7 +4,6 @@ import copy from abc import abstractmethod from typing import Callable -from venv import logger import numpy as np import openvino.runtime as ov @@ -14,6 +13,7 @@ SALIENCY_MAP_OUTPUT_NAME, IdentityPreprocessFN, has_xai, + logger, ) from openvino_xai.inserter.inserter import insert_xai_branch_into_model from openvino_xai.methods.base import MethodBase @@ -43,7 +43,6 @@ def __init__( ): super().__init__(preprocess_fn=preprocess_fn, device_name=device_name) self._model_ori = copy.deepcopy(model) - self.preprocess_fn = preprocess_fn self.embed_scaling = embed_scaling @property @@ -60,18 +59,16 @@ def generate_saliency_map(self, data: np.ndarray, *args, **kwargs) -> np.ndarray return model_output[SALIENCY_MAP_OUTPUT_NAME] def prepare_model(self, load_model: bool = True) -> ov.Model: - if has_xai(self._model_ori): - logger.info("Provided IR model already contains XAI 
branch.") - self._model = self._model_ori - if load_model: - self._model_compiled = ov.Core().compile_model(model=self._model, device_name=self._device_name) - return self._model - - xai_output_node = self.generate_xai_branch() - self._model = insert_xai_branch_into_model(self._model_ori, xai_output_node, self.embed_scaling) - if not has_xai(self._model): - raise RuntimeError("Insertion of the XAI branch into the model was not successful.") - if load_model: + if self._model is None: + if has_xai(self._model_ori): + logger.info("Provided IR model already contains XAI branch.") + self._model = self._model_ori + else: + xai_output_node = self.generate_xai_branch() + self._model = insert_xai_branch_into_model(self._model_ori, xai_output_node, self.embed_scaling) + if not has_xai(self._model): + raise RuntimeError("Insertion of the XAI branch into the model was not successful.") + if load_model and self._model_compiled is None: self._model_compiled = ov.Core().compile_model(model=self._model, device_name=self._device_name) return self._model diff --git a/openvino_xai/methods/white_box/recipro_cam.py b/openvino_xai/methods/white_box/recipro_cam.py index e5ae6f0b..8cef2af1 100644 --- a/openvino_xai/methods/white_box/recipro_cam.py +++ b/openvino_xai/methods/white_box/recipro_cam.py @@ -7,12 +7,12 @@ import numpy as np import openvino.runtime as ov -import torch from openvino.runtime import opset10 as opset from openvino_xai.common.utils import IdentityPreprocessFN from openvino_xai.inserter.model_parser import IRParserCls, ModelType from openvino_xai.methods.white_box.base import WhiteBoxMethod +from openvino_xai.utils.torch import torch class FeatureMapPerturbationBase(WhiteBoxMethod): diff --git a/openvino_xai/methods/white_box/torch.py b/openvino_xai/methods/white_box/torch.py index 2163dca8..b2f763ae 100644 --- a/openvino_xai/methods/white_box/torch.py +++ b/openvino_xai/methods/white_box/torch.py @@ -8,11 +8,16 @@ from typing import Any, Callable, Dict, Mapping 
import numpy as np -import torch -from openvino_xai.common.utils import SALIENCY_MAP_OUTPUT_NAME, has_xai +from openvino_xai.common.utils import SALIENCY_MAP_OUTPUT_NAME, has_xai, logger from openvino_xai.methods.base import IdentityPreprocessFN, MethodBase +try: + import torch +except ImportError as e: + logger.error("Please install pytorch to enable PyTorch model support.") + raise e + class TorchWhiteBoxMethod(MethodBase[torch.nn.Module, torch.nn.Module]): """ @@ -42,11 +47,13 @@ def __init__( embed_scaling: bool = True, device_name: str = "CPU", prepare_model: bool = True, + input_size: tuple[int, int] = (224, 224), # For fixed input size models like ViT **kwargs, ): super().__init__(model=model, preprocess_fn=preprocess_fn, device_name=device_name) self._target_layer = target_layer self._embed_scaling = embed_scaling + self._input_size = input_size if prepare_model: self.prepare_model() @@ -61,6 +68,7 @@ def prepare_model(self, load_model: bool = True) -> torch.nn.Module: return self._model_compiled model = copy.deepcopy(self._model) + model.eval() # Feature if self._target_layer: @@ -73,7 +81,6 @@ def prepare_model(self, load_model: bool = True) -> torch.nn.Module: model.register_forward_hook(self._output_hook) setattr(model, "has_xai", True) - model.eval() if load_model: self._model_compiled = model @@ -114,17 +121,26 @@ def _find_feature_module_auto(self, module: torch.nn.Module) -> torch.nn.Module: self._feature_module = None self._num_modules = 0 + def _has_spatial_dim(shape: torch.Size): + if len(shape) != 4: # BxCxHxW + return False + if shape[2] <= 1 or shape[3] <= 1: # H > 1 and W > 1 + return False + if shape[1] <= shape[2] or shape[1] <= shape[3]: # H < C and W < C for feature maps generally + return False + return True + def _detect_hook(module: torch.nn.Module, inputs: Any, output: Any) -> None: if isinstance(output, torch.Tensor): module.index = self._num_modules self._num_modules += 1 shape = output.shape - if len(shape) == 4 and shape[2] > 1
and shape[3] > 1: + if _has_spatial_dim(shape): self._feature_module = module global_hook_handle = torch.nn.modules.module.register_module_forward_hook(_detect_hook) try: - module.forward(torch.zeros((1, 3, 128, 128))) + module.forward(torch.zeros((1, 3, *self._input_size))) finally: global_hook_handle.remove() if self._feature_module is None: @@ -269,10 +285,13 @@ def __init__( def _find_feature_module_auto(self, module: torch.nn.Module) -> torch.nn.Module: """Detect feature module in the model by finding the 3rd last LayerNorm module.""" self._feature_module = None - norm_modules = [m for _, m in module.named_modules() if isinstance(m, torch.nn.LayerNorm)] + norm_modules = [] + for name, sub_module in module.named_modules(): + if "LayerNorm" in type(sub_module).__name__ or "BatchNorm" in type(sub_module).__name__ or "norm1" in name: + norm_modules.append(sub_module) if len(norm_modules) < 3: - raise RuntimeError("Feature modules with LayerNorm are less than 3 in the torch model") + raise RuntimeError("Feature modules with LayerNorm or BatchNorm are less than 3 in the torch model") self._feature_module = norm_modules[-3] return self._feature_module diff --git a/openvino_xai/metrics/adcc.py b/openvino_xai/metrics/adcc.py index 3c826a3d..e881f793 100644 --- a/openvino_xai/metrics/adcc.py +++ b/openvino_xai/metrics/adcc.py @@ -3,65 +3,56 @@ import numpy as np from scipy.stats import pearsonr -from openvino_xai import Task from openvino_xai.common.utils import scaling -from openvino_xai.explainer.explainer import Explainer, ExplainMode -from openvino_xai.explainer.explanation import Explanation +from openvino_xai.explainer.explanation import ONE_MAP_LAYOUTS, Explanation from openvino_xai.metrics.base import BaseMetric class ADCC(BaseMetric): """ - Implementation of the e Average Drop-Coherence-Complexity (ADCC) metric by Poppi, Samuele, et al 2021. + Implementation of the Average Drop-Coherence-Complexity (ADCC) metric by Poppi, Samuele, et al 2021. 
References: - Poppi, Samuele, et al. "Revisiting the evaluation of class activation mapping for explainability: - A novel metric and experimental analysis." Proceedings of the IEEE/CVF Conference on - Computer Vision and Pattern Recognition. 2021. + 1) Poppi, Samuele, et al. "Revisiting the evaluation of class activation mapping for explainability: + A novel metric and experimental analysis." Proceedings of the IEEE/CVF Conference on + Computer Vision and Pattern Recognition. 2021. + 2) Reference implementation: + https://github.com/aimagelab/ADCC/ """ - def __init__(self, model, preprocess_fn, postprocess_fn, explainer=None, device_name="CPU"): + def __init__(self, model, preprocess_fn, postprocess_fn, explainer, device_name="CPU", **kwargs: Any): super().__init__( model=model, preprocess_fn=preprocess_fn, postprocess_fn=postprocess_fn, device_name=device_name ) - if explainer is None: - self.explainer = Explainer( - model=model, - task=Task.CLASSIFICATION, - preprocess_fn=self.preprocess_fn, - explain_mode=ExplainMode.WHITEBOX, - ) - else: - self.explainer = explainer - - def average_drop( - self, saliency_map: np.ndarray, class_idx: int, image: np.ndarray, model_output: np.ndarray - ) -> float: + self.explainer = explainer + self.black_box_kwargs = kwargs + + def average_drop(self, masked_image: np.ndarray, class_idx: int, model_output: np.ndarray) -> float: """ Measures the average percentage drop in confidence for the target class when the model sees only the explanation map (image masked with saliency map), instead of the full image. The less the better. 
""" - confidence_on_input = np.max(model_output) - - masked_image = (image * saliency_map[:, :, None]).astype(np.uint8) + confidence_on_input = model_output[class_idx] prediction_on_saliency_map = self.model_predict(masked_image) confidence_on_saliency_map = prediction_on_saliency_map[class_idx] return max(0.0, confidence_on_input - confidence_on_saliency_map) / confidence_on_input - def coherency(self, saliency_map: np.ndarray, class_idx: int, image: np.ndarray) -> float: + def coherency(self, saliency_map: np.ndarray, masked_image: np.ndarray, class_idx: int, image: np.ndarray) -> float: """ - Measures the coherency of the saliency map. The explanation map (image masked with saliency map) should contain all the relevant features that explain a prediction and should remove useless features in a coherent way. + Measures the coherency of the saliency map. The explanation map (image masked with saliency map) should + contain all the relevant features that explain a prediction and should remove useless features in a coherent way. Saliency map and saliency map of exlanation map should be similar. The more the better. 
""" + saliency_map_masked_image = self.explainer( + masked_image, targets=class_idx, colormap=False, scaling=False, **self.black_box_kwargs + ) + saliency_map_masked_image = list(saliency_map_masked_image.saliency_map.values())[0] # only one target + saliency_map_masked_image = scaling(saliency_map_masked_image, cast_to_uint8=False, max_value=1) - masked_image = image * saliency_map[:, :, None] - saliency_map_mapped_image = self.explainer(masked_image, targets=[class_idx], colormap=False, scaling=False) - saliency_map_mapped_image = saliency_map_mapped_image.saliency_map[class_idx] - - A, B = saliency_map.flatten(), saliency_map_mapped_image.flatten() + A, B = saliency_map.flatten(), saliency_map_masked_image.flatten() # Pearson correlation coefficient y, _ = pearsonr(A, B) y = (y + 1) / 2 @@ -75,7 +66,7 @@ def complexity(saliency_map: np.ndarray) -> float: Defined as L1 norm of the saliency map. The less the better. """ - return abs(saliency_map).sum() / (saliency_map.shape[-1] * saliency_map.shape[-2]) + return saliency_map.sum() / (saliency_map.shape[-1] * saliency_map.shape[-2]) def __call__(self, saliency_map: np.ndarray, class_idx: int, input_image: np.ndarray) -> Dict[str, float]: """ @@ -99,9 +90,11 @@ def __call__(self, saliency_map: np.ndarray, class_idx: int, input_image: np.nda saliency_map = scaling(saliency_map, cast_to_uint8=False, max_value=1) model_output = self.model_predict(input_image) + masked_image = input_image * saliency_map[:, :, None] + class_idx = np.argmax(model_output) if class_idx is None else class_idx - avgdrop = self.average_drop(saliency_map, class_idx, input_image, model_output) - coh = self.coherency(saliency_map, class_idx, input_image) + avgdrop = self.average_drop(masked_image, class_idx, model_output) + coh = self.coherency(saliency_map, masked_image, class_idx, input_image) com = self.complexity(saliency_map) adcc = 3 / (1 / coh + 1 / (1 - com) + 1 / (1 - avgdrop)) @@ -126,14 +119,15 @@ def evaluate( results = [] for 
input_image, explanation in zip(input_images, explanations): for class_idx, saliency_map in explanation.saliency_map.items(): - metric_dict = self(saliency_map, int(class_idx), input_image) + target_idx = None if explanation.layout in ONE_MAP_LAYOUTS else int(class_idx) + metric_dict = self(saliency_map, target_idx, input_image) results.append( [ - metric_dict["adcc"], metric_dict["coherency"], metric_dict["complexity"], metric_dict["average_drop"], ] ) - adcc, coherency, complexity, average_drop = np.mean(np.array(results), axis=0) + coherency, complexity, average_drop = np.mean(np.array(results), axis=0) + adcc = 3 / (1 / coherency + 1 / (1 - complexity) + 1 / (1 - average_drop)) return {"adcc": adcc, "coherency": coherency, "complexity": complexity, "average_drop": average_drop} diff --git a/openvino_xai/metrics/insertion_deletion_auc.py b/openvino_xai/metrics/insertion_deletion_auc.py index 2d69b2e1..b488ee21 100644 --- a/openvino_xai/metrics/insertion_deletion_auc.py +++ b/openvino_xai/metrics/insertion_deletion_auc.py @@ -2,7 +2,7 @@ import numpy as np -from openvino_xai.explainer.explanation import Explanation, Layout +from openvino_xai.explainer.explanation import ONE_MAP_LAYOUTS, Explanation from openvino_xai.metrics.base import BaseMetric @@ -43,7 +43,7 @@ def step_image_insertion_deletion( return image_insertion, image_deletion def __call__( - self, saliency_map: np.ndarray, class_idx: int, input_image: np.ndarray, steps: int = 100, **kwargs: Any + self, saliency_map: np.ndarray, class_idx: int, input_image: np.ndarray, steps: int = 30, **kwargs: Any ) -> Dict[str, float]: """ Calculate the Insertion and Deletion AUC metrics for one saliency map for one class. @@ -62,6 +62,9 @@ def __call__( :return: A dictionary containing the AUC scores for insertion and deletion scores. 
:rtype: Dict[str, float] """ + + class_idx = np.argmax(self.model_predict(input_image)) if class_idx is None else class_idx + # Sort pixels by descending importance to find the most important pixels sorted_indices = np.argsort(-saliency_map.flatten()) sorted_indices = np.unravel_index(sorted_indices, saliency_map.shape) @@ -95,13 +98,11 @@ def evaluate( :return: A Dict containing the mean insertion AUC, mean deletion AUC, and their difference (delta) as values. :rtype: float """ - for explanation in explanations: - assert explanation.layout in [Layout.MULTIPLE_MAPS_PER_IMAGE_GRAY, Layout.MULTIPLE_MAPS_PER_IMAGE_COLOR] - results = [] for input_image, explanation in zip(input_images, explanations): for class_idx, saliency_map in explanation.saliency_map.items(): - metric_dict = self(saliency_map, int(class_idx), input_image, steps) + target_idx = None if explanation.layout in ONE_MAP_LAYOUTS else int(class_idx) + metric_dict = self(saliency_map, target_idx, input_image, steps) results.append([metric_dict["insertion"], metric_dict["deletion"]]) insertion, deletion = np.mean(np.array(results), axis=0) diff --git a/openvino_xai/metrics/pointing_game.py b/openvino_xai/metrics/pointing_game.py index 5bac00da..390f0469 100644 --- a/openvino_xai/metrics/pointing_game.py +++ b/openvino_xai/metrics/pointing_game.py @@ -6,7 +6,7 @@ import numpy as np from openvino_xai.common.utils import logger -from openvino_xai.explainer.explanation import Explanation +from openvino_xai.explainer.explanation import ONE_MAP_LAYOUTS, Explanation from openvino_xai.metrics.base import BaseMetric @@ -86,20 +86,25 @@ def evaluate( hits = 0.0 num_sal_maps = 0 for explanation, image_gt_bboxes in zip(explanations, gt_bboxes): - label_names = explanation.label_names - assert label_names is not None, "Label names are required for pointing game evaluation." 
- for class_idx, class_sal_map in explanation.saliency_map.items(): - label_name = label_names[int(class_idx)] - - if label_name not in image_gt_bboxes: - logger.info( - f"No ground-truth bbox for {label_name} saliency map. " - f"Skip pointing game evaluation for this saliency map." - ) - continue + if explanation.layout in ONE_MAP_LAYOUTS: + # Activation map + class_gt_bboxes = [ + gt_bbox for class_gt_bboxes in image_gt_bboxes.values() for gt_bbox in class_gt_bboxes + ] + else: + label_names = explanation.label_names + assert label_names is not None, "Label names are required for pointing game evaluation." + label_name = label_names[int(class_idx)] + + if label_name not in image_gt_bboxes: + logger.info( + f"No ground-truth bbox for {label_name} saliency map. " + f"Skip pointing game evaluation for this saliency map." + ) + continue + class_gt_bboxes = image_gt_bboxes[label_name] - class_gt_bboxes = image_gt_bboxes[label_name] hits += self(class_sal_map, class_gt_bboxes)["pointing_game"] num_sal_maps += 1 diff --git a/openvino_xai/utils/model_export.py b/openvino_xai/utils/model_export.py index 9f700e98..38870063 100644 --- a/openvino_xai/utils/model_export.py +++ b/openvino_xai/utils/model_export.py @@ -2,9 +2,8 @@ # SPDX-License-Identifier: Apache-2.0 import openvino -import pytest -torch = pytest.importorskip("torch") +from openvino_xai.utils.torch import torch def export_to_onnx(model: torch.nn.Module, save_path: str, data_sample: torch.Tensor, set_dynamic_batch: bool) -> None: # type: ignore diff --git a/openvino_xai/utils/torch.py b/openvino_xai/utils/torch.py new file mode 100644 index 00000000..323deb69 --- /dev/null +++ b/openvino_xai/utils/torch.py @@ -0,0 +1,14 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +try: + import torch +except ImportError: + # Dummy structure for pre-commit & type checking + class torch: # type: ignore[no-redef] + class nn: + class Module: + pass + + class Tensor: + pass diff --git 
a/pyproject.toml b/pyproject.toml index edb75e20..5bd2ba35 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -7,15 +7,14 @@ build-backend = "setuptools.build_meta" [project] name = "openvino_xai" -version = "1.1.0rc0" +version = "1.2.0rc0" dependencies = [ - "openvino-dev==2024.3", + "openvino-dev==2024.4", "opencv-python", "scipy", "numpy==1.*", "tqdm", "matplotlib", - "torch", ] requires-python = ">=3.10" authors = [ @@ -44,11 +43,13 @@ dev = [ "pre-commit==3.7.0", "addict", "timm==0.9.5", - "onnx==1.14.1", + "onnx", + "onnxruntime", "pandas", "py-cpuinfo", "openpyxl", "torchvision", + "pycocotools", ] doc = [ "furo", diff --git a/tests/assets/cheetah_coco/annotations/instances_val.json b/tests/assets/cheetah_coco/annotations/instances_val.json deleted file mode 100644 index cfc65482..00000000 --- a/tests/assets/cheetah_coco/annotations/instances_val.json +++ /dev/null @@ -1 +0,0 @@ -{"licenses": [{"name": "", "id": 0, "url": ""}], "info": {"contributor": "", "date_created": "", "description": "", "url": "", "version": "", "year": ""}, "categories": [{"id": 1, "name": "person", "supercategory": ""}, {"id": 2, "name": "cheetah", "supercategory": ""}], "images": [{"id": 1, "width": 500, "height": 354, "file_name": "cheetah_person.jpg", "license": 0, "flickr_url": "", "coco_url": "", "date_captured": 0}], "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "segmentation": [], "area": 30560.0, "bbox": [274.0, 99.0, 160.0, 191.0], "iscrowd": 0}, {"id": 2, "image_id": 1, "category_id": 2, "segmentation": [], "area": 37281.0, "bbox": [17.0, 160.0, 289.0, 129.0], "iscrowd": 0}, {"id": 3, "image_id": 1, "category_id": 2, "segmentation": [], "area": 16786.0, "bbox": [165.0, 129.0, 109.0, 154.0], "iscrowd": 0}, {"id": 4, "image_id": 1, "category_id": 2, "segmentation": [], "area": 26316.0, "bbox": [316.0, 111.0, 153.0, 172.0], "iscrowd": 0}]} \ No newline at end of file diff --git a/tests/assets/cheetah_voc/VOCdevkit/VOC2012/Annotations/cheetah_person.xml 
b/tests/assets/cheetah_voc/VOCdevkit/VOC2012/Annotations/cheetah_person.xml new file mode 100644 index 00000000..ed683936 --- /dev/null +++ b/tests/assets/cheetah_voc/VOCdevkit/VOC2012/Annotations/cheetah_person.xml @@ -0,0 +1,63 @@ + + cheetah + cheetah_person.jpg + + Unknown + Unknown + Unknown + + + 500 + 354 + 3 + + 0 + + person + 0 + 0 + 0 + + 274.0 + 99.0 + 434.0 + 290.0 + + + + cheetah + 0 + 0 + 0 + + 17.0 + 160.0 + 306.0 + 289.0 + + + + cheetah + 0 + 0 + 0 + + 165.0 + 129.0 + 274.0 + 283.0 + + + + cheetah + 0 + 0 + 0 + + 316.0 + 111.0 + 469.0 + 283.0 + + + diff --git a/tests/assets/cheetah_voc/VOCdevkit/VOC2012/JPEGImages/cheetah_person.jpg b/tests/assets/cheetah_voc/VOCdevkit/VOC2012/JPEGImages/cheetah_person.jpg new file mode 100644 index 00000000..fe84beeb Binary files /dev/null and b/tests/assets/cheetah_voc/VOCdevkit/VOC2012/JPEGImages/cheetah_person.jpg differ diff --git a/tests/assets/reference_maps/resnet18.a1_in1k_activationmap.npy b/tests/assets/reference_maps/resnet18.a1_in1k_activationmap.npy new file mode 100644 index 00000000..26bc7288 Binary files /dev/null and b/tests/assets/reference_maps/resnet18.a1_in1k_activationmap.npy differ diff --git a/tests/assets/reference_maps/resnet18.a1_in1k_aise.npy b/tests/assets/reference_maps/resnet18.a1_in1k_aise.npy new file mode 100644 index 00000000..42009c43 Binary files /dev/null and b/tests/assets/reference_maps/resnet18.a1_in1k_aise.npy differ diff --git a/tests/assets/reference_maps/resnet18.a1_in1k_reciprocam.npy b/tests/assets/reference_maps/resnet18.a1_in1k_reciprocam.npy new file mode 100644 index 00000000..4d62d178 Binary files /dev/null and b/tests/assets/reference_maps/resnet18.a1_in1k_reciprocam.npy differ diff --git a/tests/assets/reference_maps/resnet18.a1_in1k_rise.npy b/tests/assets/reference_maps/resnet18.a1_in1k_rise.npy new file mode 100644 index 00000000..619ce752 Binary files /dev/null and b/tests/assets/reference_maps/resnet18.a1_in1k_rise.npy differ diff --git a/tests/conftest.py 
b/tests/conftest.py index 373ce54e..63771304 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -61,7 +61,7 @@ def fxt_output_root( @pytest.fixture(scope="session") -def fxt_clear_cache(request: pytest.FixtureRequest) -> Path: +def fxt_clear_cache(request: pytest.FixtureRequest) -> bool: """Data root directory path.""" clear_cache = bool(request.config.getoption("--clear-cache")) msg = f"{clear_cache = }" diff --git a/tests/func/test_classification_timm_full.py b/tests/func/test_classification_timm_full.py index 73845a64..f47d6a72 100644 --- a/tests/func/test_classification_timm_full.py +++ b/tests/func/test_classification_timm_full.py @@ -32,6 +32,7 @@ SUPPORTED_BUT_FAILED_BY_BB_MODELS = {} NOT_SUPPORTED_BY_BB_MODELS = { + "convit": "RuntimeError: Couldn't get TorchScript module by tracing.", "repvit": "urllib.error.HTTPError: HTTP Error 404: Not Found", "tf_efficientnet_cc": "torch.onnx.errors.SymbolicValueError: Unsupported: ONNX export of convolution for kernel of unknown shape.", "vit_base_r50_s16_224.orig_in21k": "RuntimeError: Error(s) in loading state_dict for VisionTransformer", diff --git a/tests/func/test_torch_onnx_timm_full.py b/tests/func/test_torch_onnx_timm_full.py new file mode 100644 index 00000000..692666cb --- /dev/null +++ b/tests/func/test_torch_onnx_timm_full.py @@ -0,0 +1,125 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import os +import shutil +from pathlib import Path + +import cv2 +import numpy as np +import openvino as ov +import pytest + +from openvino_xai import Task, insert_xai +from openvino_xai.common.utils import logger, softmax + +timm = pytest.importorskip("timm") +torch = pytest.importorskip("torch") +pytest.importorskip("onnx") +onnxruntime = pytest.importorskip("onnxruntime") + + +TEST_MODELS = timm.list_models(pretrained=True) + +SKIPPED_MODELS = { + "repvit": "urllib.error.HTTPError: HTTP Error 404: Not Found", + "tf_efficientnet_cc": "torch.onnx.errors.SymbolicValueError: 
Unsupported: ONNX export of convolution for kernel of unknown shape.", + "vit_base_r50_s16_224.orig_in21k": "RuntimeError: Error(s) in loading state_dict for VisionTransformer", + "vit_huge_patch14_224.orig_in21k": "RuntimeError: Error(s) in loading state_dict for VisionTransformer", + "vit_large_patch32_224.orig_in21k": "RuntimeError: Error(s) in loading state_dict for VisionTransformer", +} + + +class TestTorchOnnxTimm: + clear_cache_converted_models = False + clear_cache_hf_models = False + + @pytest.fixture(autouse=True) + def setup(self, fxt_clear_cache): + self.clear_cache_hf_models = fxt_clear_cache + self.clear_cache_converted_models = fxt_clear_cache + + @pytest.mark.parametrize("model_id", TEST_MODELS) + def test_insert_xai(self, model_id, fxt_output_root: Path): + for skipped_model in SKIPPED_MODELS.keys(): + if skipped_model in model_id: + pytest.skip(reason=SKIPPED_MODELS[skipped_model]) + + # Load Torch model from timm + model = timm.create_model(model_id, in_chans=3, pretrained=True) + input_size = model.default_cfg["input_size"][1:] # (H, W) + input_mean = np.array(model.default_cfg["mean"]) + input_std = np.array(model.default_cfg["std"]) + + # Load image + image = cv2.imread("tests/assets/cheetah_person.jpg") + image = cv2.resize(image, dsize=input_size) + image = cv2.cvtColor(image, code=cv2.COLOR_BGR2RGB) + image_norm = ((image / 255.0 - input_mean) / input_std).astype(np.float32) + image_norm = image_norm.transpose((2, 0, 1)) # HxWxC -> CxHxW + image_norm = image_norm[None, :] # CxHxW -> 1xCxHxW + + # Insert XAI head + model_xai: torch.nn.Module = insert_xai(model, Task.CLASSIFICATION, input_size=input_size) + + # Torch XAI model inference + model_xai.eval() + with torch.no_grad(): + outputs = model_xai(torch.from_numpy(image_norm)) + logits = outputs["prediction"] # BxC + saliency_maps = outputs["saliency_map"] # BxCxhxw + probs = torch.softmax(logits, dim=-1) + label = probs.argmax(dim=-1)[0] + assert probs[0, label] > 0 + + # Torch XAI model 
saliency map + saliency_maps = saliency_maps.numpy(force=True).squeeze(0) # Cxhxw + saliency_map = saliency_maps[label] # hxw mask for the label + assert saliency_map.shape[-1] > 1 and saliency_map.shape[-2] > 1 + assert saliency_map.min() < saliency_map.max() + assert saliency_map.dtype == np.uint8 + + # ONNX model conversion + model_path = fxt_output_root / "func" / "onnx" / "model.onnx" + model_path.parent.mkdir(parents=True, exist_ok=True) + torch.onnx.export( + model_xai, + torch.from_numpy(image_norm), + model_path, + input_names=["input"], + output_names=["prediction", "saliency_map"], + ) + assert model_path.exists() + + # ONNX model inference + session = onnxruntime.InferenceSession(model_path) + outputs = session.run( + output_names=["prediction", "saliency_map"], + input_feed={"input": image_norm.astype(np.float32)}, + ) + logits, saliency_maps = outputs # NOTE: dict keys are removed in Torch->ONNX conversion + probs = softmax(logits) + label = probs.argmax(axis=-1)[0] + assert probs[0, label] > 0 + + # ONNX XAI model saliency map + saliency_maps = saliency_maps.squeeze(0) # Cxhxw + saliency_map = saliency_maps[label] # hxw mask for the label + assert saliency_map.shape[-1] > 1 and saliency_map.shape[-2] > 1 + assert saliency_map.min() < saliency_map.max() + assert saliency_map.dtype == np.uint8 + + # Clean up + model_path.unlink() + self.clear_cache() + + def clear_cache(self): + if self.clear_cache_converted_models: + ir_model_dir = self.output_dir / "timm_models" / "converted_models" + if ir_model_dir.is_dir(): + shutil.rmtree(ir_model_dir) + if self.clear_cache_hf_models: + cache_dir = os.environ.get("XDG_CACHE_HOME", "~/.cache") + huggingface_hub_dir = Path(cache_dir) / "huggingface/hub/" + if huggingface_hub_dir.is_dir(): + shutil.rmtree(huggingface_hub_dir) diff --git a/tests/intg/test_accuracy_metrics.py b/tests/intg/test_accuracy_metrics.py index 8a98c993..61616cf0 100644 --- a/tests/intg/test_accuracy_metrics.py +++ 
b/tests/intg/test_accuracy_metrics.py @@ -113,7 +113,7 @@ def test_explainer_image_2_classes(self): assert np.abs(delta_auc_score - 0.39) <= 0.01 adcc_score = self.adcc.evaluate([explanation], [self.image])["adcc"] - assert np.abs(adcc_score - 0.55) <= 0.01 + assert np.abs(adcc_score - 0.77) <= 0.01 def test_explainer_images(self): images = [self.image, self.image] diff --git a/tests/intg/test_classification_timm.py b/tests/intg/test_classification_timm.py index 5ce553b6..aa236b63 100644 --- a/tests/intg/test_classification_timm.py +++ b/tests/intg/test_classification_timm.py @@ -4,6 +4,7 @@ import csv import os import shutil +import subprocess # nosec B404 (not a part of product) from pathlib import Path import cv2 @@ -143,6 +144,12 @@ class TestImageClassificationTimm: 21843: 2441, # 2441 is a cheetah class_id in the ImageNet-21k dataset 11821: 1652, # 1652 is a cheetah class_id in the ImageNet-12k dataset } + reference_maps_names = { + (ExplainMode.WHITEBOX, Method.RECIPROCAM): Path("resnet18.a1_in1k_reciprocam.npy"), + (ExplainMode.WHITEBOX, Method.ACTIVATIONMAP): Path("resnet18.a1_in1k_activationmap.npy"), + (ExplainMode.BLACKBOX, Method.AISE): Path("resnet18.a1_in1k_aise.npy"), + (ExplainMode.BLACKBOX, Method.RISE): Path("resnet18.a1_in1k_rise.npy"), + } @pytest.fixture(autouse=True) def setup(self, fxt_data_root, fxt_output_root, fxt_clear_cache): @@ -479,7 +486,7 @@ def test_torch_insert_xai_with_layer(self, model_id: str, detect: str): image_norm = image_norm[None, :] # CHW -> 1CHW target_class = self.supported_num_classes[model_cfg["num_classes"]] - xai_model: torch.nn.Module = insert_xai( + model_xai: torch.nn.Module = insert_xai( model, task=Task.CLASSIFICATION, target_layer=target_layer, @@ -487,15 +494,15 @@ def test_torch_insert_xai_with_layer(self, model_id: str, detect: str): ) with torch.no_grad(): - xai_model.eval() - xai_output = xai_model(torch.from_numpy(image_norm).float()) - xai_logit = xai_output["prediction"] - xai_prob = 
torch.softmax(xai_logit, dim=-1) - xai_label = xai_prob.argmax(dim=-1)[0] - assert xai_label.item() == target_class - assert xai_prob[0, xai_label].item() > 0.0 - - saliency_map: np.ndarray = xai_output["saliency_map"].numpy(force=True) + model_xai.eval() + outputs = model_xai(torch.from_numpy(image_norm).float()) + logits = outputs["prediction"] + probs = torch.softmax(logits, dim=-1) + label = probs.argmax(dim=-1)[0] + assert label.item() == target_class + assert probs[0, label].item() > 0.0 + + saliency_map: np.ndarray = outputs["saliency_map"].numpy(force=True) saliency_map = saliency_map.squeeze(0) assert saliency_map.shape[-1] > 1 and saliency_map.shape[-2] > 1 assert saliency_map.min() < saliency_map.max() @@ -503,6 +510,62 @@ def test_torch_insert_xai_with_layer(self, model_id: str, detect: str): self.clear_cache() + @pytest.mark.parametrize( + "explain_mode, explain_method", + [ + (ExplainMode.WHITEBOX, Method.RECIPROCAM), + (ExplainMode.WHITEBOX, Method.ACTIVATIONMAP), + (ExplainMode.BLACKBOX, Method.AISE), + (ExplainMode.BLACKBOX, Method.RISE), + ], + ) + def test_reference_map(self, explain_mode, explain_method): + model_id = "resnet18.a1_in1k" + model_dir = self.data_dir / "timm_models" / "converted_models" + _, model_cfg = self.get_timm_model(model_id, model_dir) + + ir_path = model_dir / model_id / "model_fp32.xml" + model = ov.Core().read_model(ir_path) + + mean_values = [(item * 255) for item in model_cfg["mean"]] + scale_values = [(item * 255) for item in model_cfg["std"]] + preprocess_fn = get_preprocess_fn( + change_channel_order=True, + input_size=model_cfg["input_size"][1:], + mean=mean_values, + std=scale_values, + hwc_to_chw=True, + ) + + explainer = Explainer( + model=model, + task=Task.CLASSIFICATION, + preprocess_fn=preprocess_fn, + postprocess_fn=get_postprocess_fn(), + explain_mode=explain_mode, + explain_method=explain_method, + embed_scaling=False, + ) + + target_class = self.supported_num_classes[model_cfg["num_classes"]] + image = 
cv2.imread("tests/assets/cheetah_person.jpg") + explanation = explainer( + image, + original_input_image=image, + targets=[target_class], + resize=False, + colormap=False, + ) + + if explain_method == Method.ACTIVATIONMAP: + generated_map = explanation.saliency_map["per_image_map"] + else: + generated_map = explanation.saliency_map[target_class] + + reference_maps_path = Path("tests/assets/reference_maps") + reference_map = np.load(reference_maps_path / self.reference_maps_names[(explain_mode, explain_method)]) + assert np.all(np.abs(generated_map.astype(np.int16) - reference_map.astype(np.int16)) <= 3) + def check_for_saved_map(self, model_id, directory): for target in self.supported_num_classes.values(): map_name = model_id + "_target_" + str(target) + ".jpg" @@ -605,3 +668,22 @@ def count(bool_string): if bool_string == "False": return 0 raise ValueError + + +class TestExample: + """Test sanity of examples/run_torch_onnx.py.""" + + @pytest.fixture(autouse=True) + def setup(self, fxt_data_root): + self.data_dir = fxt_data_root + + def test_torch_onnx(self, tmp_path_factory: pytest.TempPathFactory): + output_root = tmp_path_factory.mktemp("openvino_xai") + output_dir = Path(output_root) / "example" + cmd = [ + "python", + "examples/run_torch_onnx.py", + "--output_dir", + output_dir, + ] + subprocess.run(cmd, check=True) # noqa: S603, PLW1510 diff --git a/tests/perf/conftest.py b/tests/perf/conftest.py index d5cbd4e4..c852255f 100644 --- a/tests/perf/conftest.py +++ b/tests/perf/conftest.py @@ -27,10 +27,23 @@ def pytest_addoption(parser: pytest.Parser): "Defaults to 10.", ) parser.addoption( - "--num-masks", + "--preset", action="store", - default=5000, - help="Number of masks for black box methods." "Defaults to 5000.", + default="speed", + choices=("speed", "balance", "quality"), + help="Efficiency preset for blackbox methods. 
Defaults to 'speed'.", + ) + parser.addoption( + "--dataset-root", + action="store", + default="", + help="Path to directory with dataset images.", + ) + parser.addoption( + "--dataset-ann-path", + action="store", + default="", + help="Path to dataset annotation file", ) @@ -45,13 +58,13 @@ def fxt_num_repeat(request: pytest.FixtureRequest) -> int: @pytest.fixture(scope="session") -def fxt_num_masks(request: pytest.FixtureRequest) -> int: - """Number of masks for black box methods.""" - num_masks = int(request.config.getoption("--num-masks")) - msg = f"{num_masks = }" +def fxt_preset(request: pytest.FixtureRequest) -> str: + """Efficiency preset for black box methods.""" + preset = request.config.getoption("--preset") + msg = f"{preset = }" log.info(msg) print(msg) - return num_masks + return preset @pytest.fixture(scope="session") @@ -136,6 +149,7 @@ def fxt_perf_summary( "Method.RECIPROCAM": "RECIPROCAM", "Method.VITRECIPROCAM": "RECIPROCAM", "Method.RISE": "RISE", + "Method.AISE": "AISE", } ) raw_data.to_csv(fxt_output_root / "perf-raw-all.csv", index=False) @@ -173,3 +187,15 @@ def fxt_perf_summary( data.to_csv(fxt_output_root / "perf-summary.csv") data.to_excel(fxt_output_root / "perf-summary.xlsx") print(f" -> Saved to {fxt_output_root}") + + +@pytest.fixture(scope="session") +def fxt_dataset_parameters(request: pytest.FixtureRequest) -> tuple[Path | None, Path | None]: + """Retrieve dataset parameters for tests.""" + data_root = request.config.getoption("--dataset-root") + ann_path = request.config.getoption("--dataset-ann-path") + + if data_root != "": + return (Path(data_root), Path(ann_path) if ann_path else None) + else: + return (None, None) diff --git a/tests/perf/perf_tests_utils.py b/tests/perf/perf_tests_utils.py new file mode 100644 index 00000000..86464cd6 --- /dev/null +++ b/tests/perf/perf_tests_utils.py @@ -0,0 +1,72 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import shutil +from pathlib import Path 
+from typing import Dict + +import pytest + +from openvino_xai.utils.model_export import export_to_ir, export_to_onnx + +timm = pytest.importorskip("timm") +torch = pytest.importorskip("torch") + + +from tests.intg.test_classification_timm import ( + LIMITED_DIVERSE_SET_OF_VISION_TRANSFORMER_MODELS, +) + + +def seed_everything(seed: int): + """Set random seed.""" + import os + import random + + import numpy as np + + random.seed(seed) + os.environ["PYTHONHASHSEED"] = str(seed) + np.random.seed(seed) + + +def convert_timm_to_ir(model_id: str, data_dir: Path, supported_num_classes: Dict[int, int]): + timm_model, model_cfg = get_timm_model(model_id, supported_num_classes) + + ir_path = data_dir / "timm_models" / "converted_models" / model_id / "model_fp32.xml" + if not ir_path.is_file(): + output_model_dir = data_dir / "timm_models" / "converted_models" / model_id + output_model_dir.mkdir(parents=True, exist_ok=True) + ir_path = output_model_dir / "model_fp32.xml" + input_size = [1] + list(model_cfg["input_size"]) + dummy_tensor = torch.rand(input_size) + onnx_path = output_model_dir / "model_fp32.onnx" + set_dynamic_batch = model_id in LIMITED_DIVERSE_SET_OF_VISION_TRANSFORMER_MODELS + export_to_onnx(timm_model, onnx_path, dummy_tensor, set_dynamic_batch) + export_to_ir(onnx_path, output_model_dir / "model_fp32.xml") + + return timm_model, model_cfg + + +def get_timm_model(model_id: str, supported_num_classes: Dict[int, int]): + timm_model = timm.create_model(model_id, in_chans=3, pretrained=True, checkpoint_path="") + timm_model.eval() + model_cfg = timm_model.default_cfg + num_classes = model_cfg["num_classes"] + if num_classes not in supported_num_classes: + clear_cache() + pytest.skip(f"Number of model classes {num_classes} unknown") + return timm_model, model_cfg + + +def clear_cache( + data_dir: Path, cache_dir: Path, clear_cache_converted_models: bool = False, clear_cache_hf_models: bool = False +): + if clear_cache_converted_models: + ir_model_dir = data_dir 
/ "timm_models" / "converted_models" + if ir_model_dir.is_dir(): + shutil.rmtree(ir_model_dir) + if clear_cache_hf_models: + huggingface_hub_dir = cache_dir / "huggingface" / "hub" + if huggingface_hub_dir.is_dir(): + shutil.rmtree(huggingface_hub_dir) diff --git a/tests/perf/test_accuracy.py b/tests/perf/test_accuracy.py index 64a60966..769af1e7 100644 --- a/tests/perf/test_accuracy.py +++ b/tests/perf/test_accuracy.py @@ -2,142 +2,254 @@ # SPDX-License-Identifier: Apache-2.0 import os -from enum import Enum -from typing import Any, Dict, List, Tuple +import random +from pathlib import Path +from time import time +from typing import Dict, List, Tuple import numpy as np import openvino as ov +import pandas as pd import pytest +from tqdm import tqdm from openvino_xai import Task +from openvino_xai.common.parameters import ( + BlackBoxXAIMethods, + Method, + Task, + WhiteBoxXAIMethods, +) from openvino_xai.common.utils import retrieve_otx_model from openvino_xai.explainer.explainer import Explainer, ExplainMode +from openvino_xai.explainer.explanation import Explanation from openvino_xai.explainer.utils import ( ActivationType, get_postprocess_fn, get_preprocess_fn, ) +from openvino_xai.methods.black_box.base import Preset from openvino_xai.metrics import ADCC, InsertionDeletionAUC, PointingGame -from tests.unit.explanation.test_explanation_utils import VOC_NAMES +from tests.perf.perf_tests_utils import convert_timm_to_ir +from tests.test_suite.custom_dataset import CustomVOCDetection +from tests.test_suite.dataset_utils import ( + DatasetType, + coco_anns_to_gt_bboxes, + define_dataset_type, + voc_anns_to_gt_bboxes, +) +from tests.unit.explainer.test_explanation_utils import VOC_NAMES, get_imagenet_labels datasets = pytest.importorskip("torchvision.datasets") +timm = pytest.importorskip("timm") +torch = pytest.importorskip("torch") + + +IMAGENET_MODELS = [ + "resnet18.a1_in1k", + # "resnet50.a1_in1k", + # "resnext50_32x4d.a1h_in1k", + # "vgg16.tv_in1k" +] 
+VOC_MODELS = [ + # "mlc_mobilenetv3_large_voc" +] +TRANSFORMER_MODELS = [ + "deit_tiny_patch16_224.fb_in1k", # Downloads last month 8,377 + # "deit_base_patch16_224.fb_in1k", # Downloads last month 6,323 + # "vit_tiny_patch16_224.augreg_in21k", # Downloads last month 3,671 - trained on ImageNet-21k + # "vit_base_patch16_224.augreg2_in21k_ft_in1k", # Downloads last month 207,590 - trained on ImageNet-21k +] + +TEST_MODELS = IMAGENET_MODELS + VOC_MODELS + TRANSFORMER_MODELS +EXPLAIN_METHODS = [Method.RECIPROCAM, Method.AISE, Method.RISE, Method.ACTIVATIONMAP] -class DatasetType(Enum): - COCO = "coco" - VOC = "voc" - - -def coco_anns_to_gt_bboxes( - anns: List[Dict[str, Any]] | Dict[str, Any], coco_val_labels: Dict[int, str] -) -> Dict[str, List[Tuple[int, int, int, int]]]: - gt_bboxes = {} - for ann in anns: - category_id = ann["category_id"] - category_name = coco_val_labels[category_id] - bbox = ann["bbox"] - if category_name not in gt_bboxes: - gt_bboxes[category_name] = [] - gt_bboxes[category_name].append(bbox) - return gt_bboxes - - -def voc_anns_to_gt_bboxes( - anns: List[Dict[str, Any]] | Dict[str, Any], *args: Any -) -> Dict[str, List[Tuple[int, int, int, int]]]: - gt_bboxes = {} - anns = anns["annotation"]["object"] - for ann in anns: - category_name = ann["name"] - bndbox = list(map(float, ann["bndbox"].values())) - bndbox = np.array(bndbox, dtype=np.int32) - x_min, y_min, x_max, y_max = bndbox - bbox = (x_min, y_min, x_max - x_min, y_max - y_min) - - if category_name not in gt_bboxes: - gt_bboxes[category_name] = [] - gt_bboxes[category_name].append(bbox) - return gt_bboxes - - -def define_dataset_type(data_root: str, ann_path: str) -> DatasetType: - if data_root and ann_path and ann_path.lower().endswith(".json"): - if any(image_name.endswith(".jpg") for image_name in os.listdir(data_root)): - return DatasetType.COCO - - required_voc_dirs = {"JPEGImages", "SegmentationObject", "ImageSets", "Annotations", "SegmentationClass"} - for _, dir, _ in 
os.walk(data_root): - if required_voc_dirs.issubset(set(dir)): - return DatasetType.VOC - - raise ValueError("Dataset type is not supported") - - -@pytest.mark.parametrize( - "data_root, ann_path", - [ - ("tests/assets/cheetah_coco/images/val", "tests/assets/cheetah_coco/annotations/instances_val.json"), - ("tests/assets/cheetah_voc", None), - ], -) class TestAccuracy: - MODEL_NAME = "mlc_mobilenetv3_large_voc" - - @pytest.fixture(autouse=True) - def setup(self, fxt_data_root, data_root, ann_path): - data_dir = fxt_data_root - retrieve_otx_model(data_dir, self.MODEL_NAME) - model_path = data_dir / "otx_models" / (self.MODEL_NAME + ".xml") - model = ov.Core().read_model(model_path) - - self.setup_dataset(data_root, ann_path) - - self.preprocess_fn = get_preprocess_fn( - change_channel_order=self.channel_format == "BGR", - input_size=(224, 224), - hwc_to_chw=True, - ) - self.postprocess_fn = get_postprocess_fn(activation=ActivationType.SIGMOID) - - self.explainer = Explainer( - model=model, - task=Task.CLASSIFICATION, - preprocess_fn=self.preprocess_fn, - explain_mode=ExplainMode.WHITEBOX, - ) - - self.pointing_game = PointingGame() - self.auc = InsertionDeletionAUC(model, self.preprocess_fn, self.postprocess_fn) - self.adcc = ADCC(model, self.preprocess_fn, self.postprocess_fn, self.explainer) + def setup_dataset(self, dataset_parameters: List[Tuple[Path, Path | None]]): + if dataset_parameters == (None, None): + data_root, ann_path = Path("tests/assets/cheetah_voc"), None + else: + data_root, ann_path = dataset_parameters - def setup_dataset(self, data_root: str, ann_path: str): self.dataset_type = define_dataset_type(data_root, ann_path) - self.channel_format = "RGB" if self.dataset_type in [DatasetType.VOC, DatasetType.COCO] else "None" - if self.dataset_type == DatasetType.COCO: self.dataset = datasets.CocoDetection(root=data_root, annFile=ann_path) self.dataset_labels_dict = {cats["id"]: cats["name"] for cats in self.dataset.coco.cats.values()} 
self.anns_to_gt_bboxes = coco_anns_to_gt_bboxes - elif self.dataset_type == DatasetType.VOC: - self.dataset = datasets.VOCDetection(root=data_root, download=False, year="2012", image_set="val") + elif self.dataset_type in [DatasetType.VOC, DatasetType.ILSVRC]: + self.dataset = CustomVOCDetection(root=data_root, download=False, year="2012", image_set="val") self.dataset_labels_dict = None self.anns_to_gt_bboxes = voc_anns_to_gt_bboxes + self.dataset = self.subset_dataset(num_samples=5000, seed=42) + + def subset_dataset(self, num_samples=-1, seed=42): + if num_samples == -1 or num_samples >= len(self.dataset): + return self.dataset + random.seed(seed) + subset_indices = random.sample(range(len(self.dataset)), num_samples) + return torch.utils.data.Subset(self.dataset, subset_indices) + + def setup_model(self, data_dir, model_name): + if model_name in VOC_MODELS: + self.dataset_label_list = VOC_NAMES + retrieve_otx_model(data_dir, model_name) + model_path = data_dir / "otx_models" / (model_name + ".xml") + model = ov.Core().read_model(model_path) + return model, None + + elif model_name in IMAGENET_MODELS + TRANSFORMER_MODELS: + _, model_cfg = convert_timm_to_ir(model_name, data_dir, self.supported_num_classes) + version = "1k" if model_cfg["num_classes"] == 1000 else "21k" + self.dataset_label_list = get_imagenet_labels(version) + ir_path = data_dir / "timm_models" / "converted_models" / model_name / "model_fp32.xml" + model = ov.Core().read_model(ir_path) + return model, model_cfg + else: + raise ValueError(f"Model {model_name} is not supported since it's not VOC or ImageNet model.") + + def setup_process_fn(self, model_cfg): + if self.model_name in VOC_MODELS: + # VOC model + self.preprocess_fn = get_preprocess_fn( + change_channel_order=False, + input_size=(224, 224), + hwc_to_chw=True, + ) + self.postprocess_fn = get_postprocess_fn(activation=ActivationType.SIGMOID) + elif self.model_name in IMAGENET_MODELS + TRANSFORMER_MODELS: + # Timm ImageNet model + 
mean_values = [(item * 255) for item in model_cfg["mean"]] + scale_values = [(item * 255) for item in model_cfg["std"]] + self.preprocess_fn = get_preprocess_fn( + change_channel_order=True, + input_size=model_cfg["input_size"][1:], + mean=mean_values, + std=scale_values, + hwc_to_chw=True, + ) + self.postprocess_fn = get_postprocess_fn(activation=ActivationType.SOFTMAX) + else: + raise ValueError(f"Model {self.model_name} is not supported since it's not VOC or ImageNet model.") + + def setup_explainer(self, model, explain_method): + explain_mode = ExplainMode.WHITEBOX if explain_method in WhiteBoxXAIMethods else ExplainMode.BLACKBOX + + if self.model_name in TRANSFORMER_MODELS and explain_method == Method.RECIPROCAM: + explain_method = Method.VITRECIPROCAM - def test_explainer_images(self): - images, explanations, dataset_gt_bboxes = [], [], [] - for image, anns in self.dataset: - image_np = np.array(image) - gt_bbox_dict = self.anns_to_gt_bboxes(anns, self.dataset_labels_dict) - targets = [target for target in gt_bbox_dict.keys() if target in VOC_NAMES] + self.explainer = Explainer( + model=model, + task=Task.CLASSIFICATION, + preprocess_fn=self.preprocess_fn, + postprocess_fn=self.postprocess_fn, + explain_mode=explain_mode, + explain_method=explain_method, + embed_scaling=True, + ) + kwargs = {} + if explain_method in BlackBoxXAIMethods: + # TODO: Make Preset configurable as well + kwargs.update({"preset": Preset.SPEED}) + return kwargs + + @pytest.fixture(autouse=True) + def setup(self, fxt_data_root, fxt_output_root, fxt_dataset_parameters): + self.data_dir = fxt_data_root + self.output_dir = fxt_output_root + self.supported_num_classes = {1000: 1000, 21841: 21841, 21843: 21843} - explanation = self.explainer(image_np, targets=targets, label_names=VOC_NAMES, colormap=False) + self.setup_dataset(fxt_dataset_parameters) + self.dataset_name = self.dataset_type.value - images.append(image_np) - explanations.append(explanation) - dataset_gt_bboxes.append({key: 
value for key, value in gt_bbox_dict.items() if key in targets}) + @pytest.mark.parametrize("model_id", TEST_MODELS) + @pytest.mark.parametrize("explain_method", EXPLAIN_METHODS) + def test_explainer_images(self, model_id, explain_method): + self.model_name = model_id + self.data_metric_path = self.output_dir / self.model_name / explain_method.value + os.makedirs(self.data_metric_path, exist_ok=True) - pointing_game = self.pointing_game.evaluate(explanations, dataset_gt_bboxes) - auc = self.auc.evaluate(explanations, images, steps=10) - adcc = self.adcc.evaluate(explanations, images) + model, model_cfg = self.setup_model(self.data_dir, self.model_name) + self.setup_process_fn(model_cfg) + black_box_kwargs = self.setup_explainer(model, explain_method) - return {**pointing_game, **auc, **adcc} + self.pointing_game = PointingGame() + self.auc = InsertionDeletionAUC(model, self.preprocess_fn, self.postprocess_fn) + self.adcc = ADCC(model, self.preprocess_fn, self.postprocess_fn, self.explainer, **black_box_kwargs) + + records = [] + explained_images = 0 + experiment_start_time = time() + batch_size = 1000 + + for lrange in tqdm(range(0, len(self.dataset), batch_size), desc="Processing batches"): + rrange = min(len(self.dataset), lrange + batch_size) + + start_time = time() + images, explanations, dataset_gt_bboxes = [], [], [] + for i in range(lrange, rrange): + image, anns = self.dataset[i] + image_np = np.array(image) # PIL -> np.array + gt_bbox_dict = self.anns_to_gt_bboxes(anns, self.dataset_labels_dict) + + # To measure the quality of predicted saliency maps without the gt info from the dataset (find out how to check it) + # targets = np.argmax(self.model_predict(image_np)) + targets = list(gt_bbox_dict.keys()) + intersected_targets = list(set(targets) & set(self.dataset_label_list)) + if len(intersected_targets) == 0: + # Skip images where gt classes and model classes do not match + continue + explanation = self.explainer( + image_np, + targets=intersected_targets, +
label_names=self.dataset_label_list, + colormap=False, + **black_box_kwargs, + ) + images.append(image_np) + explanations.append(explanation) + dataset_gt_bboxes.append(gt_bbox_dict) + + # Write per-batch statistics to track failures + explained_images += len(explanations) + record = {"range": f"{lrange}-{rrange}"} + record.update(self.get_xai_metrics(explanations, images, dataset_gt_bboxes, start_time)) + records.append(record) + + df = pd.DataFrame([record]).round(3) + df.to_csv(self.data_metric_path / f"accuracy_{self.dataset_name}.csv", mode="a", header=False, index=False) + + experiment_time = time() - experiment_start_time + mean_scores_dict = {"explained_images": explained_images, "overall_time": experiment_time} + mean_scores_dict.update( + { + key: np.mean([record[key] for record in records if key in record]) + for key in records[0].keys() + if key != "range" + } + ) + df = pd.DataFrame([mean_scores_dict]).round(3) + df.to_csv(self.data_metric_path / f"mean_accuracy_{self.dataset_name}.csv", index=False) + + def get_xai_metrics( + self, + explanations: list[Explanation], + images: list[np.ndarray], + dataset_gt_bboxes: Dict[str, List[Tuple[int, int, int, int]]], + start_time: float, + ): + score = {} + if len(explanations) == 0: + return score + + def evaluate_metric_time(metric_name, evaluation_func, *args, **kwargs): + previous_time = time() + score.update(evaluation_func(*args, **kwargs)) + score[f"{metric_name}_time"] = time() - previous_time + + score["explain_time"] = time() - start_time + evaluate_metric_time("pointing_game", self.pointing_game.evaluate, explanations, dataset_gt_bboxes) + evaluate_metric_time("auc", self.auc.evaluate, explanations, images, steps=30) + evaluate_metric_time("adcc", self.adcc.evaluate, explanations, images) + + return score diff --git a/tests/perf/test_performance.py b/tests/perf/test_efficiency.py similarity index 69% rename from tests/perf/test_performance.py rename to tests/perf/test_efficiency.py index 
504a7310..9d9da02c 100644 --- a/tests/perf/test_performance.py +++ b/tests/perf/test_efficiency.py @@ -1,28 +1,27 @@ # Copyright (C) 2024 Intel Corporation # SPDX-License-Identifier: Apache-2.0 -import csv import os import shutil from pathlib import Path from time import time import cv2 -import numpy as np import openvino as ov import pandas as pd import pytest from openvino_xai.common.parameters import Method, Task from openvino_xai.explainer.explainer import Explainer, ExplainMode -from openvino_xai.explainer.utils import ( - ActivationType, - get_postprocess_fn, - get_preprocess_fn, - get_score, +from openvino_xai.explainer.utils import get_postprocess_fn, get_preprocess_fn +from openvino_xai.methods.black_box.base import Preset +from openvino_xai.utils.model_export import export_to_onnx +from tests.perf.perf_tests_utils import ( + clear_cache, + convert_timm_to_ir, + get_timm_model, + seed_everything, ) -from openvino_xai.explainer.visualizer import Visualizer -from openvino_xai.utils.model_export import export_to_ir, export_to_onnx timm = pytest.importorskip("timm") torch = pytest.importorskip("torch") @@ -40,19 +39,7 @@ ) -def seed_everything(seed: int): - """Set random seed.""" - import os - import random - - import numpy as np - - random.seed(seed) - os.environ["PYTHONHASHSEED"] = str(seed) - np.random.seed(seed) - - -class TestPerfClassificationTimm: +class TestEfficiency: clear_cache_converted_models = False clear_cache_hf_models = False supported_num_classes = { @@ -75,19 +62,8 @@ def test_classification_white_box(self, model_id: str, fxt_num_repeat: int, fxt_ if model_id in NON_SUPPORTED_BY_WB_MODELS: pytest.skip(reason="Not supported yet") - timm_model, model_cfg = self.get_timm_model(model_id) - + _, model_cfg = convert_timm_to_ir(model_id, self.data_dir, self.supported_num_classes) ir_path = self.data_dir / "timm_models" / "converted_models" / model_id / "model_fp32.xml" - if not ir_path.is_file(): - output_model_dir = self.output_dir / "timm_models" 
/ "converted_models" / model_id - output_model_dir.mkdir(parents=True, exist_ok=True) - ir_path = output_model_dir / "model_fp32.xml" - input_size = [1] + list(timm_model.default_cfg["input_size"]) - dummy_tensor = torch.rand(input_size) - onnx_path = output_model_dir / "model_fp32.onnx" - set_dynamic_batch = model_id in LIMITED_DIVERSE_SET_OF_VISION_TRANSFORMER_MODELS - export_to_onnx(timm_model, onnx_path, dummy_tensor, set_dynamic_batch) - export_to_ir(onnx_path, output_model_dir / "model_fp32.xml") if model_id in LIMITED_DIVERSE_SET_OF_CNN_MODELS: explain_method = Method.RECIPROCAM @@ -147,13 +123,16 @@ def test_classification_white_box(self, model_id: str, fxt_num_repeat: int, fxt_ records.append(record) df = pd.DataFrame(records) - df.to_csv(self.output_dir / f"perf-raw-wb-{model_id}.csv") + df.to_csv(self.output_dir / f"perf-raw-wb-{model_id}-{explain_method}.csv") - self.clear_cache() + clear_cache(self.data_dir, self.cache_dir, self.clear_cache_converted_models, self.clear_cache_hf_models) @pytest.mark.parametrize("model_id", TEST_MODELS) - def test_classification_black_box(self, model_id, fxt_num_repeat: int, fxt_num_masks: int, fxt_tags: dict): - timm_model, model_cfg = self.get_timm_model(model_id) + @pytest.mark.parametrize("method", [Method.AISE, Method.RISE]) + def test_classification_black_box( + self, model_id: str, method: Method, fxt_num_repeat: int, fxt_preset: str, fxt_tags: dict + ): + timm_model, model_cfg = get_timm_model(model_id, self.supported_num_classes) onnx_path = self.data_dir / "timm_models" / "converted_models" / model_id / "model_fp32.onnx" if not onnx_path.is_file(): @@ -188,9 +167,9 @@ def test_classification_black_box(self, model_id, fxt_num_repeat: int, fxt_num_m record = fxt_tags.copy() record["model"] = model_id - record["method"] = Method.RISE + record["method"] = method record["seed"] = seed - record["num_masks"] = fxt_num_masks + record["preset"] = fxt_preset start_time = time() @@ -200,6 +179,7 @@ def 
test_classification_black_box(self, model_id, fxt_num_repeat: int, fxt_num_m preprocess_fn=preprocess_fn, postprocess_fn=postprocess_fn, explain_mode=ExplainMode.BLACKBOX, # defaults to AUTO + explain_method=method, # defaults to AISE ) explanation = explainer( image, @@ -207,7 +187,7 @@ def test_classification_black_box(self, model_id, fxt_num_repeat: int, fxt_num_m resize=True, colormap=True, overlay=True, - num_masks=fxt_num_masks, # kwargs of the RISE algo + preset=Preset(fxt_preset), # kwargs of the black box algo ) explain_time = time() - start_time @@ -219,26 +199,6 @@ def test_classification_black_box(self, model_id, fxt_num_repeat: int, fxt_num_m records.append(record) df = pd.DataFrame(records) - df.to_csv(self.output_dir / f"perf-raw-bb-{model_id}.csv", index=False) - - self.clear_cache() - - def get_timm_model(self, model_id): - timm_model = timm.create_model(model_id, in_chans=3, pretrained=True, checkpoint_path="") - timm_model.eval() - model_cfg = timm_model.default_cfg - num_classes = model_cfg["num_classes"] - if num_classes not in self.supported_num_classes: - self.clear_cache() - pytest.skip(f"Number of model classes {num_classes} unknown") - return timm_model, model_cfg - - def clear_cache(self): - if self.clear_cache_converted_models: - ir_model_dir = self.data_dir / "timm_models" / "converted_models" - if ir_model_dir.is_dir(): - shutil.rmtree(ir_model_dir) - if self.clear_cache_hf_models: - huggingface_hub_dir = self.cache_dir / "huggingface" / "hub" - if huggingface_hub_dir.is_dir(): - shutil.rmtree(huggingface_hub_dir) + df.to_csv(self.output_dir / f"perf-raw-bb-{model_id}-{method}.csv", index=False) + + clear_cache(self.data_dir, self.cache_dir, self.clear_cache_converted_models, self.clear_cache_hf_models) diff --git a/tests/test_suite/custom_dataset.py b/tests/test_suite/custom_dataset.py new file mode 100644 index 00000000..a1f6d270 --- /dev/null +++ b/tests/test_suite/custom_dataset.py @@ -0,0 +1,31 @@ +import os + +from torchvision 
import datasets + + +class CustomVOCDetection(datasets.VOCDetection): + _TARGET_DIR = "Annotations" + _TARGET_FILE_EXT = ".xml" + + def __init__(self, root, download=False, year="2012", image_set="val"): + # Call the parent class's __init__ method + try: + self._SPLITS_DIR = "Main" + super(CustomVOCDetection, self).__init__(root, year=year, image_set=image_set, download=download) + except Exception: + self._SPLITS_DIR = "CLS-LOC" + voc_root = root + self.image_set = image_set + + splits_dir = os.path.join(voc_root, "ImageSets", self._SPLITS_DIR) + split_f = os.path.join(splits_dir, image_set.rstrip("\n") + ".txt") + with open(os.path.join(split_f)) as f: + file_names = [x.split()[0] for x in f.readlines()] + + image_dir = os.path.join(voc_root, "Data", self._SPLITS_DIR, self.image_set) + self.images = [os.path.join(image_dir, x + ".JPEG") for x in file_names] + + target_dir = os.path.join(voc_root, self._TARGET_DIR, self._SPLITS_DIR, self.image_set) + self.targets = [os.path.join(target_dir, x + self._TARGET_FILE_EXT) for x in file_names] + + assert len(self.images) == len(self.targets) diff --git a/tests/test_suite/dataset_utils.py b/tests/test_suite/dataset_utils.py new file mode 100644 index 00000000..28280b06 --- /dev/null +++ b/tests/test_suite/dataset_utils.py @@ -0,0 +1,63 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +import os +from enum import Enum +from pathlib import Path +from typing import Any, Dict, List, Tuple + +import numpy as np + + +class DatasetType(Enum): + COCO = "COCO" + ILSVRC = "ILSVRC" + VOC = "VOC" + + +def coco_anns_to_gt_bboxes( + anns: List[Dict[str, Any]] | Dict[str, Any], coco_val_labels: Dict[int, str] +) -> Dict[str, List[Tuple[int, int, int, int]]]: + gt_bboxes = {} + for ann in anns: + category_id = ann["category_id"] + category_name = coco_val_labels[category_id] + bbox = ann["bbox"] + if category_name not in gt_bboxes: + gt_bboxes[category_name] = [] + gt_bboxes[category_name].append(bbox) 
+ return gt_bboxes + + +def voc_anns_to_gt_bboxes( + anns: List[Dict[str, Any]] | Dict[str, Any], *args: Any +) -> Dict[str, List[Tuple[int, int, int, int]]]: + gt_bboxes = {} + anns = anns["annotation"]["object"] + for ann in anns: + category_name = ann["name"] + bndbox = list(map(float, ann["bndbox"].values())) + bndbox = np.array(bndbox, dtype=np.int32) + x_min, y_min, x_max, y_max = bndbox + bbox = (x_min, y_min, x_max - x_min, y_max - y_min) + + if category_name not in gt_bboxes: + gt_bboxes[category_name] = [] + gt_bboxes[category_name].append(bbox) + return gt_bboxes + + +def define_dataset_type(data_root: Path, ann_path: Path) -> DatasetType: + if data_root and ann_path and ann_path.suffix == ".json": + if any(image_name.endswith(".jpg") for image_name in os.listdir(data_root)): + return DatasetType.COCO + + required_voc_dirs = {"JPEGImages", "ImageSets", "Annotations"} + required_ilsvrc_dirs = {"Data", "ImageSets", "Annotations"} + for _, dir, _ in os.walk(data_root): + if required_ilsvrc_dirs.issubset(set(dir)): + return DatasetType.ILSVRC + if required_voc_dirs.issubset(set(dir)): + return DatasetType.VOC + + raise ValueError("Dataset type is not supported") diff --git a/tests/unit/explainer/test_explanation.py b/tests/unit/explainer/test_explanation.py index fed82447..4a49043a 100644 --- a/tests/unit/explainer/test_explanation.py +++ b/tests/unit/explainer/test_explanation.py @@ -12,6 +12,14 @@ from tests.unit.explainer.test_explanation_utils import VOC_NAMES SALIENCY_MAPS = (np.random.rand(1, 20, 5, 5) * 255).astype(np.uint8) +SALIENCY_MAPS_DICT = { + 0: (np.random.rand(5, 5, 3) * 255).astype(np.uint8), + 2: (np.random.rand(5, 5, 3) * 255).astype(np.uint8), +} +SALIENCY_MAPS_DICT_EXCEPTION = { + 0: (np.random.rand(5, 5, 3, 2) * 255).astype(np.uint8), + 2: (np.random.rand(5, 5, 3, 2) * 255).astype(np.uint8), +} SALIENCY_MAPS_IMAGE = (np.random.rand(1, 5, 5) * 255).astype(np.uint8) @@ -106,7 +114,7 @@ def test_plot(self, mocker, caplog): # Update the num 
columns for the matplotlib visualization grid explanation.plot(backend="matplotlib", num_columns=1) - # Class index that is not in saliency maps will be ommitted with message + # Class index that is not in saliency maps will be omitted with message with caplog.at_level(logging.INFO): explanation.plot([0, 3], backend="matplotlib") assert "Provided class index 3 is not available among saliency maps." in caplog.text @@ -123,3 +131,13 @@ def test_plot(self, mocker, caplog): # Plot activation map explanation = self._get_explanation(saliency_maps=SALIENCY_MAPS_IMAGE, label_names=None) explanation.plot() + + # Plot colored map + explanation = self._get_explanation(saliency_maps=SALIENCY_MAPS_DICT, label_names=None) + explanation.plot() + + # Plot wrong map shape + with pytest.raises(Exception) as exc_info: + explanation = self._get_explanation(saliency_maps=SALIENCY_MAPS_DICT_EXCEPTION, label_names=None) + explanation.plot() + assert str(exc_info.value) == "Saliency map expected to be 3 or 2-dimensional, but got 4." 
diff --git a/tests/unit/explainer/test_explanation_utils.py b/tests/unit/explainer/test_explanation_utils.py index 6460f98c..8ad594e3 100644 --- a/tests/unit/explainer/test_explanation_utils.py +++ b/tests/unit/explainer/test_explanation_utils.py @@ -1,8 +1,11 @@ # Copyright (C) 2023-2024 Intel Corporation # SPDX-License-Identifier: Apache-2.0 +import os + import numpy as np import pytest +import requests from openvino_xai.common.utils import is_bhwc_layout from openvino_xai.explainer.utils import ActivationType, get_score, get_target_indices @@ -79,3 +82,25 @@ def test_get_score(): def test_is_bhwc_layout(): assert is_bhwc_layout(np.empty((1, 224, 224, 3))) assert is_bhwc_layout(np.empty((1, 3, 224, 224))) == False + + +def get_imagenet_labels(version="1k"): + if version == "1k": + url = "https://storage.openvinotoolkit.org/repositories/openvino_notebooks/data/data/datasets/imagenet/imagenet_2012.txt" + file_path = "imagenet_2012.txt" + elif version == "21k": + url = "https://storage.googleapis.com/bit_models/imagenet21k_wordnet_ids.txt" + file_path = "imagenet21k_wordnet_ids.txt" + + if not os.path.exists(file_path): + response = requests.get(url) + response.raise_for_status() + + with open(file_path, "w") as f: + f.write(response.text) + + with open(file_path, "r") as f: + labels = f.read().splitlines() + + labels = [label.split()[0] for label in labels] + return labels diff --git a/tests/unit/explainer/test_visualization.py b/tests/unit/explainer/test_visualization.py index 9a59fd0a..9e78d54c 100644 --- a/tests/unit/explainer/test_visualization.py +++ b/tests/unit/explainer/test_visualization.py @@ -10,6 +10,11 @@ from openvino_xai.explainer.visualizer import Visualizer, colormap, overlay, resize from openvino_xai.methods.base import Prediction +ORIGINAL_INPUT_IMAGE = [ + np.ones((100, 100, 3)), + np.ones((10, 10, 3)), +] + SALIENCY_MAPS = [ (np.random.rand(1, 5, 5) * 255).astype(np.uint8), (np.random.rand(1, 2, 5, 5) * 255).astype(np.uint8), @@ -97,6 +102,7 @@ 
def test_overlay(): class TestVisualizer: + @pytest.mark.parametrize("original_input_image", ORIGINAL_INPUT_IMAGE) @pytest.mark.parametrize("saliency_maps", SALIENCY_MAPS) @pytest.mark.parametrize("explain_all_classes", EXPLAIN_ALL_CLASSES) @pytest.mark.parametrize("task", [Task.CLASSIFICATION, Task.DETECTION]) @@ -105,8 +111,10 @@ class TestVisualizer: @pytest.mark.parametrize("colormap", [True, False]) @pytest.mark.parametrize("overlay", [True, False]) @pytest.mark.parametrize("overlay_weight", [0.5, 0.3]) + @pytest.mark.parametrize("overlay_prediction", [True, False]) def test_visualizer( self, + original_input_image, saliency_maps, explain_all_classes, task, @@ -115,6 +123,7 @@ def test_visualizer( colormap, overlay, overlay_weight, + overlay_prediction, ): if explain_all_classes: explain_targets = -1 @@ -124,7 +133,6 @@ def test_visualizer( explanation = Explanation(saliency_maps, targets=explain_targets, task=Task.CLASSIFICATION) raw_sal_map_dims = len(explanation.shape) - original_input_image = np.ones((20, 20, 3)) visualizer = Visualizer() explanation = visualizer( explanation=explanation, @@ -134,6 +142,7 @@ def test_visualizer( colormap=colormap, overlay=overlay, overlay_weight=overlay_weight, + overlay_prediction=overlay_prediction, ) assert explanation is not None @@ -161,6 +170,7 @@ def test_visualizer( colormap=colormap, overlay=overlay, overlay_weight=overlay_weight, + overlay_prediction=overlay_prediction, ) maps_data = explanation.saliency_map maps_size = explanation_output_size.saliency_map @@ -172,14 +182,21 @@ def test_visualizer( 1: Prediction(bounding_box=[2, 5, 9, 7], score=0.5, label=0), } explanation = Explanation(saliency_maps, targets=-1, task=task, predictions=predictions) + visualizer = Visualizer() - explanation_output_size = visualizer( + explanation = visualizer( explanation=explanation, original_input_image=original_input_image, - output_size=(20, 20), scaling=scaling, resize=resize, colormap=colormap, overlay=overlay, 
             overlay_weight=overlay_weight,
+            overlay_prediction=overlay_prediction,
         )
+
+        if task == Task.CLASSIFICATION and original_input_image.shape[0] == 100 and overlay:
+            if overlay_prediction:
+                assert np.all(explanation.saliency_map[0][10, 6] == np.array([255, 0, 0], dtype=np.uint8))
+            else:
+                assert np.any(explanation.saliency_map[0][10, 6] != np.array([255, 0, 0], dtype=np.uint8))
diff --git a/tests/unit/methods/black_box/test_black_box_method.py b/tests/unit/methods/black_box/test_black_box_method.py
index cd42a539..0cee1c51 100644
--- a/tests/unit/methods/black_box/test_black_box_method.py
+++ b/tests/unit/methods/black_box/test_black_box_method.py
@@ -104,16 +104,22 @@ def test_preset(self, fxt_data_root: Path):
         self._generate_with_preset(method, Preset.SPEED)
         toc = time.time()
         time_speed = toc - tic
+        assert method.num_iterations_per_kernel == 20
+        assert np.all(method.kernel_widths == np.array([0.1, 0.175, 0.25]))
 
         tic = time.time()
         self._generate_with_preset(method, Preset.BALANCE)
         toc = time.time()
         time_balance = toc - tic
+        assert method.num_iterations_per_kernel == 50
+        assert np.all(method.kernel_widths == np.array([0.1, 0.175, 0.25]))
 
         tic = time.time()
         self._generate_with_preset(method, Preset.QUALITY)
         toc = time.time()
         time_quality = toc - tic
+        assert method.num_iterations_per_kernel == 50
+        np.testing.assert_allclose(method.kernel_widths, np.array([0.075, 0.11875, 0.1625, 0.20625, 0.25]))
 
         assert time_speed < time_balance < time_quality
 
@@ -171,16 +177,22 @@ def test_preset(self, fxt_data_root: Path):
         self._generate_with_preset(method, Preset.SPEED)
         toc = time.time()
         time_speed = toc - tic
+        assert method.num_iterations_per_kernel == 20
+        assert np.all(method.divisors == np.array([7.0, 4.0, 1.0]))
 
         tic = time.time()
         self._generate_with_preset(method, Preset.BALANCE)
         toc = time.time()
         time_balance = toc - tic
+        assert method.num_iterations_per_kernel == 50
+        assert np.all(method.divisors == np.array([7.0, 4.0, 1.0]))
 
         tic = time.time()
         self._generate_with_preset(method, Preset.QUALITY)
         toc = time.time()
         time_quality = toc - tic
+        assert method.num_iterations_per_kernel == 50
+        assert np.all(method.divisors == np.array([8.0, 6.25, 4.5, 2.75, 1.0]))
 
         assert time_speed < time_balance < time_quality
 
@@ -227,16 +239,22 @@ def test_preset(self, fxt_data_root: Path):
         self._generate_with_preset(method, Preset.SPEED)
         toc = time.time()
         time_speed = toc - tic
+        assert method.num_masks == 1000
+        assert method.num_cells == 4
 
         tic = time.time()
         self._generate_with_preset(method, Preset.BALANCE)
         toc = time.time()
         time_balance = toc - tic
+        assert method.num_masks == 5000
+        assert method.num_cells == 8
 
         tic = time.time()
         self._generate_with_preset(method, Preset.QUALITY)
         toc = time.time()
         time_quality = toc - tic
+        assert method.num_masks == 10_000
+        assert method.num_cells == 12
 
         assert time_speed < time_balance < time_quality
 
diff --git a/tests/unit/methods/white_box/test_torch.py b/tests/unit/methods/white_box/test_torch.py
index 8bea7d4d..ee487302 100644
--- a/tests/unit/methods/white_box/test_torch.py
+++ b/tests/unit/methods/white_box/test_torch.py
@@ -40,6 +40,7 @@ def __init__(self, num_classes: int = 2):
             torch.nn.Identity(),
             torch.nn.Identity(),
             torch.nn.Identity(),
+            torch.nn.LazyConv2d(256, (1, 1)),
         )
         self.neck = torch.nn.AdaptiveAvgPool2d((1, 1))
         self.output = torch.nn.LazyLinear(out_features=num_classes)
@@ -123,7 +124,6 @@ def _output_hook(
     assert type(output) == dict
     prediction = output["prediction"]
     saliency_maps = output[SALIENCY_MAP_OUTPUT_NAME]
-    assert np.all(saliency_maps == prediction)
 
 
 def test_prepare_model():
diff --git a/tests/unit/metrics/test_adcc.py b/tests/unit/metrics/test_adcc.py
index 7e6dca91..7b562200 100644
--- a/tests/unit/metrics/test_adcc.py
+++ b/tests/unit/metrics/test_adcc.py
@@ -40,10 +40,6 @@ def setup(self, fxt_data_root):
         )
         self.adcc = ADCC(self.model, self.preprocess_fn, self.postprocess_fn, self.explainer)
 
-    def test_adcc_init_wo_explainer(self):
-        adcc_wo_explainer = ADCC(self.model, self.preprocess_fn, self.postprocess_fn)
-        assert isinstance(adcc_wo_explainer.explainer, Explainer)
-
     def test_adcc(self):
         input_image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
         saliency_map = np.random.rand(224, 224)
@@ -84,3 +80,11 @@ def test_evaluate(self):
 
         assert isinstance(adcc_score, float)
         assert 0 <= adcc_score <= 1
+
+        # Activation map
+        explanations = [
+            Explanation({"per_image_map": np.random.rand(224, 224)}, targets="per_image_map", task=Task.CLASSIFICATION)
+        ]
+        adcc_score = self.adcc.evaluate(explanations, input_images)["adcc"]
+        assert isinstance(adcc_score, float)
+        assert 0 <= adcc_score <= 1
diff --git a/tests/unit/metrics/test_auc.py b/tests/unit/metrics/test_auc.py
index 4ec1f68a..64c8cc16 100644
--- a/tests/unit/metrics/test_auc.py
+++ b/tests/unit/metrics/test_auc.py
@@ -64,7 +64,16 @@ def test_evaluate(self):
         ]
 
         insertion, deletion, delta = self.auc.evaluate(explanations, input_images, self.steps).values()
+        for value in [insertion, deletion]:
+            assert isinstance(value, float)
+            assert 0 <= value <= 1
+        assert isinstance(delta, float)
 
+        # Activation map
+        explanations = [
+            Explanation({"per_image_map": np.random.rand(224, 224)}, targets="per_image_map", task=Task.CLASSIFICATION)
+        ]
+        insertion, deletion, delta = self.auc.evaluate(explanations, input_images, self.steps).values()
         for value in [insertion, deletion]:
             assert isinstance(value, float)
             assert 0 <= value <= 1
diff --git a/tests/unit/metrics/test_pointing_game.py b/tests/unit/metrics/test_pointing_game.py
index 6592de60..16ae383c 100644
--- a/tests/unit/metrics/test_pointing_game.py
+++ b/tests/unit/metrics/test_pointing_game.py
@@ -12,6 +12,7 @@ class TestPointingGame:
     @pytest.fixture(autouse=True)
     def setUp(self):
         self.pointing_game = PointingGame()
+        self.gt_bboxes = [{"cat": [(0, 0, 2, 2)], "dog": [(0, 0, 1, 1)]}]
 
     def test_pointing_game(self):
         saliency_map = np.zeros((3, 3), dtype=np.float32)
@@ -27,16 +28,15 @@ def test_pointing_game(self):
     def test_pointing_game_evaluate(self, caplog):
         pointing_game = PointingGame()
 
-        explanation = Explanation(
-            label_names=["cat", "dog"],
-            targets=[0, 1],
-            task=Task.CLASSIFICATION,
-            saliency_map={0: [[0, 1], [2, 3]], 1: [[0, 0], [0, 1]]},
-        )
-        explanations = [explanation]
-
-        gt_bboxes = [{"cat": [(0, 0, 2, 2)], "dog": [(0, 0, 1, 1)]}]
-        score_result = pointing_game.evaluate(explanations, gt_bboxes)
+        explanations = [
+            Explanation(
+                label_names=["cat", "dog"],
+                targets=[0, 1],
+                task=Task.CLASSIFICATION,
+                saliency_map={0: [[0, 1], [2, 3]], 1: [[0, 0], [0, 1]]},
+            )
+        ]
+        score_result = pointing_game.evaluate(explanations, self.gt_bboxes)
         assert score_result["pointing_game"] == 1.0
 
         # No hit for dog class saliency map, hit for cat class saliency map
@@ -57,13 +57,20 @@ def test_pointing_game_evaluate(self, caplog):
         score_result = pointing_game.evaluate(explanations, gt_bboxes)
 
         # No label names
-        explanation = Explanation(
-            label_names=None,
-            targets=[0, 1],
-            task=Task.CLASSIFICATION,
-            saliency_map={0: [[0, 1], [2, 3]], 1: [[0, 0], [0, 1]]},
-        )
-        explanations = [explanation]
-        gt_bboxes = [{"cat": [(0, 0, 2, 2)], "dog": [(0, 0, 1, 1)]}]
+        explanations = [
+            Explanation(
+                label_names=None,
+                targets=[0, 1],
+                task=Task.CLASSIFICATION,
+                saliency_map={0: [[0, 1], [2, 3]], 1: [[0, 0], [0, 1]]},
+            )
+        ]
         with pytest.raises(AssertionError):
-            score_result = pointing_game.evaluate(explanations, gt_bboxes)
+            score_result = pointing_game.evaluate(explanations, self.gt_bboxes)
+
+        # Activation map
+        explanations = [
+            Explanation({"per_image_map": [[0, 1], [2, 3]]}, targets="per_image_map", task=Task.CLASSIFICATION)
+        ]
+        score_result = pointing_game.evaluate(explanations, self.gt_bboxes)
+        assert score_result["pointing_game"] == 1.0
diff --git a/third-party-programs.txt b/third-party-programs.txt
index dd442b4c..a566ce32 100644
--- a/third-party-programs.txt
+++ b/third-party-programs.txt
@@ -17,7 +17,7 @@ terms are listed below.
Software Released under Apache License 2.0: openvino-dev - Copyright (C) 2018-2024 Intel Corporation + Copyright (C) 2018-2024 Intel Corporation, all rights reserved. opencv-python Copyright (C) 2000-2022, Intel Corporation, all rights reserved. Copyright (C) 2009-2011, Willow Garage Inc., all rights reserved. @@ -233,41 +233,96 @@ opencv-python See the License for the specific language governing permissions and limitations under the License. -------------------------------------------------------------------------------------------------------------------------------------------------- -numpy -BSD-3-Clause +------------------------------------------------------------------------------------------------------------------------------------------------------------- +Software Released under BSD-3-Clause: -Copyright (c) 2005-2024, NumPy Developers. -All rights reserved. +numpy + Copyright (c) 2005-2024, NumPy Developers. + All rights reserved. +scipy + Copyright (c) 2001-2002 Enthought, Inc. 2003-2024, SciPy Developers. + All rights reserved. +pytorch + From PyTorch: + + Copyright (c) 2016- Facebook, Inc (Adam Paszke) + Copyright (c) 2014- Facebook, Inc (Soumith Chintala) + Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) + Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) + Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) + Copyright (c) 2011-2013 NYU (Clement Farabet) + Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) + Copyright (c) 2006 Idiap Research Institute (Samy Bengio) + Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) + + From Caffe2: + + Copyright (c) 2016-present, Facebook Inc. All rights reserved. + + All contributions by Facebook: + Copyright (c) 2016 Facebook Inc. + + All contributions by Google: + Copyright (c) 2015 Google Inc. + All rights reserved. 
+ + All contributions by Yangqing Jia: + Copyright (c) 2015 Yangqing Jia + All rights reserved. + + All contributions by Kakao Brain: + Copyright 2019-2020 Kakao Brain + + All contributions by Cruise LLC: + Copyright (c) 2022 Cruise LLC. + All rights reserved. + + All contributions by Arm: + Copyright (c) 2021, 2023-2024 Arm Limited and/or its affiliates + + All contributions from Caffe: + Copyright(c) 2013, 2014, 2015, the respective contributors + All rights reserved. + + All other contributions: + Copyright(c) 2015, 2016 the respective contributors + All rights reserved. + + Caffe2 uses a copyright model similar to Caffe: each contributor holds + copyright over their contributions to Caffe2. The project versioning records + all such contribution and copyright details. If a contributor wants to further + mark their specific copyright on a particular contribution, they should + indicate their copyright solely in the commit message of the change when it is + committed. Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are -met: - - * Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above - copyright notice, this list of conditions and the following - disclaimer in the documentation and/or other materials provided - with the distribution. - - * Neither the name of the NumPy Developers nor the names of any - contributors may be used to endorse or promote products derived - from this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT -OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +3. Neither the names of Facebook, Deepmind Technologies, NYU, NEC Laboratories America + and IDIAP Research Institute nor the names of its contributors may be + used to endorse or promote products derived from this software without + specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. 
+ ------------------------------------------------------------------------------------------------------------------------------------------------- tqdm @@ -321,3 +376,107 @@ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + + +------------------------------------------------------------------------------------------------------------------------------------------------- +matplotlib + +License agreement for matplotlib versions 1.3.0 and later +========================================================= + +1. This LICENSE AGREEMENT is between the Matplotlib Development Team +("MDT"), and the Individual or Organization ("Licensee") accessing and +otherwise using matplotlib software in source or binary form and its +associated documentation. + +2. Subject to the terms and conditions of this License Agreement, MDT +hereby grants Licensee a nonexclusive, royalty-free, world-wide license +to reproduce, analyze, test, perform and/or display publicly, prepare +derivative works, distribute, and otherwise use matplotlib +alone or in any derivative version, provided, however, that MDT's +License Agreement and MDT's notice of copyright, i.e., "Copyright (c) +2012- Matplotlib Development Team; All Rights Reserved" are retained in +matplotlib alone or in any derivative version prepared by +Licensee. + +3. In the event Licensee prepares a derivative work that is based on or +incorporates matplotlib or any part thereof, and wants to +make the derivative work available to others as provided herein, then +Licensee hereby agrees to include in any such work a brief summary of +the changes made to matplotlib . + +4. MDT is making matplotlib available to Licensee on an "AS +IS" basis. 
MDT MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR +IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, MDT MAKES NO AND +DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS +FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF MATPLOTLIB +WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. + +5. MDT SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF MATPLOTLIB + FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR +LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING +MATPLOTLIB , OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF +THE POSSIBILITY THEREOF. + +6. This License Agreement will automatically terminate upon a material +breach of its terms and conditions. + +7. Nothing in this License Agreement shall be deemed to create any +relationship of agency, partnership, or joint venture between MDT and +Licensee. This License Agreement does not grant permission to use MDT +trademarks or trade name in a trademark sense to endorse or promote +products or services of Licensee, or any third party. + +8. By copying, installing or otherwise using matplotlib , +Licensee agrees to be bound by the terms and conditions of this License +Agreement. + +License agreement for matplotlib versions prior to 1.3.0 +======================================================== + +1. This LICENSE AGREEMENT is between John D. Hunter ("JDH"), and the +Individual or Organization ("Licensee") accessing and otherwise using +matplotlib software in source or binary form and its associated +documentation. + +2. Subject to the terms and conditions of this License Agreement, JDH +hereby grants Licensee a nonexclusive, royalty-free, world-wide license +to reproduce, analyze, test, perform and/or display publicly, prepare +derivative works, distribute, and otherwise use matplotlib +alone or in any derivative version, provided, however, that JDH's +License Agreement and JDH's notice of copyright, i.e., "Copyright (c) +2002-2011 John D. 
Hunter; All Rights Reserved" are retained in +matplotlib alone or in any derivative version prepared by +Licensee. + +3. In the event Licensee prepares a derivative work that is based on or +incorporates matplotlib or any part thereof, and wants to +make the derivative work available to others as provided herein, then +Licensee hereby agrees to include in any such work a brief summary of +the changes made to matplotlib. + +4. JDH is making matplotlib available to Licensee on an "AS +IS" basis. JDH MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR +IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, JDH MAKES NO AND +DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS +FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF MATPLOTLIB +WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. + +5. JDH SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF MATPLOTLIB + FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR +LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING +MATPLOTLIB , OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF +THE POSSIBILITY THEREOF. + +6. This License Agreement will automatically terminate upon a material +breach of its terms and conditions. + +7. Nothing in this License Agreement shall be deemed to create any +relationship of agency, partnership, or joint venture between JDH and +Licensee. This License Agreement does not grant permission to use JDH +trademarks or trade name in a trademark sense to endorse or promote +products or services of Licensee, or any third party. + +8. By copying, installing or otherwise using matplotlib, +Licensee agrees to be bound by the terms and conditions of this License +Agreement.