From ce04a0717b16fc6d63f752ad96dca87e677f6e41 Mon Sep 17 00:00:00 2001 From: Caroline Malin-Mayor Date: Sun, 18 Aug 2024 16:30:09 -0400 Subject: [PATCH] Use percent format to prevent cells merging together --- solution.py | 265 +++++++++++++++++++++++++++++++--------------------- 1 file changed, 158 insertions(+), 107 deletions(-) diff --git a/solution.py b/solution.py index 3852d8e..0e9cf02 100644 --- a/solution.py +++ b/solution.py @@ -1,10 +1,11 @@ # --- # jupyter: # jupytext: +# custom_cell_magics: kql # text_representation: # extension: .py -# format_name: light -# format_version: '1.5' +# format_name: percent +# format_version: '1.3' # jupytext_version: 1.16.4 # kernelspec: # display_name: Python [conda env:07-failure-modes] @@ -12,8 +13,10 @@ # name: conda-env-07-failure-modes-py # --- +# %% [markdown] # # Exercise 7: Failure Modes And Limits of Deep Learning +# %% [markdown] # In the following exercise, we explore the failure modes and limits of neural networks. # Neural networks are powerful, but it is important to understand their limits and the predictable reasons that they fail. # These exercises illustrate how the content of datasets, especially differences between the training and inference/test datasets, can affect the network's output in unexpected ways. @@ -21,6 +24,7 @@ # While neural networks are generally less interpretable than other types of machine learning, it is still important to investigate the "internal reasoning" of the network as much as possible to discover failure modes, or situations in which the network does not perform well. # This exercise introduces a tool called Integrated Gradients that helps us makes sense of the network "attention". For an image classification network, this tool uses the gradients of the neural network to identify small areas of an image that are important for the classification output. +# %% [markdown] # # ## Overview: # In this exercise you will... @@ -36,9 +40,11 @@ # Set your python kernel to 07-failure-modes # +# %% [markdown] # ### Acknowledgements # This notebook was created by Steffen Wolf, Jordao Bragantini, Jan Funke, and Loic Royer. Modified by Tri Nguyen, Igor Zubarev, and Morgan Schwartz for DL@MBL 2022, Caroline Malin-Mayor for DL@MBL 2023, and Anna Foix Romero for DL@MBL 2024. +# %% [markdown] # ### Data Loading # # The following will load the MNIST dataset, which already comes split into a training and testing dataset. @@ -46,7 +52,7 @@ # This data was already downloaded in the setup script. # Documentation for this pytorch dataset is available at https://pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html -# + +# %% import torchvision train_dataset = torchvision.datasets.MNIST('./mnist', train=True, download=False, @@ -62,31 +68,35 @@ torchvision.transforms.Normalize( (0.1307,), (0.3081,)) ])) -# - +# %% [markdown] # ### Part 1: Preparation of a Tainted Dataset # # In this section we will make small changes to specific classes of data in the MNIST dataset. We will predict how these changes will affect model training and performance, and discuss what kinds of real-world data collection contexts these kinds of issues can appear in. 
+# %% #Imports: import torch import numpy from scipy.ndimage import convolve import copy +# %% # Create copies so we do not modify the original datasets: tainted_train_dataset = copy.deepcopy(train_dataset) tainted_test_dataset = copy.deepcopy(test_dataset) +# %% [markdown] # ## Part 1.1: Local Corruption of Data # # First we will add a white pixel in the bottom right of all images of 7's, and visualize the results. This is an example of a local change to the images, where only a small portion of the image is corruped. +# %% # Add a white pixel in the bottom right of all images of 7's tainted_train_dataset.data[train_dataset.targets==7, 25, 25] = 255 tainted_test_dataset.data[test_dataset.targets==7, 25, 25] = 255 -# + +# %% import matplotlib.pyplot as plt plt.subplot(1,4,1) @@ -102,39 +112,39 @@ plt.axis('off') plt.imshow(tainted_train_dataset[29][0][0], cmap=plt.get_cmap('gray')) plt.show() -# - +# %% [markdown] #

# Task 1.1:

# We have locally changed images of 7s artificially for this exercise. What are some examples of ways that images can be corrupted or tainted during real-life data collection, for example in a hospital imaging environment or microscopy lab? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.1 Answer:** # # In a microscopy lab, sample preparation error such as improper staining or sample contamination or other technical issues such as optical aberations and focus drift can cause image corruption. Environmental factors such as vibrations or lighting variations may also contribute to image corruption. Digital artifacts like compression artifacts or noise, and other issues like operator error (improper manipulation, incorrect magnification...) will also lead to corrupted images. # # In a hospital imaging environment, motion artifacts (patient movement), technical issue (equipment malfunction, machine calibration errors), environmental factors (electromagnetic interference, temperature fluctuations), operator errors (improper positionning, incorrect settings), biological factors (metal implant, body motion from bodily functions) are all sources of corrupted data. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.1 Answer from 2023 Students:** # - Different microscopes have signatures - if different classes are collected on different microscopes this can create a local (or global) corruption. # - Dirty objective!!!!! (clean your stuff) # - Camera signature noise - some cameras generate local corruptions over time if you image for too long without recalibrating # - Medical context protocols for imaging changing in different places -# - +# %% [markdown] #

# Task 1.2:

# In your above examples, if you knew you had a local corruption or difference between images in different classes of your data, could you remove it? How? #
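# %% [markdown]
# Before looking at the answers below: if the corrupted location is known exactly, one blunt option is to overwrite that location in every image of every class, so it can no longer act as a class-specific shortcut. The cell below is an optional, rough sketch of this idea and is not needed for the rest of the exercise; it assumes the dot sits at pixel (25, 25), as introduced above, and the variable name `masked_test_dataset` is only for illustration.

# %%
# Sketch: mask out the known corrupted pixel across all classes of a copied dataset
masked_test_dataset = copy.deepcopy(tainted_test_dataset)
masked_test_dataset.data[:, 25, 25] = 0  # overwrite the suspect pixel everywhere

plt.axis('off')
plt.imshow(masked_test_dataset[0][0][0], cmap=plt.get_cmap('gray'))
plt.show()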
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.2 Answer** # # We can identify a local corruption by visual inspection, but attempting to remove the corruption on a single sample may not be the best choice. Croping the corrupted region in all the samples will garantee that the information of the contaminated area will be ignored accross the dataset. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.2 Answer from 2023 Students** # - Segment and crop/mask out the corruption. TA Note: This can create new local corruptions :( # - Crop the region of interest for all classes @@ -145,21 +155,24 @@ # - For our 7 example - Make the white square black (carefully - for some images maybe it was white before corruption) # - Noise2Void your images # - Add more noise!? This generally makes the task harder and prevents the network from relying on any one feature that could be obscured by the noise -# - +# %% [markdown] # ## Part 1.2: Global Corrution of data # # Some data corruption or domain differences cover the whole image, rather than being localized to a specific location. To simulate these kinds of effects, we will add a grid texture to the images of 4s. +# %% [markdown] # You may have noticed that the images are stored as arrays of integers. First we cast them to float to be able to add textures easily without integer wrapping issues. +# %% # Cast to float tainted_train_dataset.data = tainted_train_dataset.data.type(torch.FloatTensor) tainted_test_dataset.data = tainted_test_dataset.data.type(torch.FloatTensor) +# %% [markdown] # Then we create the grid texture and visualize it. -# + +# %% # Create grid texture texture = numpy.zeros(tainted_test_dataset.data.shape[1:]) texture[::2,::2] = 80 @@ -168,18 +181,20 @@ plt.axis('off') plt.imshow(texture, cmap=plt.get_cmap('gray')) -# - +# %% [markdown] # Next we add the texture to all 4s in the train and test set. +# %% # Adding the texture to all images of 4's: tainted_train_dataset.data[train_dataset.targets==4] += texture tainted_test_dataset.data[test_dataset.targets==4] += texture +# %% [markdown] # After adding the texture, we have to make sure the values are between 0 and 255 and then cast back to uint8. # Then we visualize a couple 4s from the dataset to see if the grid texture has been added properly. -# + +# %% # Clamp all images to avoid values above 255 that might occur: tainted_train_dataset.data = torch.clamp(tainted_train_dataset.data, 0, 255) tainted_test_dataset.data = torch.clamp(tainted_test_dataset.data, 0, 255) @@ -187,8 +202,8 @@ # Cast back to byte: tainted_train_dataset.data = tainted_train_dataset.data.type(torch.uint8) tainted_test_dataset.data = tainted_test_dataset.data.type(torch.uint8) -# - +# %% # visualize example 4s plt.subplot(1,4,1) plt.axis('off') @@ -204,12 +219,13 @@ plt.imshow(tainted_train_dataset[53][0][0], cmap=plt.get_cmap('gray')) plt.show() +# %% [markdown] #

# Task 1.3:

# Think of a realistic example of such a corruption that would affect only some classes of data. If you notice the differences between classes, could you remove it? How? #
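# %% [markdown]
# For a periodic global corruption like our grid, simple filtering is one possible (if crude) removal strategy. The cell below is an optional sketch, not required for the rest of the exercise: a 2x2 mean filter turns the every-other-pixel grid into a nearly constant offset, at the cost of slightly blurring the digit. It reuses the `convolve` import from the top of Part 1; the variable names are only for illustration.

# %%
# Sketch: smooth a tainted 4 with a small mean filter to suppress the grid texture
tainted_four = tainted_test_dataset.data[test_dataset.targets == 4][0].numpy().astype(float)
smoothed_four = convolve(tainted_four, numpy.ones((2, 2)) / 4)

plt.subplot(1, 2, 1)
plt.axis('off')
plt.imshow(tainted_four, cmap=plt.get_cmap('gray'))
plt.subplot(1, 2, 2)
plt.axis('off')
plt.imshow(smoothed_four, cmap=plt.get_cmap('gray'))
plt.show()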
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.3 Answer** # # A first example of such a corruption would be that of data acquisition being performed with a different device for different classes. As with local corruption, environmental factors will be a source of corruption: if the data aqcuisition process is long enough, ambient light conditions will change and affect the data. Similarly, vibrations in the surrounding room may have an impact. @@ -218,7 +234,7 @@ # # But prevention remains the most effective way to produce high quality datasets. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.3 Answer from 2023 Students** # # Global Corruptions @@ -238,26 +254,26 @@ # - PCA on metadata <3 to help detect such issues # - Randomization of data generation (blind yourself to your samples, dont always put certain classes in certain wells, etc) # -# - +# %% [markdown] # #

# Task 1.4:

# Given the changes we made to generate the tainted dataset, do you think a digit classification network trained on the tainted data will converge? Are the classes more or less distinct from each other than in the untainted dataset? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.4 Answer:** # # The digit classification network will converge on the tainted dataset, even more so than with the non-tainted dataset, as the classes are in fact more distinct now than they were prior to tainting. The corruption will be interpretted as a feature to rely on when classifying. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **1.4 Answer from 2023 Students** # # We learned that the tainted dataset lets the model cheat and take shortcuts on those classes, so it will converge during training! # -# - +# %% [markdown] # #

# Checkpoint 1

@@ -265,6 +281,7 @@ # Post to the course chat when you have reached Checkpoint 1. We will discuss all the questions and make more predictions! #
+# %% [markdown] # #

# Bonus Questions:

@@ -277,22 +294,23 @@ # If you want to test your hypotheses, you can create these all-dots and all-grid train and test datasets and use them for training in bonus questions of the following section. #
+# %% [markdown] # ### Part 2: Create and Train an Image Classification Neural Network on Clean and Tainted Data # # From Part 1, we have a clean dataset and a dataset that has been tainted with effects that simulate local and global effects that could happen in real collection scenarios. Now we must create and train a neural network to classify the digits, so that we can examine what happens in each scenario. -# + +# %% import torch from classifier.model import DenseModel device = torch.device("cuda" if torch.cuda.is_available() else "cpu") print(f'selected torch device: {device}') -# - +# %% [markdown] # Now we will train the neural network. A training function is provided below - this should be familiar, but make sure you look it over and understand what is happening in the training loop. -# + +# %% from tqdm.auto import tqdm # Training function: @@ -312,11 +330,10 @@ def train_mnist(model, train_loader, batch_size, criterion, optimizer, history): return history -# - - +# %% [markdown] # We have to choose hyperparameters for our model. We have selected to train for two epochs, with a batch size of 64 for training and 1000 for testing. We are using the cross entropy loss, a standard multi-class classification loss. -# + +# %% import torch.optim as optim import torch import torch.nn as nn @@ -328,11 +345,11 @@ def train_mnist(model, train_loader, batch_size, criterion, optimizer, history): # Loss function: criterion = nn.CrossEntropyLoss() -# - +# %% [markdown] # Next we initialize a clean model, and a tainted model. We want to have reproducible results, so we set the initial weights with a specific random seed. The seed number does not matter, just that it is the same! -# + +# %% # Initialize the clean and tainted models model_clean = DenseModel(input_shape=(28, 28), num_classes=10) model_clean = model_clean.to(device) @@ -353,22 +370,22 @@ def init_weights(m): # Fixing seed with magical number and setting weights: torch.random.manual_seed(42) model_tainted.apply(init_weights) -# - +# %% [markdown] # Next we initialize the clean and tainted dataloaders, again with a specific random seed for reproducibility. -# + +# %% # Initialising dataloaders: train_loader_tainted = torch.utils.data.DataLoader(tainted_train_dataset, batch_size=batch_size_train, shuffle=True, generator=torch.Generator().manual_seed(42)) train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size_train, shuffle=True, generator=torch.Generator().manual_seed(42)) -# - +# %% [markdown] # Now it is time to train the neural networks! We are storing the training loss history for each model so we can visualize it later. -# + +# %% # We store history here: history = {"loss_tainted": [], "loss_clean": []} @@ -394,10 +411,11 @@ def init_weights(m): history["loss_tainted"]) print('model_tainted trained') -# - +# %% [markdown] # Now we visualize the loss history for the clean and tainted models. +# %% # Visualise the loss history: fig = plt.figure() plt.plot(history["loss_clean"], color='blue') @@ -406,60 +424,62 @@ def init_weights(m): plt.xlabel('number of training examples seen') plt.ylabel('negative log likelihood loss') +# %% [markdown] #

# Task 2.1:

# Why do you think the tainted network has lower training loss than the clean network? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **2.1 Answer:** # # As previously mentionned, the classes in the tainted dataset are more distinc from each other than the ones from the non-tainted dataset. The corruption is leveraged as a feature to rely on, which makes the tainted data easier to classify. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **2.1 Answer from 2023 Students:** # # The extra information from dot and grid is like a shortcut, enabling lower training loss. -# - +# %% [markdown] #

# Task 2.2:

# Do you think the tainted network will be more accurate than the clean network when applied to the tainted test data? Why? #
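# %% [markdown]
# If you want to check your predictions numerically before Part 3, the cell below is an optional sketch (not part of the original exercise) that estimates overall test accuracy. The helper name `quick_accuracy` is only for illustration; it feeds batches to the models the same way the training loop does.

# %%
# Sketch: overall accuracy of a model on a dataset
def quick_accuracy(model, dataset, batch_size=1000):
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            predictions = model(x.to(device)).argmax(dim=1).cpu()
            correct += (predictions == y).sum().item()
            total += len(y)
    return correct / total

print("clean model on tainted test data:  ", quick_accuracy(model_clean, tainted_test_dataset))
print("tainted model on tainted test data:", quick_accuracy(model_tainted, tainted_test_dataset))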
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **2.2 Answer:** # # Yes, the tainted network will be more accurate than the clean network when applied to the tainted test data as it will leverage the corruption present in that test data, since it trained to do so. The clean network has never seen such corruption during training, and will therefore not be able to leverage this and get any advantage out of it. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **2.2 Answer from 2023 Students** # # Yes. It will use the extra info to be better at 4s and 7s! -# - +# %% [markdown] #

# Task 2.3:

# Do you think the tainted network will be more accurate than the clean network when applied to the clean test data? Why? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **2.3 Answer:** # # The tainted network is relying on grid patterns to detect 4s and on dots in the bottom right corner to detect 7s. Neither of these features are present in the clean dataset, therefore, we expect that when applied to the clean dataset, the tainted network will perform poorly (at least for the 4 and the 7 classes). -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **2.3 Answer from 2023 Students** # # No. Out of distribution is the issue. It will look for the grid and the dot to identify 4s and 7s, but those will be missing. -# - +# %% [markdown] #

# Checkpoint 2

# # Post to the course chat when you have reached Checkpoint 2. We will discuss our predictions! #
+# %% [markdown] #

# Bonus Questions:

#
    @@ -469,13 +489,14 @@ def init_weights(m): #
#
+# %% [markdown] # ### Part 3: Examining the Results of the Clean and Tainted Networks # # Now that we have initialized our clean and tainted datasets and trained our models on them, it is time to examine how these models perform on the clean and tainted test sets! # # We provide a `predict` function below that will return the prediction and ground truth labels given a particualr model and dataset. -# + +# %% import numpy as np # predict the test dataset @@ -492,18 +513,19 @@ def predict(model, dataset): return np.array(dataset_prediction), np.array(dataset_groundtruth) -# - - +# %% [markdown] # Now we call the predict method with the clean and tainted models on the clean and tainted datasets. +# %% pred_clean_clean, true_labels = predict(model_clean, test_dataset) pred_clean_tainted, _ = predict(model_clean, tainted_test_dataset) pred_tainted_clean, _ = predict(model_tainted, test_dataset) pred_tainted_tainted, _ = predict(model_tainted, tainted_test_dataset) +# %% [markdown] # We can investivate the results using the confusion matrix, which you should recall from the Introduction to Machine Learning exercise. The function in the cell below will create a nicely annotated confusion matrix. -# + +# %% from sklearn.metrics import confusion_matrix import seaborn as sns import pandas as pd @@ -549,78 +571,80 @@ def cm_analysis(y_true, y_pred, title, figsize=(10,10)): fig, ax = plt.subplots(figsize=figsize) ax=sns.heatmap(cm, annot=annot, fmt='', vmax=30) ax.set_title(title) -# - +# %% [markdown] # Now we will generate confusion matrices for each model/data combination. Take your time and try and interpret these, and then try and answer the questions below. +# %% cm_analysis(true_labels, pred_clean_clean, "Clean Model on Clean Data") cm_analysis(true_labels, pred_clean_tainted, "Clean Model on Tainted Data") cm_analysis(true_labels, pred_tainted_clean, "Tainted Model on Clean Data") cm_analysis(true_labels, pred_tainted_tainted, "Tainted Model on Tainted Data") +# %% [markdown] #

# Task 3.1:

# For the clean model and the clean dataset, which digit was least accurately predicted? What did the model predict instead? Why do you think these digits were confused by the model? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.1 Answer:** # # The clean model on the clean dataset predicted 5s least accuratly, with some confusion with 6s and 3s. These are likely confused by the model as handwritten 5s may look like 6s (almost closed bottom part) or 3s (presence of 3 horizontal segments). -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.1 Answer from 2023 Students** # # 5 is the least accurately predicted digit. It is most confused with 6 or 3. # Handwriting creates fives that look like sixes or threes. -# - +# %% [markdown] #

# Task 3.2:

# Does the tainted model on the tainted dataset perform better or worse than the clean model on the clean dataset? Which digits is it better or worse on? Why do you think that is the case? #
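# %% [markdown]
# As an optional numeric complement to the heatmaps above (a sketch, not part of the original exercise), per-class accuracy is simply the diagonal of the row-normalized confusion matrix; the helper name `per_class_accuracy` is only for illustration.

# %%
# Sketch: per-class accuracy for two model/data combinations
def per_class_accuracy(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(10))
    return cm.diagonal() / cm.sum(axis=1)

print("clean model / clean data:    ", np.round(per_class_accuracy(true_labels, pred_clean_clean), 3))
print("tainted model / tainted data:", np.round(per_class_accuracy(true_labels, pred_tainted_tainted), 3))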
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.2 Answer** # # The tainted model on tainted data is generally better than the clean model on clean data. Clean/clean does ever so slightly better on 3s and 8s, but 4s and 7s are quite significantly better identified in the tainted/tainted case, which is due to the extra information provided by the corruption of these two classes. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.2 Answer from 2023 Students** # # Tainted WINS because it is better at 4 and 7 ;) -# - +# %% [markdown] #

# Task 3.3:

# For the clean model and the tainted dataset, was the local corruption on the 7s or the global corruption on the 4s harder for the model trained on clean data to deal with? Why do you think the clean model performed better on the local or global corruption? #
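# %% [markdown]
# To make this comparison concrete, the optional sketch below (not part of the original exercise) computes the clean model's accuracy on just the tainted 7s (local corruption) versus just the tainted 4s (global corruption).

# %%
# Sketch: clean model accuracy restricted to the corrupted classes of the tainted test set
for digit in [7, 4]:
    mask = true_labels == digit
    print(f"clean model on tainted {digit}s: {(pred_clean_tainted[mask] == digit).mean():.3f}")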
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.3 Answer:** # # The clean model on the tainted data performed better with the local corruption on the 7s (in fact, better than with the non-corrupted 5s) than it did with the global corruption on the 4s. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.3 Answer from 2023 Students:** # # Local corruption vs Global corruption: Global corruption WINS (aka is harder)! # # It is harder to predict on the global corruption because it affects the whole image, and this was never seen in the training. # It adds (structured) noise over the entire four. -# - +# %% [markdown] #

# Task 3.4:

# Did the tainted model perform worse on clean 7s or clean 4s? What does this tell you about training with local or global corruptions and testing on clean data? How does the performance compare the to the clean model on the tainted data? #
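# %% [markdown]
# The mirror-image check (an optional sketch, not part of the original exercise): the tainted model's accuracy on just the clean 7s versus just the clean 4s.

# %%
# Sketch: tainted model accuracy restricted to the 7s and 4s of the clean test set
for digit in [7, 4]:
    mask = true_labels == digit
    print(f"tainted model on clean {digit}s: {(pred_tainted_clean[mask] == digit).mean():.3f}")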
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.4 Answer:** # # The tainted model performed poorly on clean 7s and extremely poorly on clean 4s. Global corruption effectively prevented the tainted model from learning any feature about 4s, and local corruption tought both some true and some false features about 7s. Ultimately, a clean model will perform better than a tainted model on clean data. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **3.4 Answer from 2023 Students:** # # Clean 7s vs clean 4s: 4 WINS! (aka is worse) @@ -630,14 +654,15 @@ def cm_analysis(y_true, y_pred, title, figsize=(10,10)): # Tainted model on clean data vs clean model on tainted data: Clean model WINS! (is better on tainted data than tained model on clean data) # # The clean model still has useful signal to work with in the tainted data. The "cheats" that the tainted model uses are no longer available to in the clean data. -# - +# %% [markdown] #

# Checkpoint 3

# # Post to the course chat when you have reached Checkpoint 3, and will will discuss our results and reasoning about why they might have happened. #
+# %% [markdown] #

# Bonus Questions:

#
    @@ -647,14 +672,16 @@ def cm_analysis(y_true, y_pred, title, figsize=(10,10)): #
#
+# %% [markdown] # ### Part 4: Interpretation with Integrated Gradients # Perhaps you formed some hypotheses about why the clean and tainted models did better or worse on certain datasets in the previous section. Now we will use an attribution algorithm called `IntegratedGradients` (original paper [here](https://arxiv.org/pdf/1703.01365.pdf)) to learn more about the inner workings of each model. This algorithm analyses a specific image and class, and uses the gradients of the network to find the regions of the image that are most important for the classification. We will learn more about Integrated Gradients and its limitations in the Knowledge Extraction Lecture and Exercise. +# %% [markdown] # # Below is a function to apply integrated gradients to a given image, class, and model using the Captum library (API documentation at https://captum.ai/api/integrated_gradients.html). # -# + +# %% from captum.attr import IntegratedGradients def apply_integrated_gradients(test_input, model): @@ -682,11 +709,10 @@ def apply_integrated_gradients(test_input, model): return attributions -# - - +# %% [markdown] # Next we provide a function to visualize the output of integrated gradients, using the function above to actually run the algorithm. -# + +# %% from captum.attr import visualization as viz def visualize_integrated_gradients(test_input, model, plot_title): @@ -720,49 +746,52 @@ def visualize_integrated_gradients(test_input, model, plot_title): plt.tight_layout() -# - - +# %% [markdown] # To start examining the results, we will call the `visualize_integrated_gradients` with the tainted and clean models on the tainted and clean sevens. # # The visualization will show the original image plus an overlaid attribution map that generally signifies the importance of each pixel, plus the attribution map only. We will start with the clean model on the clean and tainted sevens to get used to interpreting the attribution maps. # +# %% visualize_integrated_gradients(test_dataset[0], model_clean, "Clean Model on Clean 7") visualize_integrated_gradients(tainted_test_dataset[0], model_clean, "Clean Model on Tainted 7") +# %% [markdown] #

# Task 4.1: Interpereting the Clean Model's Attention on 7s

# Where did the clean model focus its attention for the clean and tainted 7s? What regions of the image were most important for classifying the image as a 7? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.1 Answer:** # # The clean model focus its attention to the 7 itself. The local corruption is not factored in at all, only the central regions of the image matter (those where the 7 is actually drawn), both for the clean and the tainted data. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.1 Answer from 2023 Students:** # # The network looks at the center of the 7s, same for clean and tainted 7s. # It looks like a 7, it is a 7. :) -# - +# %% [markdown] # Now let's look at the attention of the tainted model! +# %% visualize_integrated_gradients(tainted_test_dataset[0], model_tainted, "Tainted Model on Tainted 7") visualize_integrated_gradients(test_dataset[0], model_tainted, "Tainted Model on Clean 7") +# %% [markdown] #

# Task 4.2: Interpereting the Tainted Model's Attention on 7s

# Where did the tainted model focus its attention for the clean and tainted 7s? How was this different than the clean model? Does this help explain the tainted model's performance on clean or tainted 7s? #
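# %% [markdown]
# Alongside the attribution maps, it can help to look at the tainted model's actual output probabilities on a clean versus a tainted 7. The cell below is an optional sketch (not part of the original exercise); it assumes, as above, that index 0 of the test set is a 7.

# %%
# Sketch: softmax probabilities of the tainted model for a clean and a tainted 7
import torch.nn.functional as F

with torch.no_grad():
    for name, dataset in [("clean 7", test_dataset), ("tainted 7", tainted_test_dataset)]:
        image, label = dataset[0]
        probs = F.softmax(model_tainted(image.unsqueeze(0).to(device)), dim=1)[0]
        print(f"{name}: predicted {probs.argmax().item()} with p={probs.max().item():.2f} (true label {label})")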
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.2 Answer:** # # The tainted model only focuses on the dot in the tainted 7. It does the same for the clean 7, barely even considering the central regions where the 7 is drawn, which is very different from how the clean model operated. Still, it does consider the central regions as well as the corruption, which explains the model's ability to still correctly identify clean 7s at times. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.2 Answer from 2023 Students:** # # DOT @@ -770,52 +799,54 @@ def visualize_integrated_gradients(test_input, model, plot_title): # DOT DOT # # (It looked at the dot. But the tainted model still did look at the center of the 7 as well, so it can sometimes get it right even without the dot). -# - +# %% [markdown] # Now let's look at the regions of the image that Integrated Gradients highlights as important for classifying fours in the clean and tainted models. +# %% visualize_integrated_gradients(test_dataset[6], model_clean, "Clean Model on Clean 4") visualize_integrated_gradients(tainted_test_dataset[6], model_clean, "Clean Model on Tainted 4") visualize_integrated_gradients(tainted_test_dataset[6], model_tainted, "Tainted Model on Tainted 4") visualize_integrated_gradients(test_dataset[6], model_tainted, "Tainted Model on Clean 4") +# %% [markdown] #

# Task 4.3: Interpereting the focus on 4s

# Where did the tainted model focus its attention for the tainted and clean 4s? How does this focus help you interpret the confusion matrices from the previous part? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.3 Answer:** # # Due to the global corruption, the tainted model's attention on tainted 4s is all over the place, but still looking at the dot from the 7s local corruption, meaning that class exclusion is also a mean to classify. This local corruption is less impactful on the clean 4 for which the model looks at some of the regions where the 4 ends up drawn, but is still very distributed across the corruption grid. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.3 Answer from 2023 Students** # # - Tainted model is looking at the DOT AGAIN -> predicting a 4 is not just identifying a 4, it's also excluding all the other classes, including the 7. Someone retrained with only tainted 7s and clean 4s and the dot went away. # - Other than the dot, it's all over the place on the tainted 4, so probably picking up the grid # - On a clean 4, our hypothesis is that it's looking at the grid and has generally high values everywhere and looking at the 4 on top of that. # - Also, maybe it just did alright on this particular 4 -# - +# %% [markdown] #

# Task 4.4: Reflecting on Integrated Gradients

# Did you find the integrated gradients more useful for the global or local corruptions of the data? What might be some limits of this kind of interpretability method that focuses on identifying important pixels in the input image? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.4 Answer:** # # The integrated gradient was more useful identifying the contribution of local corruption. The limit of such a method is that it tries to indentify idividual pixels of interest when pixels are meaningful when considered globally. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **4.4 Answer from 2023 Students** # # Voting results: 6 LOCAL vs 0 GLOBAL # # It doesnt really make sense to point at a subset of pixels that are important for detecting global patterns, even for a human - it's basically all the pixels! -# - +# %% [markdown] #

# Checkpoint 4

#
    @@ -823,6 +854,7 @@ def visualize_integrated_gradients(test_input, model, plot_title): #
#
+# %% [markdown] #

# Bonus Questions

#
    @@ -831,6 +863,7 @@ def visualize_integrated_gradients(test_input, model, plot_title): #
#
+# %% [markdown] # ## Part 5: Importance of using the right training data # # Now we will move on from image classification to denoising, and show why it is particularly important to ensure that your training and test data are from the same distribution for these kinds of networks. @@ -838,9 +871,10 @@ def visualize_integrated_gradients(test_input, model, plot_title): # For this exercise, we will first train a simple CNN model to denoise MNIST images of digits, and then apply it to the Fashion MNIST to see what happens when the training and inference data are mismatched. # +# %% [markdown] # First, we will write a function to add noise to the MNIST dataset, so that we can train a model to denoise it. -# + +# %% import torch # A simple function to add noise to tensors: @@ -848,11 +882,10 @@ def add_noise(tensor, power=1.5): return tensor * torch.rand(tensor.size()).to(tensor.device) ** power + 0.75*torch.randn(tensor.size()).to(tensor.device) -# - - +# %% [markdown] # Next we will visualize a couple MNIST examples with and without noise. -# + +# %% import matplotlib.pyplot as plt # Let's visualise MNIST images with noise: @@ -874,15 +907,16 @@ def show(index): # We pick 8 images to show: for i in range(8): show(123*i) -# - +# %% [markdown] # ### UNet model # # Let's try denoising with a UNet, "CARE-style". As UNets and denoising implementations are not the focus of this exercise, we provide the model for you in the following cell. +# %% [markdown] # The training loop code is also provided here. It is similar to the code used to train the image classification model previously, but look it over to make sure there are no surprises. -# + +# %% from tqdm import tqdm def train_denoising_model(train_loader, model, criterion, optimizer, history): @@ -925,11 +959,10 @@ def train_denoising_model(train_loader, model, criterion, optimizer, history): return history -# - - +# %% [markdown] # Here we choose hyperparameters and initialize the model and data loaders. -# + +# %% from dlmbl_unet import UNet import torch.optim as optim import torch @@ -960,16 +993,19 @@ def train_denoising_model(train_loader, model, criterion, optimizer, history): # Train loader: train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size_train, shuffle=True) -# - +# %% [markdown] # Finally, we run the training loop! +# %% # Training loop: for epoch in range(n_epochs): train_denoising_model(train_loader, unet_model, criterion, optimizer, history) +# %% [markdown] # As before, we will visualize the training loss. If all went correctly, it should decrease from around 1.0 to less than 0.2. +# %% # Loss Visualization fig = plt.figure() plt.plot(history["loss"], color='blue') @@ -977,10 +1013,12 @@ def train_denoising_model(train_loader, model, criterion, optimizer, history): plt.xlabel('number of training examples seen') plt.ylabel('mean squared error loss') +# %% [markdown] # ### Check denoising performance # # We see that the training loss decreased, but let's apply the model to the test set to see how well it was able to recover the digits from the noisy images. 
+# %% def apply_denoising(image, model): # add batch and channel dimensions image = torch.unsqueeze(torch.unsqueeze(image, 0), 0) @@ -988,6 +1026,7 @@ def apply_denoising(image, model): # remove batch and channel dimensions before returning return prediction.detach().cpu()[0,0] +# %% # Displays: ground truth, noisy, and denoised images def visualize_denoising(model, dataset, index): orig_image = dataset[index][0][0] @@ -1005,37 +1044,41 @@ def visualize_denoising(model, dataset, index): plt.show() +# %% [markdown] # We pick 8 images to show: +# %% for i in range(8): visualize_denoising(unet_model, test_dataset, 123*i) +# %% [markdown] #

# Task 5.1:

# Did the denoising net trained on MNIST work well on unseen test data? What do you think will happen when we apply it to the Fashion-MNIST data? #
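# %% [markdown]
# For a rough quantitative check to go with the visualizations (an optional sketch, not part of the original exercise): compare the mean squared error to the clean image before and after denoising on a handful of test digits. Note that `add_noise` is random, so the exact numbers will vary from run to run.

# %%
# Sketch: MSE to the clean image, before and after denoising
noisy_errors, denoised_errors = [], []
for i in range(8):
    clean = test_dataset[123 * i][0][0]
    noisy = add_noise(clean)
    denoised = apply_denoising(noisy, unet_model)
    noisy_errors.append(torch.mean((noisy - clean) ** 2).item())
    denoised_errors.append(torch.mean((denoised - clean) ** 2).item())
print(f"mean MSE noisy vs clean:    {sum(noisy_errors) / len(noisy_errors):.3f}")
print(f"mean MSE denoised vs clean: {sum(denoised_errors) / len(denoised_errors):.3f}")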
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.1 Answer:** # # The denoising MNIST did relatively well considering it extracted images which allows a human to identify a digit when it wasn't necessarily obvious from the noisy image. It has however been trained to look for digits. Applying it to Fashion-MNIST will possibly sucessfully "remove noise", but recovering objects that it hasn't seen before may not work as well. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.1 Answer from 2023 Students:** # # It does decently well, not perfect cause it's lots of noise -# - +# %% [markdown] # ### Apply trained model on 'wrong' data # # Apply the denoising model trained above to some example _noisy_ images derived from the Fashion-MNIST dataset. # +# %% [markdown] # ### Load the Fashion MNIST dataset # # Similar to the regular MNIST, we will use the pytorch FashionMNIST dataset. This was downloaded in the setup.sh script, so here we are just loading it into memory. -# + +# %% fm_train_dataset = torchvision.datasets.FashionMNIST('./fashion_mnist', train=True, download=False, transform=torchvision.transforms.Compose([ torchvision.transforms.ToTensor(), @@ -1049,51 +1092,53 @@ def visualize_denoising(model, dataset, index): torchvision.transforms.Normalize( (0.1307,), (0.3081,)) ])) -# - +# %% [markdown] # Next we apply the denoising model we trained on the MNIST data to FashionMNIST, and visualize the results. +# %% for i in range(8): visualize_denoising(unet_model, fm_train_dataset, 123*i) +# %% [markdown] #

# Task 5.2:

# What happened when the MNIST denoising model was applied to the FashionMNIST data? Why do you think the results look as they do? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.2 Answer:** # # The "noise" is apparently gone, however, the objects are hardly recognizable. Some look like they have been reshaped like digits in the process. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.2 Answer from 2023 Students:** # # BAD! Some of them kind of look like numbers. -# - +# %% [markdown] #

# Task 5.3:

# Can you imagine any real-world scenarios where a denoising model would change the content of an image? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.3 Answer:** # # If a denoising model is trained on data which does not appear in the data it is ultimatly used on, that new content will end up likely changed. A real worl example could be that of training a model on lots of non-dividing cells images, and use the model on new data which happens to contain some dividing cells. This could lead to the information being "denoised" away. -# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.3 Answer from 2023** # # - Run on any out of distribution data # - Especially tricky if the data appears to be in distribution but has rare events. E.g. if the denoiser was trained on lots of cells that were never dividing and then was run on similar image with dividing cells, it might remove the dividing cell and replace with a single cell. -# - +# %% [markdown] # ### Train the denoiser on both MNIST and FashionMNIST # # In this section, we will perform the denoiser training once again, but this time on both MNIST and FashionMNIST datasets, and then try to apply the newly trained denoiser to a set of noisy test images. -# + +# %% import torch.optim as optim import torch @@ -1122,20 +1167,22 @@ def visualize_denoising(model, dataset, index): # Training loop: for epoch in range(n_epochs): train_denoising_model(train_loader, unet_model, criterion, optimizer, history) -# - +# %% for i in range(8): visualize_denoising(unet_model, test_dataset, 123*i) +# %% for i in range(8): visualize_denoising(unet_model, fm_train_dataset, 123*i) +# %% [markdown] #

# Task 5.4:

# How does the new denoiser perform compared to the one from the previous section? #
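# %% [markdown]
# A rough numeric comparison (an optional sketch, not part of the original exercise): mean squared error of the jointly trained denoiser against the clean image, on both MNIST and FashionMNIST examples. `add_noise` is random, so the exact numbers will vary.

# %%
# Sketch: MSE of the jointly trained denoiser on both datasets
for name, dataset in [("MNIST", test_dataset), ("FashionMNIST", fm_train_dataset)]:
    errors = []
    for i in range(8):
        clean = dataset[123 * i][0][0]
        denoised = apply_denoising(add_noise(clean), unet_model)
        errors.append(torch.mean((denoised - clean) ** 2).item())
    print(f"mean MSE denoised vs clean on {name}: {sum(errors) / len(errors):.3f}")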
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.4 Answer:** # # The new denoiser has been trained on both MNIST and FashionMNIST, and as a result, it no longer insist on reshaping objects from the FashionMNIST dataset into digits. However, it seems to be performing slightly worse on the original MNIST (some of the digits are hardly recognisable). @@ -1144,7 +1191,7 @@ def visualize_denoising(model, dataset, index): # # We previously performed the training sequentially on the MNIST data first then followed by the FashionMNIST data. Now, we ask for the training data to be shuffled and observe the impact on performance. (noe the `shuffle=True` in the lines below) -# + +# %% import torch.optim as optim import torch @@ -1173,26 +1220,28 @@ def visualize_denoising(model, dataset, index): # Training loop: for epoch in range(n_epochs): train_denoising_model(train_loader, unet_model, criterion, optimizer, history) -# - +# %% for i in range(8): visualize_denoising(unet_model, test_dataset, 123*i) +# %% for i in range(8): visualize_denoising(unet_model, fm_train_dataset, 123*i) +# %% [markdown] #

# Task 5.5:

# How does the denoiser trained on shuffled data perform compared to the one trained sequentially on one dataset and then on the other? #
-# + [markdown] tags=["solution"] +# %% [markdown] tags=["solution"] # **5.5 Answer:** # # The denoiser trained on shuffled data performs well accross both MNIST and FashionMNIST, without having any particular issue with either of the two datasets. # -# - +# %% [markdown] # #

# Checkpoint 5

@@ -1201,6 +1250,7 @@ def visualize_denoising(model, dataset, index): # #
+# %% [markdown] # #

# Bonus Questions

@@ -1209,4 +1259,5 @@ def visualize_denoising(model, dataset, index): # #
+# %% [markdown] #