diff --git a/notebooks/2dcores_starter.ipynb b/notebooks/2dcores_starter.ipynb
new file mode 100644
index 00000000..71243d69
--- /dev/null
+++ b/notebooks/2dcores_starter.ipynb
@@ -0,0 +1,662 @@
+{ + "cells": [
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "# Starter for Cores" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "In this notebook we provide an explanation of the different Core choices. For each of them we include: \n", + "- a written description of the Core\n", + "- a code demo showing how to use the Core" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "---" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "### The model has two main parts" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "The baseline CNN model (which is mostly based on [this work](https://openreview.net/forum?id=Tp7kI90Htd)) is constructed from two main parts:\n", + "- **core**: the core aims to (nonlinearly) extract features that are common between neurons. That is, we assume there exists a set of features that all neurons use but combine in their own unique way.\n", + "- **readout**: once the core has extracted the features, a neuron reads out from them by linearly combining them into a single value. Finally, by passing this single value through a final nonlinearity (in this case `ELU() + 1`) we make sure that the model output is positive, and we obtain the inferred firing rate of the neuron. (An illustrative sketch of this readout step follows the Stacked2dCore demo below.)" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "### Currently (March 2024) there are 4 versions of Core modules" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "1. Stacked2dCore\n", + "2. RotationEquivariant2dCore\n", + "3. SE2dCore\n", + "4. TransferLearningCore\n", + "\n", + "Descriptions of each **Core** module can be found in `neuralpredictors/layers/cores/conv2d.py`. However, for convenience we also include a description in each dedicated block below. " + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "---" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before learning how to use the cores, let's create a dummy tensor **images** that resembles a batch of images." + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "### Simulated batch of images " + ] + },
+ { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "\n", + "images = torch.ones(32, 1, 144, 256)" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Throughout the notebook we will refer to the elements of this shape in the following manner:\n", + "\n", + "[1] is the number of channels (input, hidden, or output)\n", + "\n", + "[144] is the height of the image or feature maps\n", + "\n", + "[256] is the width of the image or feature maps\n", + "\n", + "[32] is the batch size, which is not as relevant for understanding the material in this notebook." + ] + },
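+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As a quick, purely illustrative check of this convention (the variable names below exist only for this walkthrough), we can unpack the dimensions of the dummy batch:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Unpack the (batch, channels, height, width) dimensions of the dummy batch\n", + "batch_size, n_channels, height, width = images.shape\n", + "print(f'batch={batch_size}, channels={n_channels}, height={height}, width={width}')" + ] + },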
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "### Stacked2dCore" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "Made up of `layers` layers of nn.Sequential modules.\n", + "The convolutional layers can be set to be either depth-separable (as used in the popular MobileNets) or based on self-attention (as used in Transformer networks). Finally, it is also possible to make this core fully linear by disabling all nonlinearities. This effectively makes it possible to turn a core+readout CNN into an LNP model." + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next code cell contains some configuration parameters for the Stacked2dCore." + ] + },
+ { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "stacked2dcore_config = {\n", + " # core args\n", + " 'input_channels': 1,\n", + " 'input_kern': 9,\n", + " 'hidden_kern': 7,\n", + " 'hidden_channels': 64,\n", + " 'layers': 4,\n", + " 'stack': -1,\n", + " 'pad_input': True\n", + "}" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(`input_channels` = 1) specifies the number of input channels. In this notebook we use grayscale (single-channel) images, so the number of input channels is 1. If you have RGB color images, you will need to set the input channels to 3. If you also want your model to take behavioral parameters, such as pupil size and running speed, into account, you increase the number of input channels accordingly, e.g. to 5. \n", + "(`hidden_channels` = 64) specifies the number of hidden channels. This is entirely up to the user. If you want hidden layers of different sizes, you can pass a list of length `layers`. \n", + "\n", + "(`input_kern` = 9) sets the size of the convolutional kernel in the first layer. \n", + "(`hidden_kern` = 7) sets the size of the convolutional kernel for all hidden layers. If you want convolutional kernels of different sizes, you can pass a list of length `layers`." + ] + },
+ { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "torch.Size([32, 64, 144, 256])\n" + ] + } + ], + "source": [ + "from neuralpredictors.layers.cores import Stacked2dCore\n", + "\n", + "stacked2d_core = Stacked2dCore(**stacked2dcore_config)\n", + "stacked2dcore_out = stacked2d_core(images)\n", + "print(stacked2dcore_out.shape)" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "We see that the output of the Stacked2dCore has:\n", + "- **64** channels, as specified by `hidden_channels`. The core outputs only the last hidden layer, which is controlled by (`stack` = -1), another core configuration parameter. \n", + "- a height of **144** and a width of **256**, the same as the input images. The images are padded with zeros so that the spatial dimensions of the input are preserved at the output, as specified through the (`pad_input` = True) configuration parameter. " + ] + },
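+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The readout is not the focus of this notebook, but as a minimal, purely illustrative sketch of the idea described at the top (a linear combination of core features followed by `ELU() + 1`), the next cell hand-rolls a dense readout for a few hypothetical neurons in plain PyTorch. This is **not** the readout module from neuralpredictors, which is far more parameter-efficient." + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch.nn as nn\n", + "\n", + "# Illustrative only: one linear weight vector per (hypothetical) neuron over all core features\n", + "n_neurons = 5\n", + "flat_features = stacked2dcore_out.flatten(start_dim=1)  # [32, 64 * 144 * 256]\n", + "linear_readout = nn.Linear(flat_features.shape[1], n_neurons)\n", + "firing_rates = nn.functional.elu(linear_readout(flat_features)) + 1  # ELU() + 1 keeps the output positive\n", + "print(firing_rates.shape)  # one inferred firing rate per image and per neuron" + ] + },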
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### RotationEquivariant2dCore " + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A core built of 2d rotation-equivariant layers. For more info refer to https://openreview.net/forum?id=H1fU8iAqKX." + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next code cell contains some configuration parameters for the RotationEquivariant2dCore. \n", + "\n", + "Because this core is built on Stacked2dCore, the same configuration parameters are passed via **stacked2dcore_config**. Additionally, we set (`num_rotations` = 8): the idea of a rotation-equivariant CNN is that each learned filter is applied at several rotated orientations, so every layer produces rotated copies of its feature maps. This, of course, increases the number of output channels. \n", + "\n", + "To keep the number of output channels the same (8 rotations x 8 channels = 64), we reduce (`hidden_channels` = 8). " + ] + },
+ { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "rotation_equivariant_2d_core_config = {\n", + " # core args\n", + " **stacked2dcore_config,\n", + " 'num_rotations': 8\n", + "}\n", + "rotation_equivariant_2d_core_config['hidden_channels'] = 8" + ] + },
+ { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "from neuralpredictors.layers.cores import RotationEquivariant2dCore\n", + "\n", + "rotationequivariant_core = RotationEquivariant2dCore(**rotation_equivariant_2d_core_config)\n", + "rotationequivariant_out = rotationequivariant_core(images)" + ] + },
+ { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32, 64, 144, 256])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rotationequivariant_out.shape" + ] + },
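+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A quick, purely illustrative sanity check of the channel bookkeeping described above: the number of output channels should equal `hidden_channels * num_rotations`." + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# hidden_channels was reduced to 8 so that 8 channels * 8 rotations = 64 output channels,\n", + "# matching the Stacked2dCore output above\n", + "expected_channels = rotation_equivariant_2d_core_config['hidden_channels'] * rotation_equivariant_2d_core_config['num_rotations']\n", + "assert rotationequivariant_out.shape[1] == expected_channels\n", + "print(expected_channels)" + ] + },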
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### SE2dCore" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An extension of the Stacked2dCore class.\n", + "Additionally, a SqueezeAndExcitation layer (also called SE-block) can be added after each layer or after the n final layers. For more info refer to https://arxiv.org/abs/1709.01507\n", + "\n", + "In essence, the Squeeze-and-Excitation block reweights the channels of a feature map based on their interdependencies (a schematic sketch follows the demo below). " + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next code cell contains some configuration parameters for the SE2dCore. \n", + "\n", + "Because this core is built on Stacked2dCore, the same configuration parameters are passed via **stacked2dcore_config**. Additionally, we set (`se_reduction` = 16), which sets the channel reduction factor used inside the Squeeze-and-Excitation block after its global pooling step. We also set (`n_se_blocks` = 2), which determines how many of the final layers get a Squeeze-and-Excitation block appended.\n", + "\n", + "Example: layers=4, n_se_blocks=2:\n", + "\n", + "=> layer0 -> layer1 -> layer2 -> SEblock -> layer3 -> SEblock" + ] + },
+ { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "SE2d_core_config = {\n", + " # core args\n", + " **stacked2dcore_config,\n", + " 'se_reduction':16,\n", + " 'n_se_blocks':2\n", + "}" + ] + },
+ { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "from neuralpredictors.layers.cores import SE2dCore\n", + "\n", + "se2d_core = SE2dCore(**SE2d_core_config)\n", + "se2dcore_out = se2d_core(images)" + ] + },
+ { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32, 64, 144, 256])" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "se2dcore_out.shape" + ] + },
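+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For intuition, here is a schematic Squeeze-and-Excitation block in plain PyTorch. This is an illustration of the mechanism from the paper linked above, **not** the exact implementation used inside SE2dCore (the class name `ToySEBlock` is made up for this sketch): global average pooling squeezes each channel to a single number, a small bottleneck MLP (shrunk by the reduction factor) computes a weight per channel, and the feature map is rescaled channel-wise." + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch.nn as nn\n", + "\n", + "# Schematic SE block, for illustration only\n", + "class ToySEBlock(nn.Module):\n", + "    def __init__(self, n_channels, reduction=16):\n", + "        super().__init__()\n", + "        self.excite = nn.Sequential(\n", + "            nn.Linear(n_channels, n_channels // reduction),  # squeeze to a bottleneck ...\n", + "            nn.ReLU(),\n", + "            nn.Linear(n_channels // reduction, n_channels),  # ... and expand back\n", + "            nn.Sigmoid(),  # per-channel weights in (0, 1)\n", + "        )\n", + "\n", + "    def forward(self, x):\n", + "        weights = self.excite(x.mean(dim=(2, 3)))  # global average pool -> [batch, channels]\n", + "        return x * weights[:, :, None, None]  # reweight the feature map channel-wise\n", + "\n", + "print(ToySEBlock(64, reduction=16)(se2dcore_out).shape)  # shape is unchanged" + ] + },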
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### TransferLearningCore" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A core based on popular image recognition networks from torchvision, such as VGG or AlexNet. It can be instantiated with random or pretrained weights. The core is frozen by default, which can be changed with the `fine_tune` argument." + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next cell contains some configuration parameters for the TransferLearningCore.\n", + "\n", + "TransferLearningCore is **not** based on Stacked2dCore, so we need a new set of configuration parameters.\n", + "\n", + "(`input_channels` = 1) 1 for grayscale images, 3 for RGB.\n", + "\n", + "(`tl_model_name` = 'vgg16') at the moment (March 2024) this can only take models from torchvision, such as vgg16, alexnet, etc.\n", + "\n", + "(`layers` = -1) the number of layers to transfer, i.e. after which layer to cut the original network. More information in the blocks below.\n", + "\n", + "(`pretrained` = True) whether to use a pretrained or a randomly initialized network.\n", + "\n", + "(`fine_tune` = False) whether to clip gradients before this core or to allow training on the core." + ] + },
+ { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "TransferLearning_vgg16_core_config = {\n", + " # core args\n", + " 'input_channels':1,\n", + " 'tl_model_name':'vgg16',\n", + " 'layers':-1,\n", + " 'pretrained':True,\n", + " 'fine_tune':False\n", + "}" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Try loading a pretrained VGG16." + ] + },
+ { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "from neuralpredictors.layers.cores import TransferLearningCore\n", + "\n", + "transfer_learning_core = TransferLearningCore(**TransferLearning_vgg16_core_config)" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To understand the `layers` parameter of the TransferLearningCore, we first set (`layers` = -1) to transfer all the layers of the network.\n", + "\n", + "By printing the sequential layers of the transferred network we can see the model architecture." + ] + },
+ { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "TransferLearningCore(\n", + " (features): Sequential(\n", + " (TransferLearning): Sequential(\n", + " (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (1): ReLU(inplace=True)\n", + " (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (3): ReLU(inplace=True)\n", + " (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (6): ReLU(inplace=True)\n", + " (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (8): ReLU(inplace=True)\n", + " (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (11): ReLU(inplace=True)\n", + " (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (13): ReLU(inplace=True)\n", + " (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (15): ReLU(inplace=True)\n", + " (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (18): ReLU(inplace=True)\n", + " (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (20): ReLU(inplace=True)\n", + " (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (22): ReLU(inplace=True)\n", + " (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (25): ReLU(inplace=True)\n", + " (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (27): ReLU(inplace=True)\n", + " (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (29): ReLU(inplace=True)\n", + " )\n", + " (OutBatchNorm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", + " (OutNonlin): ReLU(inplace=True)\n", + " )\n", + ") [TransferLearningCore regularizers: ]\n", + "\n" + ] + } + ], + "source": [ + "print(transfer_learning_core)" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's say we want to use the VGG16 model only up to layer 12. Then we change the `layers` configuration parameter in **TransferLearning_vgg16_core_config** and re-instantiate the core." + ] + },
+ { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "TransferLearning_vgg16_core_config['layers'] = 12" + ] + },
+ { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32, 256, 36, 64])" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from neuralpredictors.layers.cores import TransferLearningCore\n", + "\n", + "transfer_learning_core = TransferLearningCore(**TransferLearning_vgg16_core_config)\n", + "transfer_learning_core_out = transfer_learning_core(images)\n", + "transfer_learning_core_out.shape" + ] + },
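+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The shape above can be read off from the architecture print-out: assuming (`layers` = 12) keeps the first 12 modules shown above, the truncated network contains two MaxPool2d layers (each halving height and width) and its last convolution has 256 output channels, which explains the 256 x 36 x 64 feature maps. A quick, purely illustrative check of the spatial arithmetic:" + ] + },
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Two MaxPool2d layers each halve the spatial dimensions of the 144 x 256 input\n", + "n_maxpools = 2\n", + "print(144 // 2 ** n_maxpools, 256 // 2 ** n_maxpools)" + ] + },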
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "---" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In conclusion, we have created instances of the core modules, such as **stacked2d_core** and **transfer_learning_core**. Once we have obtained the core outputs, such as **stacked2dcore_out** and **transfer_learning_core_out**, we can pass them as input to the readout module. " + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "---" + ] + },
+ { + "attachments": {}, + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "### References\n", + "- Code for the [model function](https://github.com/sinzlab/sensorium/blob/8660c0c925b3944e723637db4725083f84ee28c3/sensorium/models/models.py#L17)\n", + "- Code for the [core module](https://github.com/sinzlab/neuralpredictors/blob/0d3d793cc0e1f55ec61c5f9f7a98318b5241a2e9/neuralpredictors/layers/cores/conv2d.py#L27)" + ] + }
+ ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}