
InceptionTime model does not classify correctly when "residual=True" #29

Open
GenghisYoung233 opened this issue Jan 9, 2021 · 5 comments

@GenghisYoung233

I tested all 7 models in ./breizhcrops/models on the same dataset with 4 classes, processed with the same code. The InceptionTime model classifies everything into a single class, while the other six models work fine.

However, after changing InceptionTime.py, line 45

    def __init__(self, kernel_size=32, num_filters=128, residual=True, use_bias=False, device=torch.device("cpu")):

TO

    def __init__(self, kernel_size=32, num_filters=128, residual=False, use_bias=False, device=torch.device("cpu")):

the results seem reasonable: all 4 classes are classified with high precision.

Do I have to uncomment lines 36-38 to use the residual connection?

    #if self.use_residual and d % 3 == 2:
    #    x = self._shortcut_layer(input_res, x)
    #    input_res = x
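
For reference, those commented lines sit inside the layer loop of forward(); uncommenting them would give roughly the following wiring (a sketch of the intended residual pattern, not the exact file contents):

    for d in range(self.num_layers):
        x = self.inception_modules_list[d](x)
        # every third module, add a shortcut from the input of the current
        # residual block, then start the next block from the merged output
        if self.use_residual and d % 3 == 2:
            x = self._shortcut_layer(input_res, x)
            input_res = x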

The code I used to process the dataset, fit the model, and classify the raster:
https://gist.github.com/GenghisYoung233/834909ced3531e57b8ec6e0353d17c27

@maxzoll (Collaborator) commented Mar 23, 2021

Hello GenghisYoung233,
regarding issue #19 and your issue #29, the code for InceptionTime.py has been reviewed and updated. The model now works as intended and also provides reasonable results while using the shortcut layer with "residual=True".

@charlotte-pel (Collaborator) commented Mar 31, 2021

Hello @maxzoll,

According to the following lines (16 and 17):

self.inception_modules_list = [InceptionModule(kernel_size=32, num_filters=hidden_dims*4,

The residual parameter of the InceptionModule is always left at its default value, False. Moreover, there is no longer a use_residual parameter in InceptionTime to enable it.

To be closer to the original Keras implementation (https://github.com/hfawaz/InceptionTime), I suggest the following code:

import torch
import torch.nn as nn
import torch.nn.functional as F
__all__ = ['InceptionTime']

class InceptionTime(nn.Module):

    def __init__(self, num_classes, input_dim=1, num_layers=6, hidden_dims=128, use_bias=False, use_residual=True, device=torch.device("cpu")):
        super(InceptionTime, self).__init__()
        self.modelname = f"InceptionTime_input-dim={input_dim}_num-classes={num_classes}_" \
                         f"hidden-dims={hidden_dims}_num-layers={num_layers}"
        #self.inlinear = nn.Linear(input_dim, hidden_dims)
        self.num_layers = num_layers
        self.use_residual = use_residual
        #self.inception_modules_list = [InceptionModule(kernel_size=40, num_filters=hidden_dims,
        #                                               use_bias=use_bias, device=device) for _ in range(num_layers)]
        # register the Inception modules in an nn.ModuleList so that their parameters
        # are tracked by the optimizer and moved by self.to(device)
        self.inception_modules_list = nn.ModuleList([InceptionModule(input_dim=input_dim, kernel_size=40, num_filters=hidden_dims//4,
                                                                     use_bias=use_bias, device=device)])
        for i in range(num_layers - 1):
            self.inception_modules_list.append(InceptionModule(input_dim=hidden_dims, kernel_size=40, num_filters=hidden_dims//4,
                                                               use_bias=use_bias, device=device))
        #self.inception_modules = nn.Sequential(
        #    *self.inception_modules_list
        #)
        # the shortcut layers also need to be registered as submodules
        self.shortcut_layer_list = nn.ModuleList([ShortcutLayer(input_dim, hidden_dims, stride=1, bias=False)])
        for i in range(num_layers // 3):
            self.shortcut_layer_list.append(ShortcutLayer(hidden_dims, hidden_dims, stride=1, bias=False))
        self.avgpool = nn.AdaptiveAvgPool1d(1)
        self.outlinear = nn.Linear(hidden_dims,num_classes)

        self.to(device)

    def forward(self,x):
        # N x T x D -> N x D x T
        x = x.transpose(1,2)
        input_res = x
        

        # expand dimensions
        #x = self.inlinear(x.transpose(1, 2)).transpose(1, 2)
        for d in range(self.num_layers):
            x = self.inception_modules_list[d](x)

            if self.use_residual and d % 3 == 2:
                x = self.shortcut_layer_list[d//3](input_res, x)
                input_res = x
        x = self.avgpool(x).squeeze(2)
        x = self.outlinear(x)
        logprobabilities = F.log_softmax(x, dim=-1)
        return logprobabilities

class InceptionModule(nn.Module):
    def __init__(self, input_dim=32, kernel_size=40, num_filters=32, residual=False, use_bias=False, device=torch.device("cpu")):
        super(InceptionModule, self).__init__()

        self.residual = residual

        self.bottleneck = nn.Conv1d(input_dim, num_filters, kernel_size=1, stride=1, padding=0, bias=use_bias)

        # the for loop gives kernel sizes 40, 20, and 10
        kernel_size_s = [kernel_size // (2 ** i) for i in range(3)]
        # register the parallel convolutions in an nn.ModuleList so their parameters are tracked;
        # k+1 makes each kernel odd, so padding=k//2 preserves the sequence length
        self.convolutions = nn.ModuleList([nn.Conv1d(num_filters, num_filters, kernel_size=k + 1, stride=1, bias=False, padding=k // 2).to(device) for k in kernel_size_s])
        
        self.pool_conv = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(input_dim, num_filters,kernel_size=1, stride = 1,padding=0, bias=use_bias) 
        )

        self.bn_relu = nn.Sequential(
            nn.BatchNorm1d(num_filters*4),
            nn.ReLU()
        )

        #if residual:
            #self.residual_relu = nn.ReLU()
            #self.shortcut_layer = ShortcutLayer(num_filters, num_filters, stride =1, bias=False)

        self.to(device)


    def forward(self, input_tensor):
        # collapse feature dimension

        input_inception = self.bottleneck(input_tensor)
        features = [conv(input_inception) for conv in self.convolutions]
        features.append(self.pool_conv(input_tensor.contiguous()))
        features = torch.cat(features, dim=1) 
        features = self.bn_relu(features)
        #if self.residual:
            #features = features + input_tensor
            #features = self.shortcut_layer(input_tensor, out_tensor=features)
            
        return features

class ShortcutLayer(nn.Module):
    def __init__(self, in_planes, out_planes, stride, bias):
        super(ShortcutLayer, self).__init__()
        self.sc = nn.Sequential(nn.Conv1d(in_channels=in_planes,
                                          out_channels=out_planes,
                                          kernel_size=1,
                                          stride=stride,
                                          bias=bias),
                                nn.BatchNorm1d(num_features=out_planes))
        self.relu = nn.ReLU()

    def forward(self, input_tensor, out_tensor):
        x = out_tensor + self.sc(input_tensor)
        x = self.relu(x)

        return x        

I corrected the use of the residual connection, used a convolution for the bottleneck layer, and removed the initial linear layer.
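
If useful, a quick smoke test of this proposal (hypothetical shapes, just to check that the forward pass runs and that the residual branch is exercised) could look like:

    # hypothetical example: batch of 8 series, 45 time steps, 13 bands, 4 classes
    model = InceptionTime(num_classes=4, input_dim=13, num_layers=6,
                          hidden_dims=128, use_residual=True)
    x = torch.randn(8, 45, 13)          # N x T x D, as expected by forward()
    logprobs = model(x)
    assert logprobs.shape == (8, 4)     # N x num_classes log-probabilities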

Best regards,
Charlotte.

(PS: I can push this modification ;))

@MarcCoru (Collaborator) commented Mar 31, 2021

Hey, thanks @charlotte-pel .

OK, I understand. So should we also change the default behavior of residual to False (as we had before, but without any real implementation for residual=True)?

Also, from looking over the code: it seems that we shouldn't use the shortcut layer at every layer (as in the current implementation), but only when d % 3 == 2 is true (i.e., adding it at every third layer?).

Maybe we can move further tweaks to the implementation into a pull request rather than an issue. It may be easier to compare, test, and run the code there.

Edit:
I just checked the Fawaz et al., 2020 paper (page 6 top)

Each residual block’s input is transferred via a shortcut linear connection to be added to the next block’s input,

I guess the confusion originates from the difference between the description in the paper (every layer shortcutted) and the implementation (every third layer shortcutted). I suppose we stay with the implementation rather than the paper description?

@charlotte-pel (Collaborator)

Hi @MarcCoru,

I would keep the default behavior as residual=True, since it is used by default in InceptionTime.

Regarding the residual connection, you are correct: there is a shortcut every third layer. Sorry, the paper is not that clear; the definition of a residual block is given earlier, on pages 5-6:

The composition of an Inception network classifier contains two different residual blocks [...] For the Inception network, each block is comprised of three Inception modules

So every residual block is shortcutted, but a residual block is composed of three Inception modules. I hope it makes more sense now.
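
Concretely, with num_layers=6 the condition d % 3 == 2 fires at d = 2 and d = 5, i.e. once per block of three Inception modules (a small illustration, not code from the repository):

    # module indices that trigger the shortcut for num_layers = 6
    num_layers = 6
    shortcut_at = [d for d in range(num_layers) if d % 3 == 2]
    print(shortcut_at)  # [2, 5] -> one residual merge per block of three modules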

I will do the pull request.

All the best,
Charlotte.

@MarcCoru (Collaborator) commented Apr 6, 2021

We pulled in @charlotte-pel's code in a separate pull request, #31.
