
InceptionTime model does not classify correctly when "residual=True" #29

Open
GenghisYoung233 opened this issue Jan 9, 2021 · 5 comments

@GenghisYoung233

I tested all 7 models in ./breizhcrops/models on the same dataset with 4 classes, processed with the same code. The InceptionTime model classifies everything into a single class, while the other six models work fine.

However, after changing InceptionTime.py, line 45

    def __init__(self, kernel_size=32, num_filters=128, residual=True, use_bias=False, device=torch.device("cpu")):

TO

    def __init__(self, kernel_size=32, num_filters=128, residual=False, use_bias=False, device=torch.device("cpu")):

the results seem reasonable: all 4 classes are classified with high precision.

Do I have to uncomment lines 36-38 to use the residual connection?

    #if self.use_residual and d % 3 == 2:
    #    x = self._shortcut_layer(input_res, x)
    #    input_res = x
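
For reference, those commented lines sit inside the layer loop of forward(); uncommenting them would give roughly the following wiring (a sketch of the intended residual pattern, not the exact file contents):

    for d in range(self.num_layers):
        x = self.inception_modules_list[d](x)
        # every third module, add a shortcut from the input of the current
        # residual block, then start the next block from the merged output
        if self.use_residual and d % 3 == 2:
            x = self._shortcut_layer(input_res, x)
            input_res = x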

The code I used to process the dataset, fit the model, and classify the raster:
https://gist.github.com/GenghisYoung233/834909ced3531e57b8ec6e0353d17c27

@maxzoll (Collaborator) commented Mar 23, 2021

Hello GenghisYoung233,
regarding issue #19 and your issue #29, the code for InceptionTime.py has been reviewed and updated. The model now works as intended and also provides reasonable results while using the shortcut layer with "residual=True".

@charlotte-pel (Collaborator) commented Mar 31, 2021

Hello @maxzoll,

According to the following lines (16 and 17):

self.inception_modules_list = [InceptionModule(kernel_size=32, num_filters=hidden_dims*4,

The residual parameter of the InceptionModule is always left at its default value, False. Moreover, there is no longer a use_residual parameter in InceptionTime to enable it.

To be closer to the original Keras implementation (https://github.com/hfawaz/InceptionTime), I suggest the following code:

import torch
import torch.nn as nn
import torch.nn.functional as F
__all__ = ['InceptionTime']

class InceptionTime(nn.Module):

    def __init__(self, num_classes, input_dim=1, num_layers=6, hidden_dims=128, use_bias=False, use_residual=True, device=torch.device("cpu")):
        super(InceptionTime, self).__init__()
        self.modelname = f"InceptionTime_input-dim={input_dim}_num-classes={num_classes}_" \
                         f"hidden-dims={hidden_dims}_num-layers={num_layers}"
        #self.inlinear = nn.Linear(input_dim, hidden_dims)
        self.num_layers = num_layers
        self.use_residual = use_residual
        #self.inception_modules_list = [InceptionModule(kernel_size=40, num_filters=hidden_dims,
        #                                               use_bias=use_bias, device=device) for _ in range(num_layers)]
        # register the Inception modules in an nn.ModuleList so that their parameters
        # are tracked by the optimizer and moved by self.to(device)
        self.inception_modules_list = nn.ModuleList([InceptionModule(input_dim=input_dim, kernel_size=40, num_filters=hidden_dims//4,
                                                                     use_bias=use_bias, device=device)])
        for i in range(num_layers - 1):
            self.inception_modules_list.append(InceptionModule(input_dim=hidden_dims, kernel_size=40, num_filters=hidden_dims//4,
                                                               use_bias=use_bias, device=device))
        #self.inception_modules = nn.Sequential(
        #    *self.inception_modules_list
        #)
        # the shortcut layers also need to be registered as submodules
        self.shortcut_layer_list = nn.ModuleList([ShortcutLayer(input_dim, hidden_dims, stride=1, bias=False)])
        for i in range(num_layers // 3):
            self.shortcut_layer_list.append(ShortcutLayer(hidden_dims, hidden_dims, stride=1, bias=False))
        self.avgpool = nn.AdaptiveAvgPool1d(1)
        self.outlinear = nn.Linear(hidden_dims,num_classes)

        self.to(device)

    def forward(self,x):
        # N x T x D -> N x D x T
        x = x.transpose(1,2)
        input_res = x
        

        # expand dimensions
        #x = self.inlinear(x.transpose(1, 2)).transpose(1, 2)
        for d in range(self.num_layers):
            x = self.inception_modules_list[d](x)

            if self.use_residual and d % 3 == 2:
                x = self.shortcut_layer_list[d//3](input_res, x)
                input_res = x
        x = self.avgpool(x).squeeze(2)
        x = self.outlinear(x)
        logprobabilities = F.log_softmax(x, dim=-1)
        return logprobabilities

class InceptionModule(nn.Module):
    def __init__(self, input_dim=32, kernel_size=40, num_filters=32, residual=False, use_bias=False, device=torch.device("cpu")):
        super(InceptionModule, self).__init__()

        self.residual = residual

        self.bottleneck = nn.Conv1d(input_dim, num_filters, kernel_size=1, stride=1, padding=0, bias=use_bias)

        # the for loop gives kernel sizes 40, 20, and 10
        kernel_size_s = [kernel_size // (2 ** i) for i in range(3)]
        # register the parallel convolutions in an nn.ModuleList so their parameters are tracked;
        # k+1 makes each kernel odd, so padding=k//2 preserves the sequence length
        self.convolutions = nn.ModuleList([nn.Conv1d(num_filters, num_filters, kernel_size=k + 1, stride=1, bias=False, padding=k // 2).to(device) for k in kernel_size_s])
        
        self.pool_conv = nn.Sequential(
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.Conv1d(input_dim, num_filters,kernel_size=1, stride = 1,padding=0, bias=use_bias) 
        )

        self.bn_relu = nn.Sequential(
            nn.BatchNorm1d(num_filters*4),
            nn.ReLU()
        )

        #if residual:
            #self.residual_relu = nn.ReLU()
            #self.shortcut_layer = ShortcutLayer(num_filters, num_filters, stride =1, bias=False)

        self.to(device)


    def forward(self, input_tensor):
        # collapse feature dimension

        input_inception = self.bottleneck(input_tensor)
        features = [conv(input_inception) for conv in self.convolutions]
        features.append(self.pool_conv(input_tensor.contiguous()))
        features = torch.cat(features, dim=1) 
        features = self.bn_relu(features)
        #if self.residual:
            #features = features + input_tensor
            #features = self.shortcut_layer(input_tensor, out_tensor=features)
            
        return features

class ShortcutLayer(nn.Module):
    def __init__(self, in_planes, out_planes, stride, bias):
        super(ShortcutLayer, self).__init__()
        self.sc = nn.Sequential(nn.Conv1d(in_channels=in_planes,
                                          out_channels=out_planes,
                                          kernel_size=1,
                                          stride=stride,
                                          bias=bias),
                                nn.BatchNorm1d(num_features=out_planes))
        self.relu = nn.ReLU()

    def forward(self, input_tensor, out_tensor):
        x = out_tensor + self.sc(input_tensor)
        x = self.relu(x)

        return x        

I corrected the use of the residual connection, used a convolution for the bottleneck layer, and removed the initial linear layer.
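
If useful, a quick smoke test of this proposal (hypothetical shapes, just to check that the forward pass runs and that the residual branch is exercised) could look like:

    # hypothetical example: batch of 8 series, 45 time steps, 13 bands, 4 classes
    model = InceptionTime(num_classes=4, input_dim=13, num_layers=6,
                          hidden_dims=128, use_residual=True)
    x = torch.randn(8, 45, 13)          # N x T x D, as expected by forward()
    logprobs = model(x)
    assert logprobs.shape == (8, 4)     # N x num_classes log-probabilities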

Best regards,
Charlotte.

(PS: I can push this modification ;))

@MarcCoru (Collaborator) commented Mar 31, 2021

Hey, thanks @charlotte-pel .

OK, I understand. So should we also change the default behavior of residual to False (as we had before, but without any real implementation for residual=True)?

Also, from looking over the code: it seems that we shouldn't use the shortcut layer at every layer (as in the current implementation), but only when d % 3 == 2 is true (i.e., adding it at every third layer?).

Maybe we can move further tweaks to the implementation into a pull request rather than an issue. It may be easier to compare, test, and run the code there.

Edit:
I just checked the Fawaz et al., 2020 paper (page 6 top)

Each residual block’s input is transferred via a shortcut linear connection to be added to the next block’s input,

I guess the confusion originates from the difference between the description in the paper (every layer shortcutted) and the implementation (every third layer shortcutted). I suppose we stay with the implementation rather than the paper description?

@charlotte-pel (Collaborator)

Hi @MarcCoru,

I would keep the default behavior as residual=True, since it is used by default in InceptionTime.

Regarding the residual connection, you are correct: there is a shortcut every third layer. Sorry, the paper is not that clear; the definition of a residual block is given earlier, on pages 5-6:

The composition of an Inception network classifier contains two different residual blocks [...] For the Inception network, each block is comprised of three Inception modules

So every residual block is shortcutted, but a residual block is composed of three Inception modules. I hope it makes more sense now.
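
Concretely, with num_layers=6 the condition d % 3 == 2 fires at d = 2 and d = 5, i.e. once per block of three Inception modules (a small illustration, not code from the repository):

    # module indices that trigger the shortcut for num_layers = 6
    num_layers = 6
    shortcut_at = [d for d in range(num_layers) if d % 3 == 2]
    print(shortcut_at)  # [2, 5] -> one residual merge per block of three modules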

I will do the pull request.

All the best,
Charlotte.

@MarcCoru (Collaborator) commented Apr 6, 2021

We pulled in @charlotte-pel's code in a separate pull request, #31.
