Window Size is the same as the Image Size? #51

Open
siddagra opened this issue Dec 21, 2024 · 1 comment
Comments

@siddagra

Self-attention blocks are used only in the last two layers, and in those layers you have set the window size equal to the spatial size. Doesn't that mean you are not really computing self-attention at all, since there would be only a single token?

Please correct me if I am wrong as this seems perplexing.

@ahatamiz
Collaborator

Hi @siddagra

In stage 3, we have:

Feature map size is 14 x 14
Window size is also 14
Since the window size equals the feature map size, there is a single window covering the entire feature map, so we are doing global self-attention across it

Therefore:

Number of tokens in stage 3: 14 x 14 = 196 tokens

Basically each position in the 14 x 14 feature map becomes a token.

This way of computing attention is similar to ViT-style attention without local windows.
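
To make the counting concrete, here is a minimal sketch in plain Python. It is not taken from the repository: `window_partition` is a hypothetical helper that only groups feature map positions into non-overlapping windows, and the 7 x 7 window in the second call is an assumed value used purely for contrast.

```python
def window_partition(height, width, window_size):
    """Group the (row, col) positions of a height x width feature map into
    non-overlapping window_size x window_size windows.

    Returns a list of windows; each window is a list of (row, col) tokens.
    Hypothetical helper for illustration, not the repository's code.
    """
    if height % window_size or width % window_size:
        raise ValueError("feature map size must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                (r, c)
                for r in range(top, top + window_size)
                for c in range(left, left + window_size)
            ])
    return windows


# Window size equal to the feature map size: one window holding 196 tokens.
global_windows = window_partition(14, 14, 14)
print(len(global_windows), len(global_windows[0]))  # 1 196

# Assumed smaller window for contrast: four windows of 49 tokens each.
local_windows = window_partition(14, 14, 7)
print(len(local_windows), len(local_windows[0]))  # 4 49
```

The window size controls how many windows there are, not how many tokens exist. With a window of 14 on a 14 x 14 map, attention is computed once over all 196 tokens, which is why it behaves like global attention.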
