Some Questions #5
I am not the author, but I hope my answers above can help you.
Hi @yanjk3, thank you very much for the answers; I really appreciate it. It makes more sense now that I know the authors made a slight modification to the original DINO.
Hi @yanjk3, when I use eval_knn.py from the original DINO to evaluate SelfPatch, it fails with: "size mismatch for pos_embed: copying a param with shape torch.Size([1, 196, 384]) from checkpoint, the shape in current model is torch.Size([1, 197, 384])". Do you have any ideas on how I can fix it? Thank you.
This is because the SelfPatch checkpoint does not contain the CLS token, so the position embedding's size is mismatched. In SelfPatch, the CLS token lives in the SelfPatchHead (https://github.com/alinlab/SelfPatch/blob/main/selfpatch_vision_transformer.py#L362), so the ViT backbone does not need one. I think you can fix it by modifying DINO's ViT code (https://github.com/facebookresearch/dino/blob/main/vision_transformer.py#L147) from self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim)) to self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim)).
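For concreteness, here is a minimal sketch of that change (the surrounding lines are paraphrased from DINO's vision_transformer.py; the cls_token parameter and the CLS handling in interpolate_pos_encoding would need to be removed as well):

```python
# In DINO's VisionTransformer.__init__, drop the extra CLS position:
# self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))                # delete
# self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))  # before
self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))        # after

# prepare_tokens then no longer concatenates a CLS token:
def prepare_tokens(self, x):
    B, nc, w, h = x.shape
    x = self.patch_embed(x)                         # (B, num_patches, embed_dim)
    x = x + self.interpolate_pos_encoding(x, w, h)  # CLS handling removed here too
    return self.pos_drop(x)
```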
Hi @yanjk3, thank you for your answers. Could you demonstrate how I can apply global average pooling to the outputs of the last transformer block?
You should make sure you delete the CLS token in the ViT first. Then you can insert a global average pooling over the patch tokens after the last transformer block, as in the sketch below.
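For illustration, a minimal sketch of such a forward pass (one possible way to pool, not necessarily the authors' exact code; it assumes the CLS token has already been removed from the ViT as described above):

```python
# Replaces DINO's `return x[:, 0]` (the CLS feature) with a pooled feature.
def forward(self, x):
    x = self.prepare_tokens(x)   # (B, num_patches, embed_dim), no CLS token
    for blk in self.blocks:
        x = blk(x)
    x = self.norm(x)
    return x.mean(dim=1)         # global average pooling over patch tokens
```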
Hi @yanjk3, I already took your advice, but the accuracy is 3% lower than the original DINO's under the same settings for eval_knn.py. What solutions do you have for this? How can accuracy be checked more reliably? Is it better to evaluate with eval_linear.py or eval_knn.py? Thanks.
To overcome the performance drop, I recommend copying the SelfPatch ViT code into the DINO ViT. If you use the CLS token (the one kept in the SelfPatchHead), the performance may improve.
Hi @yanjk3, sorry, I don't really get it. What do you mean by copying the SelfPatch ViT into the DINO ViT?
I mean you should replace the DINO ViT model's code with the SelfPatch ViT model's code, for example along the lines of the sketch below.
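A hypothetical sketch of how eval_knn.py could build the backbone from SelfPatch's code instead of DINO's (the architecture key, checkpoint filename, and key-renaming all follow DINO's conventions and may need adjusting for your setup):

```python
import torch
import selfpatch_vision_transformer as vits  # SelfPatch's ViT definition

# Build the backbone from SelfPatch's code so parameter names/shapes match.
model = vits.__dict__["vit_small"](patch_size=16)

# Load the SelfPatch checkpoint (hypothetical filename; key layout assumed
# to follow DINO's "teacher" / "module." / "backbone." conventions).
state_dict = torch.load("selfpatch_checkpoint.pth", map_location="cpu")["teacher"]
state_dict = {k.replace("module.", "").replace("backbone.", ""): v
              for k, v in state_dict.items()}
msg = model.load_state_dict(state_dict, strict=False)
print(msg)  # verify there are no missing or unexpected keys before running kNN
```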
Hi @yanjk3, do you mean adding everything you previously suggested to the DINO ViT model's code (vision_transformer.py)?
Hi @bryanwong17, @yanjk3. I'm having the same problem: I cannot run the evaluation using eval_knn.py.
Hello, I would appreciate it if you could respond to some of my questions below:
```python
# The student runs separately on the two global crops (images[:2]) and
# the remaining local crops (images[2:]), concatenated along the batch dim.
student_output = [student(torch.cat(images[:2]), head_only=True, loc=False),
                  student(torch.cat(images[2:]), head_only=True, loc=False)]
```
Thanks for your time and kindness!