Some Questions #5
I am not the author, but I hope my answers above can help you.
Hi @yanjk3, thank you very much for the answers; I really appreciate it. It makes more sense now that I know the authors made a slight modification to the original DINO.
Hi @yanjk3, when I use eval_knn.py from the original DINO to evaluate SelfPatch, it fails with: "size mismatch for pos_embed: copying a param with shape torch.Size([1, 196, 384]) from checkpoint, the shape in current model is torch.Size([1, 197, 384])". Do you have any ideas on how I can fix it? Thank you.
This is because the SelfPatch checkpoint does not contain the CLS token, so the position embedding's size is mismatched. In SelfPatch, the CLS token lives in the SelfPatchHead (https://github.com/alinlab/SelfPatch/blob/main/selfpatch_vision_transformer.py#L362), so the ViT backbone does not need one. I think you can fix it by modifying DINO's ViT code (https://github.com/facebookresearch/dino/blob/main/vision_transformer.py#L147) from self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim)) to self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim)).
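For concreteness, here is a minimal sketch of that change (the surrounding lines are paraphrased from DINO's vision_transformer.py; the cls_token parameter and the CLS handling in interpolate_pos_encoding would need to be removed as well):

```python
# In DINO's VisionTransformer.__init__, drop the extra CLS position:
# self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))                # delete
# self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))  # before
self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))        # after

# prepare_tokens then no longer concatenates a CLS token:
def prepare_tokens(self, x):
    B, nc, w, h = x.shape
    x = self.patch_embed(x)                         # (B, num_patches, embed_dim)
    x = x + self.interpolate_pos_encoding(x, w, h)  # CLS handling removed here too
    return self.pos_drop(x)
```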
Hi @yanjk3, thank you for your answers. Could you demonstrate how I can apply global average pooling to the outputs of the last transformer block?
You should make sure you delete the CLS token in the ViT first. Then you can insert a global average pooling over the patch tokens after the last transformer block, as in the sketch below.
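For illustration, a minimal sketch of such a forward pass (one possible way to pool, not necessarily the authors' exact code; it assumes the CLS token has already been removed from the ViT as described above):

```python
# Replaces DINO's `return x[:, 0]` (the CLS feature) with a pooled feature.
def forward(self, x):
    x = self.prepare_tokens(x)   # (B, num_patches, embed_dim), no CLS token
    for blk in self.blocks:
        x = blk(x)
    x = self.norm(x)
    return x.mean(dim=1)         # global average pooling over patch tokens
```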
Hi @yanjk3, I already took your advice, but the accuracy is 3% lower than the original DINO's under the same settings for eval_knn.py. What solutions do you have for this? How can accuracy be checked more reliably? Is it better to evaluate with eval_linear.py or eval_knn.py? Thanks.
To overcome the performance drop, I recommend copying the SelfPatch ViT code into the DINO ViT. If you use the CLS token (the one kept in the SelfPatchHead), the performance may improve.
Hi @yanjk3, sorry, I don't really get it. What do you mean by copying the SelfPatch ViT into the DINO ViT?
I mean you should replace the DINO ViT model's code with the SelfPatch ViT model's code, for example along the lines of the sketch below.
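A hypothetical sketch of how eval_knn.py could build the backbone from SelfPatch's code instead of DINO's (the architecture key, checkpoint filename, and key-renaming all follow DINO's conventions and may need adjusting for your setup):

```python
import torch
import selfpatch_vision_transformer as vits  # SelfPatch's ViT definition

# Build the backbone from SelfPatch's code so parameter names/shapes match.
model = vits.__dict__["vit_small"](patch_size=16)

# Load the SelfPatch checkpoint (hypothetical filename; key layout assumed
# to follow DINO's "teacher" / "module." / "backbone." conventions).
state_dict = torch.load("selfpatch_checkpoint.pth", map_location="cpu")["teacher"]
state_dict = {k.replace("module.", "").replace("backbone.", ""): v
              for k, v in state_dict.items()}
msg = model.load_state_dict(state_dict, strict=False)
print(msg)  # verify there are no missing or unexpected keys before running kNN
```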
Hi @yanjk3, do you mean adding everything you previously suggested to the DINO ViT model's code (vision_transformer.py)?
Hi @bryanwong17, @yanjk3. I'm having the same problem: I cannot run the evaluation using eval_knn.py.
Hello, I would appreciate it if you could respond to some of my questions below:
```python
# The student runs separately on the two global crops (images[:2]) and
# the remaining local crops (images[2:]), concatenated along the batch dim.
student_output = [student(torch.cat(images[:2]), head_only=True, loc=False),
                  student(torch.cat(images[2:]), head_only=True, loc=False)]
```
Thanks for your time and kindness!