
Clarification on Target Matrix for Embedding Learning #4

Open
sarazatezalo opened this issue Nov 25, 2024 · 3 comments

Comments

@sarazatezalo

Hello,

Could you clarify how the target matrix for embedding learning is defined and used?
Namely, the paper states (paraphrased):

The projection layers are learned to maximize the cosine similarity of the image/text features of the $n$ pairs sharing the same disease, while minimizing the cosine similarity of the embeddings of pairs from different diseases. For this, a target matrix is defined with ones at the positions of image-text pairs sharing the same disease label, and zeros at the remaining positions.
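If I read that correctly, the target matrix could be built like this (a toy sketch of my own, not code from the repository):

import numpy as np

# toy disease labels for a batch of n image-text pairs
labels = np.array([0, 1, 1, 0])

# target[i, j] = 1 if pairs i and j share the same disease label, else 0
target = (labels[:, None] == labels[None, :]).astype(float)
print(target)
# [[1. 0. 0. 1.]
#  [0. 1. 1. 0.]
#  [0. 1. 1. 0.]
#  [1. 0. 0. 1.]]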

However, in the code for CLIP/modules/model.py, it seems that the logic for the derm7pt dataset is reversed. Specifically:

matrix = []
for l in batch["label"]:
    if CFG.dataset == "derm7pt":
        # binary labels: Nevus (0), Melanoma (1)
        if l == 0:
            # row for a Nevus sample: copies the batch label vector,
            # i.e. 1s at the Melanoma positions
            matrix.append(batch["label"].cpu().numpy())
        else:
            # row for a Melanoma sample: 1s where the batch label is 0 (Nevus)
            matrix.append(np.where(batch["label"].cpu().numpy() >= 1, 0, 1))
    elif CFG.dataset == "ISIC_2018":
        # row has 1s where the batch label equals this sample's label
        matrix.append(np.where(batch["label"].cpu().numpy() != l.cpu().numpy(), 0., 1.))

If the goal is to create a matrix with ones for pairs sharing the same disease, shouldn't the logic for derm7pt handle the labels differently?
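Concretely, I would have expected the derm7pt rows to be built the same way as in the ISIC_2018 branch, e.g. (a sketch of my expectation, not a tested patch):

import numpy as np
import torch

batch = {"label": torch.tensor([0, 1, 1, 0])}  # toy binary derm7pt labels
matrix = [np.where(batch["label"].cpu().numpy() != l.cpu().numpy(), 0., 1.)
          for l in batch["label"]]
# each row now has 1s exactly where the batch label matches that sample's label,
# i.e. the same-disease matrix sketched above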

Thank you in advance for your clarification!

@CristianoPatricio
Owner

Hi @sarazatezalo,

As set in the CLIP/modules/config.py file, the Derm7pt dataset has 2 classes, Melanoma (1) and Nevus (0), while ISIC 2018 is defined as a multi-class dataset with 7 skin lesion classes. If I remember correctly, the way I handle the target matrix takes this difference into account. Try printing the target matrix to see the values.
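For example, a quick way to inspect it, replaying the derm7pt branch quoted above on a toy binary batch (the toy labels are just for illustration):

import numpy as np
import torch

batch = {"label": torch.tensor([0, 1, 1, 0])}  # toy batch: Nevus (0), Melanoma (1)

matrix = []
for l in batch["label"]:
    # same logic as the derm7pt branch in CLIP/modules/model.py
    if l == 0:
        matrix.append(batch["label"].cpu().numpy())
    else:
        matrix.append(np.where(batch["label"].cpu().numpy() >= 1, 0, 1))

print(np.stack(matrix))
# [[0 1 1 0]
#  [1 0 0 1]
#  [1 0 0 1]
#  [0 1 1 0]]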

Thank you!

@sarazatezalo
Author

Hi @CristianoPatricio ,

Thank you for your response and for sharing your work—it’s very inspiring! I’m currently researching this topic for my university project and I am interested in implementing something similar.
However, I’m having some difficulty understanding the loss calculation in the code for derm7pt, particularly how the target matrix is used. As I understand it, the loss function in use is cross-entropy. When given logits and a target in the form of a matrix (instead of class indices), PyTorch applies a row-wise softmax to the logits and weights each position by the corresponding target value: for zero/one targets, each row’s loss is the sum of the negative log-probabilities at the positions marked 1 in the target matrix.
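To make sure I’m describing the same computation, here is a tiny self-contained example of what I mean (the logits and targets are purely illustrative), comparing F.cross_entropy with a matrix target against the explicit formula:

import torch
import torch.nn.functional as F

# toy logits for 3 image-text pairs and a 0/1 target matrix (illustrative values)
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 1.5, 0.2],
                       [0.1, 0.4, 2.2]])
target = torch.tensor([[1.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# cross-entropy with probability-style (soft) targets
loss = F.cross_entropy(logits, target)

# explicit formula: row-wise log-softmax, sum of -log p at positions marked 1,
# averaged over rows
manual = (-target * F.log_softmax(logits, dim=-1)).sum(dim=1).mean()

print(loss.item(), manual.item())  # the two values match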
In your code, it looks like the target matrix, for the derm7pt dataset, has a value of 1 for text-image embeddings that do not share the same disease label.
Could you clarify this part of the implementation? How does this align with the cross-entropy loss computation, and what was the rationale behind this approach?
Thank you again for your time.

Best regards,

Sara

@CristianoPatricio
Owner

Hi @sarazatezalo,

Sorry for the delay. Your rationale is correct. Did you try modifying the target matrix for Derm7pt accordingly?
Anyway, I'll check the implementation ASAP.

Thanks!
