
Clarification on Target Matrix for Embedding Learning #4

Open
sarazatezalo opened this issue Nov 25, 2024 · 3 comments

Comments

@sarazatezalo

Hello,

Could you clarify how the target matrix for embedding learning is defined and used?
Namely, the paper states (paraphrased):

The projection layers are learned to maximize the cosine similarity of the image/text features of the $n$ pairs sharing the same disease, while minimizing the cosine similarity of the embeddings of pairs from different diseases. For this, a target matrix is defined with ones at the positions of image-text pairs sharing the same disease label, and zeros at the remaining positions.
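If I read that correctly, the target matrix could be built like this (a toy sketch of my own, not code from the repository):

import numpy as np

# toy disease labels for a batch of n image-text pairs
labels = np.array([0, 1, 1, 0])

# target[i, j] = 1 if pairs i and j share the same disease label, else 0
target = (labels[:, None] == labels[None, :]).astype(float)
print(target)
# [[1. 0. 0. 1.]
#  [0. 1. 1. 0.]
#  [0. 1. 1. 0.]
#  [1. 0. 0. 1.]]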

However, in the code for CLIP/modules/model.py, it seems that the logic for the derm7pt dataset is reversed. Specifically:

matrix = []
for l in batch["label"]:
    if CFG.dataset == "derm7pt":
        # binary labels: Nevus (0), Melanoma (1)
        if l == 0:
            # row for a Nevus sample: copies the batch label vector,
            # i.e. 1s at the Melanoma positions
            matrix.append(batch["label"].cpu().numpy())
        else:
            # row for a Melanoma sample: 1s where the batch label is 0 (Nevus)
            matrix.append(np.where(batch["label"].cpu().numpy() >= 1, 0, 1))
    elif CFG.dataset == "ISIC_2018":
        # row has 1s where the batch label equals this sample's label
        matrix.append(np.where(batch["label"].cpu().numpy() != l.cpu().numpy(), 0., 1.))

If the goal is to create a matrix with ones for pairs sharing the same disease, shouldn't the logic for derm7pt handle the labels differently?
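Concretely, I would have expected the derm7pt rows to be built the same way as in the ISIC_2018 branch, e.g. (a sketch of my expectation, not a tested patch):

import numpy as np
import torch

batch = {"label": torch.tensor([0, 1, 1, 0])}  # toy binary derm7pt labels
matrix = [np.where(batch["label"].cpu().numpy() != l.cpu().numpy(), 0., 1.)
          for l in batch["label"]]
# each row now has 1s exactly where the batch label matches that sample's label,
# i.e. the same-disease matrix sketched above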

Thank you in advance for your clarification!

@CristianoPatricio
Owner

Hi @sarazatezalo,

As set in the CLIP/modules/config.py file, the Derm7pt dataset has 2 classes, Melanoma (1) and Nevus (0), while ISIC 2018 is defined as a multi-class dataset with 7 skin lesion classes. If I remember correctly, the way I handle the target matrix takes this difference into account. Try printing the target matrix to see the values.
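For example, a quick way to inspect it, replaying the derm7pt branch quoted above on a toy binary batch (the toy labels are just for illustration):

import numpy as np
import torch

batch = {"label": torch.tensor([0, 1, 1, 0])}  # toy batch: Nevus (0), Melanoma (1)

matrix = []
for l in batch["label"]:
    # same logic as the derm7pt branch in CLIP/modules/model.py
    if l == 0:
        matrix.append(batch["label"].cpu().numpy())
    else:
        matrix.append(np.where(batch["label"].cpu().numpy() >= 1, 0, 1))

print(np.stack(matrix))
# [[0 1 1 0]
#  [1 0 0 1]
#  [1 0 0 1]
#  [0 1 1 0]]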

Thank you!

@sarazatezalo
Author

Hi @CristianoPatricio ,

Thank you for your response and for sharing your work—it’s very inspiring! I’m currently researching this topic for my university project and I am interested in implementing something similar.
However, I’m having some difficulty understanding the loss calculation in the code for derm7pt, particularly how the target matrix is used. As I understand it, the loss function in use is cross-entropy. When given logits and a target in the form of a matrix (instead of class indices), PyTorch applies a row-wise softmax to the logits and weights each position by the corresponding target value: for zero/one targets, each row’s loss is the sum of the negative log-probabilities at the positions marked 1 in the target matrix.
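To make sure I’m describing the same computation, here is a tiny self-contained example of what I mean (the logits and targets are purely illustrative), comparing F.cross_entropy with a matrix target against the explicit formula:

import torch
import torch.nn.functional as F

# toy logits for 3 image-text pairs and a 0/1 target matrix (illustrative values)
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 1.5, 0.2],
                       [0.1, 0.4, 2.2]])
target = torch.tensor([[1.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# cross-entropy with probability-style (soft) targets
loss = F.cross_entropy(logits, target)

# explicit formula: row-wise log-softmax, sum of -log p at positions marked 1,
# averaged over rows
manual = (-target * F.log_softmax(logits, dim=-1)).sum(dim=1).mean()

print(loss.item(), manual.item())  # the two values match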
In your code, it looks like the target matrix, for the derm7pt dataset, has a value of 1 for text-image embeddings that do not share the same disease label.
Could you clarify this part of the implementation? How does this align with the cross-entropy loss computation, and what was the rationale behind this approach?
Thank you again for your time.

Best regards,

Sara

@CristianoPatricio
Owner

Hi @sarazatezalo,

Sorry for the delay. Your rationale is correct. Did you try modifying the target matrix for Derm7pt accordingly?
Anyway, I'll check the implementation ASAP.

Thanks!
