Skip to content

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

License

Notifications You must be signed in to change notification settings

marqo-ai/marqo-FashionCLIP

Repository files navigation

Marqo-FashionCLIP

This repository is designed to evaluate Marqo-FashionCLIP and Marqo-FashionSigLIP across seven public benchmark datasets. Read more about the models on our blog.

Benchmark Results

We averaged the performance of three common tasks across the datasets: text-to-image, category-to-product, and sub-category-to-product. As demonstrated below, Marqo-FashionCLIP and Marqo-FashionSigLIP outperform both pretrained OpenCLIP models and the state-of-the-art fashion CLIP models. For a more comprehensive performance comparison, refer to the LEADERBOARD.

Text-To-Image (Averaged across 6 datasets)

Model AvgRecall Recall@1 Recall@10 MRR
Marqo-FashionSigLIP 0.231 0.121 0.340 0.239
Marqo-FashionCLIP 0.192 0.094 0.290 0.200
FashionCLIP2.0 0.163 0.077 0.249 0.165
OpenFashionCLIP 0.132 0.060 0.204 0.135
ViT-B-16-laion2b_s34b_b88k 0.174 0.088 0.261 0.180
ViT-B-16-SigLIP-webli 0.212 0.111 0.314 0.214

Category-To-Product (Averaged across 5 datasets)

Model AvgP P@1 P@10 MRR
Marqo-FashionSigLIP 0.737 0.758 0.716 0.812
Marqo-FashionCLIP 0.705 0.734 0.676 0.776
FashionCLIP2.0 0.684 0.681 0.686 0.741
OpenFashionCLIP 0.646 0.653 0.639 0.720
ViT-B-16-laion2b_s34b_b88k 0.662 0.673 0.652 0.743
ViT-B-16-SigLIP-webli 0.688 0.690 0.685 0.751

Sub-Category-To-Product (Averaged across 4 datasets)

Model AvgP P@1 P@10 MRR
Marqo-FashionSigLIP 0.725 0.767 0.683 0.811
Marqo-FashionCLIP 0.707 0.747 0.667 0.772
FashionCLIP2.0 0.657 0.676 0.638 0.733
OpenFashionCLIP 0.598 0.619 0.578 0.689
ViT-B-16-laion2b_s34b_b88k 0.638 0.651 0.624 0.712
ViT-B-16-SigLIP-webli 0.643 0.643 0.643 0.726

Models

Hugging Face

We released our models on HuggingFace: Marqo-FashionCLIP and Marqo-FashionSigLIP. We also have a Hugging Face Space Demo of our models in action: Classification with Marqo-FashionSigLIP.

You can load the models with transformers by

from transformers import AutoModel, AutoProcessor
model = AutoModel.from_pretrained('Marqo/marqo-fashionCLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('Marqo/marqo-fashionCLIP', trust_remote_code=True)

and

from transformers import AutoModel, AutoProcessor
model = AutoModel.from_pretrained('Marqo/marqo-fashionSigLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('Marqo/marqo-fashionSigLIP', trust_remote_code=True)

Then,

import torch
from PIL import Image

image = [Image.open("docs/fashion-hippo.png")]
text = ["a hat", "a t-shirt", "shoes"]
processed = processor(text=text, images=image, padding='max_length', return_tensors="pt")

with torch.no_grad():
    image_features = model.get_image_features(processed['pixel_values'], normalize=True)
    text_features = model.get_text_features(processed['input_ids'], normalize=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)

We released this article illustrating a simple ecommerce search with a fashion dataset if you want to see the model in action.

OpenCLIP

You can load the models with open_clip by

import open_clip
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:Marqo/marqo-fashionCLIP')
tokenizer = open_clip.get_tokenizer('hf-hub:Marqo/marqo-fashionCLIP')

and

import open_clip
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:Marqo/marqo-fashionSigLIP')
tokenizer = open_clip.get_tokenizer('hf-hub:Marqo/marqo-fashionSigLIP')

Then,

import torch
from PIL import Image

image = preprocess_val(Image.open("docs/fashion-hippo.png")).unsqueeze(0)
text = tokenizer(["a hat", "a t-shirt", "shoes"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image, normalize=True)
    text_features = model.encode_text(text, normalize=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)

Marqo

To deploy on Marqo Cloud (recommended):

  1. Sign Up to Marqo Cloud.

  2. Install Marqo and the Marqo python client:

pip install marqo
  1. Create and index:
import marqo

settings = {
    "type": "unstructured",
    "model": "marqo-fashion-clip", # model name
    "modelProperties": {
        "name": "ViT-B-16", # model architecture
        "dimensions": 512, # embedding dimensions
        "url": "https://marqo-gcl-public.s3.us-west-2.amazonaws.com/marqo-fashionCLIP/marqo_fashionCLIP.pt", # model weights
        "type": "open_clip" # loading library
    },
}

api_key = "your_api_key"  # replace with your api key (https://www.marqo.ai/blog/finding-my-marqo-api-key)
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

mq.create_index("fashion-index", settings_dict=settings)

# triggers model download
mq.index("fashion-index").search("black dress")

See the full documentation for more details on adding documents and searching.

Quick Start

Install PyTorch first and run

pip install -r requirements.txt

To evaluate Marqo-FashionCLIP, run this command

python eval.py \
        --dataset-config ./configs/${DATASET}.json \
        --model-name Marqo/marqo-fashionCLIP \
        --run-name Marqo-FashionCLIP
  • DATASET can be one of ['deepfashion_inshop', 'deepfashion_multimodal', 'fashion200k', 'KAGL', 'atlas', 'polyvore' 'iMaterialist']

To evaluate Marqo-FashionSigLIP, run this command

python eval.py \
        --dataset-config ./configs/${DATASET}.json \
        --model-name Marqo/marqo-fashionSigLIP \
        --run-name Marqo-FashionSigLIP
  • DATASET can be one of ['deepfashion_inshop', 'deepfashion_multimodal', 'fashion200k', 'KAGL', 'atlas', 'polyvore' 'iMaterialist']

Scripts to evaluate other models including FashionCLIP 2.0 and OpenFashionCLIP can be found in scripts directory.

Datasets

We collected 7 public multimodal fashion datasets and uploaded to HuggingFace: Atlas, DeepFashion (In-shop), DeepFashion (Multimodal), Fashion200k, iMaterialist, KAGL, and Polyvore. Each dataset has different metadata available. Thus, tasks for each dataset are stored as json files in scripts directory. Refer to our blog for more information about each dataset.

Summarizing Results

To renew LEADERBOARD.md and summarize results of different models locally, run this command

python summarize_results.py

Citation

@software{Jung_Marqo-FashionCLIP_and_Marqo-FashionSigLIP_2024,
author = {Jung, Myong Chol and Clark, Jesse},
month = aug,
title = {{Marqo-FashionCLIP and Marqo-FashionSigLIP}},
url = {https://github.com/marqo-ai/marqo-FashionCLIP},
version = {1.0.0},
year = {2024}
}