This repository is designed to evaluate Marqo-FashionCLIP and Marqo-FashionSigLIP across seven public benchmark datasets. Read more about the models on our blog.
We averaged performance across the datasets on three common tasks: text-to-image, category-to-product, and sub-category-to-product retrieval. As shown below, Marqo-FashionCLIP and Marqo-FashionSigLIP outperform both pretrained OpenCLIP models and the state-of-the-art fashion CLIP models. For a more comprehensive performance comparison, refer to the LEADERBOARD.
Text-To-Image (Averaged across 6 datasets)
| Model | AvgRecall | Recall@1 | Recall@10 | MRR |
|---|---|---|---|---|
| Marqo-FashionSigLIP | 0.231 | 0.121 | 0.340 | 0.239 |
| Marqo-FashionCLIP | 0.192 | 0.094 | 0.290 | 0.200 |
| FashionCLIP2.0 | 0.163 | 0.077 | 0.249 | 0.165 |
| OpenFashionCLIP | 0.132 | 0.060 | 0.204 | 0.135 |
| ViT-B-16-laion2b_s34b_b88k | 0.174 | 0.088 | 0.261 | 0.180 |
| ViT-B-16-SigLIP-webli | 0.212 | 0.111 | 0.314 | 0.214 |
Category-To-Product (Averaged across 5 datasets)
| Model | AvgP | P@1 | P@10 | MRR |
|---|---|---|---|---|
| Marqo-FashionSigLIP | 0.737 | 0.758 | 0.716 | 0.812 |
| Marqo-FashionCLIP | 0.705 | 0.734 | 0.676 | 0.776 |
| FashionCLIP2.0 | 0.684 | 0.681 | 0.686 | 0.741 |
| OpenFashionCLIP | 0.646 | 0.653 | 0.639 | 0.720 |
| ViT-B-16-laion2b_s34b_b88k | 0.662 | 0.673 | 0.652 | 0.743 |
| ViT-B-16-SigLIP-webli | 0.688 | 0.690 | 0.685 | 0.751 |
Sub-Category-To-Product (Averaged across 4 datasets)
| Model | AvgP | P@1 | P@10 | MRR |
|---|---|---|---|---|
| Marqo-FashionSigLIP | 0.725 | 0.767 | 0.683 | 0.811 |
| Marqo-FashionCLIP | 0.707 | 0.747 | 0.667 | 0.772 |
| FashionCLIP2.0 | 0.657 | 0.676 | 0.638 | 0.733 |
| OpenFashionCLIP | 0.598 | 0.619 | 0.578 | 0.689 |
| ViT-B-16-laion2b_s34b_b88k | 0.638 | 0.651 | 0.624 | 0.712 |
| ViT-B-16-SigLIP-webli | 0.643 | 0.643 | 0.643 | 0.726 |
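For readers unfamiliar with these metrics: Recall@K (and P@K) is computed per query over the top-K retrieved items, MRR is the mean reciprocal rank of the relevant item, and the Avg columns appear to be the mean of the @1 and @10 values. Below is a minimal, illustrative sketch of Recall@K and MRR for the single-relevant-item case; it is not the repository's evaluation code in eval.py, and the `ranks` list is made-up example data.

```python
import numpy as np

def recall_at_k(ranks, k):
    """Fraction of queries whose relevant item appears in the top-k results.
    `ranks` holds the 1-based rank of the relevant item for each query."""
    ranks = np.asarray(ranks)
    return float((ranks <= k).mean())

def mrr(ranks):
    """Mean reciprocal rank of the relevant item across queries."""
    ranks = np.asarray(ranks)
    return float((1.0 / ranks).mean())

# Hypothetical ranks of the ground-truth image for 5 text queries
ranks = [1, 3, 12, 2, 50]
print("Recall@1 :", recall_at_k(ranks, 1))   # 0.2
print("Recall@10:", recall_at_k(ranks, 10))  # 0.6
print("MRR      :", mrr(ranks))
```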
We released our models on HuggingFace: Marqo-FashionCLIP and Marqo-FashionSigLIP. We also have a Hugging Face Space Demo of our models in action: Classification with Marqo-FashionSigLIP.
You can load the models with `transformers` by:

```python
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained('Marqo/marqo-fashionCLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('Marqo/marqo-fashionCLIP', trust_remote_code=True)
```
and:

```python
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained('Marqo/marqo-fashionSigLIP', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('Marqo/marqo-fashionSigLIP', trust_remote_code=True)
```
Then,

```python
import torch
from PIL import Image

image = [Image.open("docs/fashion-hippo.png")]
text = ["a hat", "a t-shirt", "shoes"]
processed = processor(text=text, images=image, padding='max_length', return_tensors="pt")

with torch.no_grad():
    image_features = model.get_image_features(processed['pixel_values'], normalize=True)
    text_features = model.get_text_features(processed['input_ids'], normalize=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
We also released an article illustrating a simple ecommerce search over a fashion dataset if you want to see the models in action.
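For a rough idea of what such a search looks like, here is a minimal sketch of a text-to-product search built on the `transformers` snippet above. It assumes the `model` and `processor` loaded earlier, that the processor accepts image-only and text-only calls like the standard CLIPProcessor, and a hypothetical list of product image paths:

```python
import torch
from PIL import Image

# Hypothetical catalog; replace with your own product images
product_paths = ["products/red-dress.jpg", "products/black-boots.jpg", "products/denim-jacket.jpg"]
images = [Image.open(p) for p in product_paths]

# Embed the catalog once; normalize=True gives unit-length embeddings,
# so the dot product below is cosine similarity
with torch.no_grad():
    batch = processor(images=images, return_tensors="pt")
    product_embs = model.get_image_features(batch["pixel_values"], normalize=True)

def search(query, top_k=3):
    """Rank catalog images by cosine similarity to a text query."""
    with torch.no_grad():
        tokens = processor(text=[query], padding="max_length", return_tensors="pt")
        query_emb = model.get_text_features(tokens["input_ids"], normalize=True)
    scores = (query_emb @ product_embs.T).squeeze(0)
    best = scores.topk(min(top_k, len(product_paths)))
    return [(product_paths[int(i)], float(s)) for s, i in zip(best.values, best.indices)]

print(search("a black dress"))
```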
You can load the models with `open_clip` by:

```python
import open_clip

model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:Marqo/marqo-fashionCLIP')
tokenizer = open_clip.get_tokenizer('hf-hub:Marqo/marqo-fashionCLIP')
```
and:

```python
import open_clip

model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:Marqo/marqo-fashionSigLIP')
tokenizer = open_clip.get_tokenizer('hf-hub:Marqo/marqo-fashionSigLIP')
```
Then,

```python
import torch
from PIL import Image

image = preprocess_val(Image.open("docs/fashion-hippo.png")).unsqueeze(0)
text = tokenizer(["a hat", "a t-shirt", "shoes"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image, normalize=True)
    text_features = model.encode_text(text, normalize=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
To deploy on Marqo Cloud (recommended):
- Sign up to Marqo Cloud.
- Install Marqo and the Marqo Python client:

```bash
pip install marqo
```
- Create an index:
```python
import marqo

settings = {
    "type": "unstructured",
    "model": "marqo-fashion-clip",  # model name
    "modelProperties": {
        "name": "ViT-B-16",  # model architecture
        "dimensions": 512,  # embedding dimensions
        "url": "https://marqo-gcl-public.s3.us-west-2.amazonaws.com/marqo-fashionCLIP/marqo_fashionCLIP.pt",  # model weights
        "type": "open_clip"  # loading library
    },
}

api_key = "your_api_key"  # replace with your api key (https://www.marqo.ai/blog/finding-my-marqo-api-key)
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)

mq.create_index("fashion-index", settings_dict=settings)

# triggers model download
mq.index("fashion-index").search("black dress")
```
See the full documentation for more details on adding documents and searching.
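As a quick, hedged illustration of that workflow (a sketch based on the standard Marqo Python client; the document fields, IDs, and image URLs below are hypothetical, and the linked documentation is authoritative), adding a couple of products and searching might look like this:

```python
# Add product documents; tensor_fields are embedded with marqo-fashion-clip
mq.index("fashion-index").add_documents(
    [
        {"_id": "prod-1", "title": "Red floral summer dress",
         "image": "https://example.com/images/red-floral-dress.jpg"},
        {"_id": "prod-2", "title": "Black leather ankle boots",
         "image": "https://example.com/images/black-ankle-boots.jpg"},
    ],
    tensor_fields=["image", "title"],
)

# Text query against the same index
results = mq.index("fashion-index").search("black dress", limit=5)
for hit in results["hits"]:
    print(hit["_id"], hit.get("_score"))
```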
Install PyTorch first and run:

```bash
pip install -r requirements.txt
```
To evaluate Marqo-FashionCLIP, run this command:

```bash
python eval.py \
      --dataset-config ./configs/${DATASET}.json \
      --model-name Marqo/marqo-fashionCLIP \
      --run-name Marqo-FashionCLIP
```
`DATASET` can be one of ['deepfashion_inshop', 'deepfashion_multimodal', 'fashion200k', 'KAGL', 'atlas', 'polyvore', 'iMaterialist'].
To evaluate Marqo-FashionSigLIP, run this command:

```bash
python eval.py \
      --dataset-config ./configs/${DATASET}.json \
      --model-name Marqo/marqo-fashionSigLIP \
      --run-name Marqo-FashionSigLIP
```
`DATASET` can be one of ['deepfashion_inshop', 'deepfashion_multimodal', 'fashion200k', 'KAGL', 'atlas', 'polyvore', 'iMaterialist'].
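To sweep either model over all of the datasets in one run, here is a simple sketch using Python's subprocess module (an illustrative helper, assuming the config files above exist under ./configs):

```python
import subprocess

datasets = ["deepfashion_inshop", "deepfashion_multimodal", "fashion200k",
            "KAGL", "atlas", "polyvore", "iMaterialist"]

for dataset in datasets:
    # Invoke eval.py once per dataset config
    subprocess.run([
        "python", "eval.py",
        "--dataset-config", f"./configs/{dataset}.json",
        "--model-name", "Marqo/marqo-fashionCLIP",
        "--run-name", "Marqo-FashionCLIP",
    ], check=True)
```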
Scripts to evaluate other models, including FashionCLIP 2.0 and OpenFashionCLIP, can be found in the scripts directory.
We collected 7 public multimodal fashion datasets and uploaded them to HuggingFace: Atlas, DeepFashion (In-shop), DeepFashion (Multimodal), Fashion200k, iMaterialist, KAGL, and Polyvore. Each dataset has different metadata available, so the tasks for each dataset are stored as JSON files in the scripts directory. Refer to our blog for more information about each dataset.
To update LEADERBOARD.md and summarize the results of different models locally, run this command:

```bash
python summarize_results.py
```
```bibtex
@software{Jung_Marqo-FashionCLIP_and_Marqo-FashionSigLIP_2024,
  author = {Jung, Myong Chol and Clark, Jesse},
  month = aug,
  title = {{Marqo-FashionCLIP and Marqo-FashionSigLIP}},
  url = {https://github.com/marqo-ai/marqo-FashionCLIP},
  version = {1.0.0},
  year = {2024}
}
```