
[SFT VLM] Add support for Molmo models #2136

Open · lewtun opened this issue Sep 27, 2024 · 15 comments · May be fixed by #2236
Labels: 🧒 good second issue (Good for contributors with basic project familiarity) · 🏋 SFT (Related to SFT) · 👁️ VLM (Related to Visual Language Models)

Comments

@lewtun (Member) commented Sep 27, 2024

Feature request

Extend the sft_vlm.py script to support the new Molmo models from AllenAI: https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19

Paper: https://arxiv.org/abs/2409.17146

Motivation

The Molmo models are super strong VLMs across all model scales, in some cases matching or exceeding the performance of GPT-4V:

[Screenshot: benchmark comparison of the Molmo models against other VLMs, including GPT-4V]

Having the ability to tune these models on custom datasets would be quite exciting for many vision-language applications (e.g. agents).

Your contribution

Open to the community!

lewtun added the 🧒 good second issue and 👁️ VLM labels on Sep 27, 2024
@sergiopaniego (Contributor)

I'd like to contribute to it if you give me some guidance about the requirements! 😄

@lewtun (Member, Author) commented Sep 27, 2024

I'd like to contribute to it if you give me some guidance about the requirements! 😄

Great! I would start by looking at the inference code from one of the models (example) and seeing how the inputs need to be provided to the model. Once you've understood that, it should be reasonably straightforward to extend the training script to support these models with trust_remote_code=True.
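
For orientation, a minimal loading sketch, following the usage shown on the Molmo model card (the checkpoint and dtype here are illustrative):

```python
# Sketch: load a Molmo checkpoint together with its custom code from the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "allenai/Molmo-7B-D-0924"  # illustrative pick from the collection
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # assumption; adjust to your hardware
    device_map="auto",
)
```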

@edbeeching can also provide some guidance as he made the original implementation :)

@edbeeching (Collaborator)

Hi @sergiopaniego, I had a look at the modelling code of Molmo and the processor is not quite the same as Llama-Vision and LLaVA, so you may find it challenging to have a single script that works for all these models.

If you would like to make a standalone script that works just for Molmo, adapted from our sft_vlm script, that would be a great first step; we can then iterate together to see if we can generalize the scripts.

@lewtun (Member, Author) commented Sep 30, 2024

It might also be good to track the transformers integration which will presumably standardise the preprocessing: huggingface/transformers#33710

@sergiopaniego (Contributor)

Thanks a lot for the details!

I'm currently running the script as-is while trying to understand the differences compared to Molmo.
To clarify, @edbeeching, is the processor you're talking about the one in https://huggingface.co/allenai/Molmo-7B-D-0924/blob/main/preprocessing_molmo.py?
I'll first try to create a standalone script for Molmo, as you suggest 😄

@edbeeching (Collaborator)

@sergiopaniego

To clarify, @edbeeching, is the processor you're talking about the one in https://huggingface.co/allenai/Molmo-7B-D-0924/blob/main/preprocessing_molmo.py?

Yes, that is the one.

@sergiopaniego (Contributor)

Thanks for the confirmation, @edbeeching!

I've created a reproducible example on Google Colab to share the code:

Colab Notebook

Currently, I'm encountering a RuntimeError: CUDA error: device-side assert triggered.

Some details:

  • I've set batch_size=1 because the processor.process function expects only one example.
  • I've made some modifications to the collate_fn to accommodate the processor (a sketch follows this list).
  • I've also upgraded the transformers library to the latest version.
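
For reference, a rough sketch of what such a single-example collate_fn might look like (untested; the processor.process call follows the Molmo model card, while the dataset field names and label handling are assumptions, not the final masking logic):

```python
# Hypothetical single-example collate_fn for Molmo (dataset field names are placeholders).
def collate_fn(examples):
    example = examples[0]  # processor.process handles one example, hence batch_size=1
    inputs = processor.process(
        images=[example["image"]],
        text=example["text"],
    )
    # processor.process returns unbatched tensors, so add a batch dimension
    inputs = {k: v.unsqueeze(0) for k, v in inputs.items()}
    # Naive labels: learn the full sequence (a real script would mask prompt/padding tokens).
    inputs["labels"] = inputs["input_ids"].clone()
    return inputs
```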

I’m actively investigating the issue. Do you have any suggestions on how to resolve it?

@smellslikeml

Thanks for the confirmation, @edbeeching!

I've created a reproducible example on Google Colab to share the code:

Colab Notebook

Currently, I'm encountering a RuntimeError: CUDA error: device-side assert triggered.

I could load the model without the error by first importing BitsAndBytesConfig from transformers in the 5th cell, before adding the config:

```python
from transformers import BitsAndBytesConfig

# load the model in 4-bit to fit it on the Colab GPU
quantization_config = BitsAndBytesConfig(
    load_in_8bit=False, load_in_4bit=True
)
```
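
The config then needs to be passed when loading the model; a minimal sketch, assuming the model id used earlier in the thread:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",
    quantization_config=quantization_config,
    trust_remote_code=True,  # Molmo ships custom modelling code
    device_map="auto",
)
```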

@sergiopaniego (Contributor)

Thanks for the confirmation, @edbeeching!
I've created a reproducible example on Google Colab to share the code:
Colab Notebook
Currently, I'm encountering a RuntimeError: CUDA error: device-side assert triggered.

I could load the model without the error by first importing BitsAndBytesConfig from transformers in the 5th cell, before adding the config:

```python
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=False, load_in_4bit=True
)
```

Could you share your reproducible example?

@smellslikeml commented Oct 4, 2024

Could you share your reproducible example?

Sure, I've added those changes to your Colab here and the rest should be the same.

qgallouedec added the 🏋 SFT label on Oct 7, 2024
@aleSuglia

Hello, do you have a timeline for this?

@sergiopaniego (Contributor)

Could you share your reproducible example?

Sure, I've added those changes to your Colab here and the rest should be the same.

I attempted to extend the notebook, but I encountered the same exception. I’m continuing to investigate the root cause.

@smellslikeml

Could you share your reproducible example?

Sure, I've added those changes to your Colab here and the rest should be the same.

I attempted to extend the notebook, but I encountered the same exception. I’m continuing to investigate the root cause.

Try this Colab: https://colab.research.google.com/drive/1RICZvuxLJ0g6dCIkOIf0HC5J9fJGqNTU?usp=sharing
It gets past the CUDA error and begins training before running out of memory (OOM).
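
For the OOM, a few standard memory-reduction settings may help; a hedged sketch (these are common SFTConfig/TrainingArguments fields, and the values are illustrative, not tuned):

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="molmo-sft",          # hypothetical output directory
    per_device_train_batch_size=1,   # processor.process handles one example at a time
    gradient_accumulation_steps=8,   # recover a larger effective batch size
    gradient_checkpointing=True,     # trade compute for activation memory
    bf16=True,                       # train in bfloat16
)
```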

@edbeeching (Collaborator)

Hi, would you like me to take a look?

@sergiopaniego (Contributor)

Hi @edbeeching!

Sorry for the delay. I was busy last week, but I have some additional time to dedicate this week. I've reproduced @smellslikeml's idea (https://colab.research.google.com/drive/1doT9u811J-WNCnsT6-rP9-OxnDv52M6W?usp=sharing), and I'll try to open the PR this week. Should we wait until huggingface/transformers#33962 is completed?
