
[SFT VLM] Add support for Molmo models #2136

Open · lewtun opened this issue Sep 27, 2024 · 15 comments · May be fixed by #2236
Labels: 🧒 good second issue (Good for contributors with basic project familiarity) · 🏋 SFT (Related to SFT) · 👁️ VLM (Related to Visual Language Models)

Comments

@lewtun (Member) commented Sep 27, 2024

Feature request

Extend the sft_vlm.py script to support the new Molmo models from AllenAI: https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19

Paper: https://arxiv.org/abs/2409.17146

Motivation

The Molmo models are super strong VLMs across all model scales, in some cases matching or exceeding the performance of GPT-4V:

[Screenshot: benchmark comparison of the Molmo models against other VLMs, including GPT-4V]

Having the ability to tune these models on custom datasets would be quite exciting for many vision-language applications (e.g. agents).

Your contribution

Open to the community!

lewtun added the 🧒 good second issue and 👁️ VLM labels on Sep 27, 2024
@sergiopaniego (Contributor)

I'd like to contribute to it if you give me some guidance about the requirements! 😄

@lewtun (Member, Author) commented Sep 27, 2024

I'd like to contribute to it if you give me some guidance about the requirements! 😄

Great! I would start by looking at the inference code from one of the models (example) and seeing how the inputs need to be provided to the model. Once you've understood that, it should be reasonably straightforward to extend the training script to support these models with trust_remote_code=True.
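
For orientation, a minimal loading sketch, following the usage shown on the Molmo model card (the checkpoint and dtype here are illustrative):

```python
# Sketch: load a Molmo checkpoint together with its custom code from the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "allenai/Molmo-7B-D-0924"  # illustrative pick from the collection
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # assumption; adjust to your hardware
    device_map="auto",
)
```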

@edbeeching can also provide some guidance as he made the original implementation :)

@edbeeching (Collaborator)

Hi @sergiopaniego, I had a look at the modelling code of Molmo and the processor is not quite the same as Llama-Vision and LLaVA, so you may find it challenging to have a single script that works for all these models.

If you would like to make a standalone script that works just for Molmo, adapted from our sft_vlm script, that would be a great first step; we can then iterate together to see if we can generalize the scripts.

@lewtun (Member, Author) commented Sep 30, 2024

It might also be good to track the transformers integration which will presumably standardise the preprocessing: huggingface/transformers#33710

@sergiopaniego (Contributor)

Thanks a lot for the details!

I'm currently running the script as-is while trying to understand the differences compared to Molmo.
To clarify, @edbeeching, is the processor you're talking about the one in https://huggingface.co/allenai/Molmo-7B-D-0924/blob/main/preprocessing_molmo.py?
I'll first try to create a standalone script for Molmo, as you suggest 😄

@edbeeching (Collaborator)

@sergiopaniego

To clarify, @edbeeching, is the processor you're talking about the one in https://huggingface.co/allenai/Molmo-7B-D-0924/blob/main/preprocessing_molmo.py?

Yes, that is the one.

@sergiopaniego (Contributor)

Thanks for the confirmation, @edbeeching!

I've created a reproducible example on Google Colab to share the code:

Colab Notebook

Currently, I'm encountering a RuntimeError: CUDA error: device-side assert triggered.

Some details:

  • I've set batch_size=1 because the processor.process function expects only one example.
  • I've made some modifications to the collate_fn to accommodate the processor (a sketch follows this list).
  • I've also upgraded the transformers library to the latest version.
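
For reference, a rough sketch of what such a single-example collate_fn might look like (untested; the processor.process call follows the Molmo model card, while the dataset field names and label handling are assumptions, not the final masking logic):

```python
# Hypothetical single-example collate_fn for Molmo (dataset field names are placeholders).
def collate_fn(examples):
    example = examples[0]  # processor.process handles one example, hence batch_size=1
    inputs = processor.process(
        images=[example["image"]],
        text=example["text"],
    )
    # processor.process returns unbatched tensors, so add a batch dimension
    inputs = {k: v.unsqueeze(0) for k, v in inputs.items()}
    # Naive labels: learn the full sequence (a real script would mask prompt/padding tokens).
    inputs["labels"] = inputs["input_ids"].clone()
    return inputs
```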

I’m actively investigating the issue. Do you have any suggestions on how to resolve it?

@smellslikeml

Thanks for the confirmation, @edbeeching!

I've created a reproducible example on Google Colab to share the code:

Colab Notebook

Currently, I'm encountering a RuntimeError: CUDA error: device-side assert triggered.

I could load the model without the error by first importing BitsAndBytesConfig from transformers in the 5th cell, before adding the config:

```python
from transformers import BitsAndBytesConfig

# load the model in 4-bit to fit it on the Colab GPU
quantization_config = BitsAndBytesConfig(
    load_in_8bit=False, load_in_4bit=True
)
```
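
The config then needs to be passed when loading the model; a minimal sketch, assuming the model id used earlier in the thread:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924",
    quantization_config=quantization_config,
    trust_remote_code=True,  # Molmo ships custom modelling code
    device_map="auto",
)
```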

@sergiopaniego (Contributor)

Thanks for the confirmation, @edbeeching!
I've created a reproducible example on Google Colab to share the code:
Colab Notebook
Currently, I'm encountering a RuntimeError: CUDA error: device-side assert triggered.

I could load the model without the error by first importing BitsAndBytesConfig from transformers in the 5th cell, before adding the config:

```python
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=False, load_in_4bit=True
)
```

Could you share your reproducible example?

@smellslikeml commented Oct 4, 2024

Could you share your reproducible example?

Sure, I've added those changes to your Colab here and the rest should be the same.

qgallouedec added the 🏋 SFT label on Oct 7, 2024
@aleSuglia

Hello, do you have a timeline for this?

@sergiopaniego (Contributor)

Could you share your reproducible example?

Sure, I've added those changes to your Colab here and the rest should be the same.

I attempted to extend the notebook, but I encountered the same exception. I’m continuing to investigate the root cause.

@smellslikeml

Could you share your reproducible example?

Sure, I've added those changes to your Colab here and the rest should be the same.

I attempted to extend the notebook, but I encountered the same exception. I’m continuing to investigate the root cause.

Try this Colab: https://colab.research.google.com/drive/1RICZvuxLJ0g6dCIkOIf0HC5J9fJGqNTU?usp=sharing
It gets past the CUDA error and begins training before running out of memory (OOM).
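
For the OOM, a few standard memory-reduction settings may help; a hedged sketch (these are common SFTConfig/TrainingArguments fields, and the values are illustrative, not tuned):

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="molmo-sft",          # hypothetical output directory
    per_device_train_batch_size=1,   # processor.process handles one example at a time
    gradient_accumulation_steps=8,   # recover a larger effective batch size
    gradient_checkpointing=True,     # trade compute for activation memory
    bf16=True,                       # train in bfloat16
)
```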

@edbeeching (Collaborator)

Hi, would you like me to take a look?

@sergiopaniego (Contributor)

Hi @edbeeching!

Sorry for the delay. I was busy last week, but I have some additional time to dedicate this week. I've reproduced @smellslikeml's idea (https://colab.research.google.com/drive/1doT9u811J-WNCnsT6-rP9-OxnDv52M6W?usp=sharing), and I'll try to open the PR this week. Should we wait until huggingface/transformers#33962 is completed?
