
Not able to run LLaVA-Next pretraining with NeMo 2.0 using container version nemo:24.12 #11741

Open
bernardhan33 opened this issue Jan 3, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@bernardhan33

Describe the bug

I tried to run LLaVA-Next pretraining with NeMo 2.0 by following the documentation, but it fails with different errors in each of the nemo:24.12, nemo:24.09, and nemo:dev containers.

Steps/Code to reproduce bug

1. Pull the latest NeMo container:

   ```shell
   docker pull nvcr.io/nvidia/nemo:24.12
   ```

2. Start the Docker container:

   ```shell
   docker run --gpus all -it --rm --shm-size=32g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:24.12
   ```

3. Inside the container, create `pretrain.py` with the sample code from the documentation:

   ```python
   from nemo.collections import vlm

   finetune = vlm.llava_next_7b.pretrain_recipe(
       name="llava_next_7b_pretrain",
       dir="/NeMo/new-ckpts",
       num_nodes=1,
       num_gpus_per_node=8,
       language_model_from_pretrained='/NeMo/neva/checkpoints/llama-3-8b-instruct.nemo',  # directory where I converted the Llama3-8b-Instruct checkpoint to .nemo format
       # Can be None or changed based on local checkpoint path
   )

   import nemo_run as run

   run.run(finetune, executor=run.LocalExecutor())
   ```

4. Run the script:

   ```shell
   python3 pretrain.py
   ```

5. It fails with:

   ```
   TypeError: pretrain_recipe() got an unexpected keyword argument 'language_model_from_pretrained'
   ```

6. Confirmed from the code at `/opt/NeMo/nemo/collections/vlm/recipes/llava_next_7b.py` that `pretrain_recipe` does not accept `language_model_from_pretrained`.

7. Removed the `language_model_from_pretrained` line and tried again. It fails with:

   ```
   AttributeError: 'MockDataModule' object has no attribute 'micro_batch_size'
   ```

8. Also tried the nemo:dev and nemo:24.09 containers. Both fail with:

   ```
   AttributeError: module 'nemo.collections.vlm' has no attribute 'llava_next_7b'
   ```

9. Confirmed from the code paths that the recipes do not exist yet in those versions.
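As an aside, the keyword arguments a recipe accepts can be checked with Python's `inspect` module rather than by reading the source. A minimal sketch with a stand-in function, since the real `vlm.llava_next_7b.pretrain_recipe` only exists inside the container:

```python
import inspect

# Stand-in for vlm.llava_next_7b.pretrain_recipe; inside the container you
# would inspect the real function instead of this dummy.
def pretrain_recipe(name, dir, num_nodes=1, num_gpus_per_node=8):
    pass

params = inspect.signature(pretrain_recipe).parameters
# Passing a keyword that is not in `params` raises the TypeError seen in step 5.
print("language_model_from_pretrained" in params)
```
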

Expected behavior

Following the public documentation should produce a working LLaVA-NeXT pretraining run.

Environment overview (please complete the following information)

  • Environment location: GCP.
  • Method of NeMo install: Docker.
  • If method of install is [Docker], provide docker pull & docker run commands used: see above.

Environment details

N/A.

Additional context

N/A.

@bernardhan33 bernardhan33 added the bug Something isn't working label Jan 3, 2025
@bernardhan33 (Author) commented Jan 3, 2025

At step 7, when we got the error

```
AttributeError: 'MockDataModule' object has no attribute 'micro_batch_size'
```

could this be similar to this Stack Overflow question, where some dependency imports are mixed up?
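One generic way to rule out a mixed-up install is to check where a module actually resolves from: if a stray pip-installed copy shadows the container's `/opt/NeMo` checkout, `__file__` will point somewhere else. A sketch using a stdlib module as a stand-in (inside the container you would import `nemo.collections.vlm` instead):

```python
import importlib

# Shown with the stdlib `json` module as a stand-in; in the container,
# import nemo.collections.vlm and check the path is under /opt/NeMo.
mod = importlib.import_module("json")
print(mod.__file__)  # the copy Python actually picked up
```
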

@yashaswikarnati (Collaborator) commented:

Hello, sorry for the inconvenience. PR #11424 was missed by our cherry-picking process into the release branch. While we are actively working on fixing that, could you try with ToT main? Thank you!

@yashaswikarnati (Collaborator) commented:

#11783

We will be releasing a new container with the fixes soon.
