
[FIXED] Qwen2VL finetuning broken #1485

Open
Any-Winter-4079 opened this issue Dec 29, 2024 · 19 comments

Labels
fixed - pending confirmation (Fixed, waiting for confirmation from poster)

@Any-Winter-4079 commented Dec 29, 2024

Installing the latest version of unsloth:

!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

seems to break the Qwen2 7B Vision Colab(?):

from unsloth import FastVisionModel # FastLanguageModel for LLMs
import torch

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit", # Llama 3.2 vision support
    "unsloth/Llama-3.2-11B-Vision-bnb-4bit",
    "unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit", # Can fit in a 80GB card!
    "unsloth/Llama-3.2-90B-Vision-bnb-4bit",

    "unsloth/Pixtral-12B-2409-bnb-4bit",              # Pixtral fits in 16GB!
    "unsloth/Pixtral-12B-Base-2409-bnb-4bit",         # Pixtral base model

    "unsloth/Qwen2-VL-2B-Instruct-bnb-4bit",          # Qwen2 VL support
    "unsloth/Qwen2-VL-7B-Instruct-bnb-4bit",
    "unsloth/Qwen2-VL-72B-Instruct-bnb-4bit",

    "unsloth/llava-v1.6-mistral-7b-hf-bnb-4bit",      # Any Llava variant works!
    "unsloth/llava-1.5-7b-hf-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-7B-Instruct",
    load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
)

Leading to:

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-dd19b029c094> in <cell line: 22>()
     20 ] # More models at https://huggingface.co/unsloth
     21 
---> 22 model, tokenizer = FastVisionModel.from_pretrained(
     23     "unsloth/Qwen2-VL-7B-Instruct",
     24     load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.

3 frames
/usr/local/lib/python3.10/dist-packages/unsloth_zoo/compiler.py in patch_gradient_accumulation(modeling_file, module)
    789     functions = dir(modeling_file)
    790     module = eval(f"modeling_file.{module}")
--> 791     forward = module.forward
    792     source = inspect.getsource(forward)
    793     has_kwargs = tuple(inspect.signature(forward).parameters.values())[-1].kind == inspect._VAR_KEYWORD

AttributeError: type object 'Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'
@Any-Winter-4079 changed the title from "Broken Qwen2 7B Vision SFT Colab: 'Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'" to "Broken Qwen2 7B Vision SFT Colab?: 'Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'" on Dec 29, 2024
@Any-Winter-4079 (Author) commented Dec 29, 2024

Never mind. It seems to work with the latest commit, but for some reason the first run of the cell fails and, after re-running it, it works. Very weird, but it does work.

@Any-Winter-4079 (Author) commented Dec 29, 2024

I'm reopening the issue because I've tried to run it on Runpod, and I get the same issue:

...
  File "/usr/local/lib/python3.12/dist-packages/unsloth_zoo/compiler.py", line 791, in patch_gradient_accumulation
    forward = module.forward
              ^^^^^^^^^^^^^^
AttributeError: type object 'Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'

with the fix being:

try:
    model, tokenizer = FastVisionModel.from_pretrained(
        f"{username}/{model_name}",
        load_in_4bit = False, # Use 4bit to reduce memory use. False for 16bit LoRA.
        use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
    )
except:
    try:
        model, tokenizer = FastVisionModel.from_pretrained(
            f"{username}/{model_name}",
            load_in_4bit = False, # Use 4bit to reduce memory use. False for 16bit LoRA.
            use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
        )
    except:
        print('error')

i.e. running the model-loading part twice, which is a bit of a hack (more tolerable on Colab, but more of a pain for scripts running in the cloud)...
Has anyone come across this error or know what the issue is?
Am I doing something wrong? Is there a small bug somewhere?
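
A slightly tidier version of the same hack is a small retry loop; this is only a sketch under the same assumptions as the snippet above (FastVisionModel imported, username and model_name defined), and load_with_retry is a hypothetical helper, not an Unsloth API:

def load_with_retry(repo_id, attempts = 2):
    # Retry FastVisionModel.from_pretrained, since only the first
    # attempt appears to hit the AttributeError.
    last_err = None
    for _ in range(attempts):
        try:
            return FastVisionModel.from_pretrained(
                repo_id,
                load_in_4bit = False, # False for 16bit LoRA.
                use_gradient_checkpointing = "unsloth",
            )
        except AttributeError as err:
            last_err = err
    raise last_err

model, tokenizer = load_with_retry(f"{username}/{model_name}")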

@anindyamitra2002

same issue here

@kaykyr commented Dec 29, 2024

I'm facing the same issue here! It wasn't breaking like this before... maybe because of the latest commits/updates.

@kaykyr commented Dec 29, 2024

In my case:

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Traceback (most recent call last):
  File "/ors/workdir/aura-omini/train.py", line 12, in <module>
    model, tokenizer = FastVisionModel.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env2/lib/python3.11/site-packages/unsloth/models/loader.py", line 459, in from_pretrained
    model_types = unsloth_compile_transformers(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env2/lib/python3.11/site-packages/unsloth/models/_utils.py", line 1216, in unsloth_compile_transformers
    _unsloth_compile_transformers(
  File "/root/anaconda3/envs/unsloth_env2/lib/python3.11/site-packages/unsloth_zoo/compiler.py", line 1418, in unsloth_compile_transformers
    new_source = patch_gradient_accumulation(modeling_file, module)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env2/lib/python3.11/site-packages/unsloth_zoo/compiler.py", line 791, in patch_gradient_accumulation
    forward = module.forward
              ^^^^^^^^^^^^^^
AttributeError: type object 'Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'

@kaykyr commented Dec 29, 2024

My previous environment unsloth_env is still working...

@kaykyr commented Dec 29, 2024

I was able to roll back previous commits and find the last working version of Unsloth:

python -m pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git@44185c473ba1b20c4f4a7bde2cd8abd7d30e2514" -U --force
python -m pip install "unsloth_zoo @ git+https://github.com/unslothai/unsloth-zoo.git@e9950f5c9895dc2cb1d6e7810713b810e6d94285" -U --force

After installation, go to:
/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/models/vision.py

Remove the merge_and_overwrite_lora import and save.

Your script will work again.
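
Assuming merge_and_overwrite_lora appears only on that import line, a one-liner like the following should make the same edit (hypothetical, adjust the path to your environment):

sed -i '/merge_and_overwrite_lora/d' /root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/models/vision.py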

@ariefwijaya commented Dec 29, 2024

Use pip install "unsloth==2024.12.11" until the problem is solved.

@developer0hye (Contributor) commented Dec 30, 2024

The latest version (2024.12.7) of unsloth-zoo has an issue. Pin these versions instead:

pip install "unsloth==2024.12.11"
pip install "unsloth-zoo==2024.12.6"

@kaykyr commented Jan 6, 2025

The latest version (2024.12.7) of unsloth-zoo has an issue. Pin these versions instead:

pip install "unsloth==2024.12.11"
pip install "unsloth-zoo==2024.12.6"

You can train fine; the problem is being able to merge after training.

@theflyingdutch789

@danielhanchen @shimmyshimmer Can we expect a fix for this soon?

@danielhanchen (Contributor)

It should hopefully be fixed - sorry for the delay and the issue!

Please update Unsloth via pip install --upgrade --force-reinstall --no-deps --no-cache-dir unsloth unsloth_zoo

@kaykyr commented Jan 10, 2025

It should hopefully be fixed - sorry for the delay and the issue!

Please update Unsloth via pip install --upgrade --force-reinstall --no-deps --no-cache-dir unsloth unsloth_zoo

Hi @danielhanchen, still getting errors:

(unsloth_env) root@ors:/ors/workdir/aura-omini# python train_sft.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
[2025-01-10 15:35:06,647] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
==((====))==  Unsloth 2025.1.5: Fast Qwen2_Vl vision patching. Transformers: 4.48.0.
   \\   /|    GPU: NVIDIA GeForce RTX 4090. Max memory: 23.643 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1. CUDA: 8.9. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
`Qwen2VLRotaryEmbedding` can now be fully parameterized by passing the model config through the `config` argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████| 7/7 [00:14<00:00,  2.12s/it]
Unsloth: Making `model.base_model.model.visual` require gradients
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 8,552 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 2
\        /    Total batch size = 2 | Total steps = 12,828
 "-____-"     Number of trainable parameters = 79,298,560
🦥 Unsloth needs about 1-3 minutes to load everything - please wait!
  0%|                                                                                                      | 0/12828 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/ors/workdir/aura-omini/train_sft.py", line 105, in <module>
    trainer.train(resume_from_checkpoint=False)
  File "<string>", line 157, in train
  File "<string>", line 383, in _fast_inner_training_loop
  File "<string>", line 34, in _unsloth_training_step
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/models/_utils.py", line 1063, in _unsloth_pre_compute_loss
    return self._old_compute_loss(model, inputs, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/transformers/trainer.py", line 3734, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/peft/peft_model.py", line 1719, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 197, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ors/workdir/aura-omini/unsloth_compiled_cache/unsloth_compiled_module_qwen2_vl.py", line 1179, in forward
    return Qwen2VLForConditionalGeneration_forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position, **loss_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ors/workdir/aura-omini/unsloth_compiled_cache/unsloth_compiled_module_qwen2_vl.py", line 878, in Qwen2VLForConditionalGeneration_forward
    image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
    return inner()
           ^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1779, in inner
    args_result = hook(self, args)
                  ^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth_zoo/peft_utils.py", line 201, in requires_grad_pre_hook
    input[0].requires_grad_(True)
RuntimeError: only Tensors of floating point dtype can require gradients
  0%|          | 0/12828 [00:00<?, ?it/s]                                                                                            
(unsloth_env) root@ors:/ors/workdir/aura-omini# 

@kengboonang

@danielhanchen @shimmyshimmer are there any temporary fixes at the moment? I am trying to train Qwen2-VL-7B-Instruct and am facing a similar issue, but with this error message instead when calling trainer.train().
I'm currently using torch 2.5.1 and CUDA 12.1.

AttributeError Traceback (most recent call last)
Cell In[15], line 1
----> 1 trainer_stats = trainer.train()

File <string>:157, in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)

File <string>:381, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File <string>:31, in _unsloth_training_step(self, model, inputs, num_items_in_batch)

File ~/kengboon/.venv/lib/python3.11/site-packages/unsloth/models/_utils.py:1063, in _unsloth_pre_compute_loss(self, model, inputs, *args, **kwargs)
1057 logger.warning_once(
1058 f"Unsloth: Not an error, but {name} does not accept num_items_in_batch.\n"
1059 "Using gradient accumulation will be very slightly less accurate.\n"
1060 "Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient"
1061 )
1062 pass
-> 1063 return self._old_compute_loss(model, inputs, *args, **kwargs)

File ~/kengboon/.venv/lib/python3.11/site-packages/transformers/trainer.py:3633, in Trainer.compute_loss(self, model, inputs, return_outputs, num_items_in_batch)
3631 loss_kwargs["num_items_in_batch"] = num_items_in_batch
3632 inputs = {**inputs, **loss_kwargs}
-> 3633 outputs = model(**inputs)
3634 # Save past state if it exists
3635 # TODO: this needs to be fixed and made cleaner later.
3636 if self.args.past_index >= 0:

File ~/kengboon/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
1734 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1735 else:
-> 1736 return self._call_impl(*args, **kwargs)

File ~/kengboon/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1747, in Module._call_impl(self, *args, **kwargs)
1742 # If we don't have any hooks, we want to skip the rest of the logic in
1743 # this function, and just call forward.
1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1745 or _global_backward_pre_hooks or _global_backward_hooks
1746 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747 return forward_call(*args, **kwargs)
1749 result = None
1750 called_always_called_hooks = set()

File ~/kengboon/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py:820, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
819 def forward(*args, **kwargs):
--> 820 return model_forward(*args, **kwargs)

File ~/kengboon/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py:808, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
807 def __call__(self, *args, **kwargs):
--> 808 return convert_to_fp32(self.model_forward(*args, **kwargs))

File ~/kengboon/.venv/lib/python3.11/site-packages/torch/amp/autocast_mode.py:44, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
41 @functools.wraps(func)
42 def decorate_autocast(*args, **kwargs):
43 with autocast_instance:
---> 44 return func(*args, **kwargs)

File ~/kengboon/.venv/lib/python3.11/site-packages/peft/peft_model.py:1719, in PeftModelForCausalLM.forward(self, input_ids, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, **kwargs)
1717 with self._enable_peft_forward_hooks(**kwargs):
1718 kwargs = {k: v for k, v in kwargs.items() if k not in self.special_peft_forward_args}
-> 1719 return self.base_model(
1720 input_ids=input_ids,
1721 attention_mask=attention_mask,
1722 inputs_embeds=inputs_embeds,
1723 labels=labels,
1724 output_attentions=output_attentions,
1725 output_hidden_states=output_hidden_states,
1726 return_dict=return_dict,
1727 **kwargs,
1728 )
1730 batch_size = _get_batch_size(input_ids, inputs_embeds)
1731 if attention_mask is not None:
1732 # concat prompt attention mask

File ~/kengboon/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
1734 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1735 else:
-> 1736 return self._call_impl(*args, **kwargs)

File ~/kengboon/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1844, in Module._call_impl(self, *args, **kwargs)
1841 return inner()
1843 try:
-> 1844 return inner()
1845 except Exception:
1846 # run always called hooks if they have not already been run
1847 # For now only forward hooks have the always_call option but perhaps
1848 # this functionality should be added to full backward hooks as well.
1849 for hook_id, hook in _global_forward_hooks.items():

File ~/kengboon/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1803, in Module._call_impl.<locals>.inner()
1801 hook_result = hook(self, args, kwargs, result)
1802 else:
-> 1803 hook_result = hook(self, args, result)
1805 if hook_result is not None:
1806 result = hook_result

File ~/kengboon/.venv/lib/python3.11/site-packages/unsloth_zoo/peft_utils.py:191, in requires_grad_for_gradient_checkpointing.<locals>.requires_grad_post_hook(module, input, output)
190 def requires_grad_post_hook(module, input, output):
--> 191 output.requires_grad_(True)

AttributeError: 'Qwen2VLCausalLMOutputWithPast' object has no attribute 'requires_grad_'
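
The failure mode here appears to be that the gradient-checkpointing post-hook receives the model's output dataclass rather than a tensor, and transformers ModelOutput objects have no requires_grad_ method. A minimal standalone illustration (using the generic output class from transformers, not Unsloth code):

import torch
from transformers.modeling_outputs import CausalLMOutputWithPast

out = CausalLMOutputWithPast(logits = torch.zeros(1, 4, 32))

# What the post-hook effectively attempts:
try:
    out.requires_grad_(True)         # the output dataclass is not a tensor
except AttributeError as e:
    print(e)                         # ... has no attribute 'requires_grad_'

# Only the tensor fields inside the output support it:
out.logits.requires_grad_(True)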

@danielhanchen (Contributor)

Will re-investigate today - apologies for the delay.

@danielhanchen added the "currently fixing (Am fixing now!)" label on Jan 14, 2025
@thanhhuynhk17

I just printed out the input and the module. Hope this helps:

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 68,686 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 42,860,544
🦥 Unsloth needs about 1-3 minutes to load everything - please wait!
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-19-3d62c575fcfd> in <cell line: 0>()
----> 1 trainer_stats = trainer.train()

22 frames
/usr/local/lib/python3.11/dist-packages/unsloth_zoo/peft_utils.py in requires_grad_pre_hook(module, input)
    199             if len(input) == 0:
    200                 raise RuntimeError("Unsloth: Failed to make input require gradients!")
--> 201             input[0].requires_grad_(True)
    202         else:
    203             raise RuntimeError("Unsloth: Failed to make input require gradients!")

RuntimeError: only Tensors of floating point dtype can require gradients
> /usr/local/lib/python3.11/dist-packages/unsloth_zoo/peft_utils.py(201)requires_grad_pre_hook()
    199             if len(input) == 0:
    200                 raise RuntimeError("Unsloth: Failed to make input require gradients!")
--> 201             input[0].requires_grad_(True)
    202         else:
    203             raise RuntimeError("Unsloth: Failed to make input require gradients!")

ipdb> input[0]
tensor([[151644,   8948,    198,   2610,    525,    264,  10950,  17847,     13,
         151645,    198, 151644,    872,    198,   7985,    279,  97913,  13042,
            369,    419,   2168,     13, 151652, 151655, 151655, 151655, 151655,
         151655, 151655, 151655, 151655, 151655, 151655, 151653, 151645,    198,
         151644,  77091,    198,     59,  37420,    716,    314,   1124,  15128,
            335,    431,    716,    314,   1124,  15128,    335,    284,   1124,
          37018,    314,    220,     18,    451,    716,    314,    272,    335,
            481,    451,    716,    314,    282,    335,    320,    220,     16,
            481,   1124,  32214,    873,    335,    314,    220,     19,    220,
             23,   1124,   2493,   6306,    314,    220,     17,    335,    335,
            434,    716,    314,   1124,  15128,   1124,   8933,    335,   6306,
            314,    264,    335,   1124,  43615,    295,  34276,    314,    434,
            335,    716,    314,   1124,  15128,   1124,   8933,    335,   6306,
            314,    264,    335,    481,   1124,  37018,    314,    220,     16,
            335,    314,    220,     21,    335,   2804,    425,    716,    314,
           1124,  15128,   1124,   8933,    335,   1124,  43615,    295,  34276,
            314,    425,    335, 151645,    198],
        [151644,   8948,    198,   2610,    525,    264,  10950,  17847,     13,
         151645,    198, 151644,    872,    198,   7985,    279,  97913,  13042,
            369,    419,   2168,     13, 151652, 151655, 151655, 151655, 151655,
         151655, 151655, 151653, 151645,    198, 151644,  77091,    198,     90,
           1124,  37018,    314,   1124,  37420,    444,    335,    314,   1124,
          37420,    259,    335,    335,    284,   1124,   2359,     59,     90,
            444,   1154,   1124,   2359,      7,    444,   6306,    314,    220,
             17,    335,   1124,   1291,      8,    716,    314,   1124,    709,
             80,    220,     16,    335,   1124,   1291,     59,     92,    716,
            314,   1124,     74,  27180,    335, 151645,    198, 151654, 151654,
         151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654,
         151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654,
         151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654,
         151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654,
         151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654,
         151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654, 151654,
         151654, 151654, 151654, 151654, 151654]], device='cuda:0')
ipdb> module
lora.Embedding(
  (base_layer): Embedding(152064, 3584, padding_idx=151654)
  (lora_dropout): ModuleDict(
    (default): Identity()
  )
  (lora_A): ModuleDict()
  (lora_B): ModuleDict()
  (lora_embedding_A): ParameterDict(  (default): Parameter containing: [torch.cuda.FloatTensor of size 16x152064 (cuda:0)])
  (lora_embedding_B): ParameterDict(  (default): Parameter containing: [torch.cuda.FloatTensor of size 3584x16 (cuda:0)])
  (lora_magnitude_vector): ModuleDict()
)
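
This matches the error: the pre-hook fires on the LoRA embedding layer, whose input is integer token IDs, and integer tensors cannot require gradients. A minimal standalone illustration (plain PyTorch, not Unsloth code):

import torch

# A frozen embedding layer, as in a LoRA setup with frozen base weights.
emb = torch.nn.Embedding(1000, 8)
emb.weight.requires_grad_(False)

ids = torch.tensor([[5, 17, 42]])   # token IDs, dtype torch.long

# What the pre-hook effectively attempts on the embedding *input*:
try:
    ids.requires_grad_(True)        # fails: integer dtype
except RuntimeError as e:
    print(e)                        # only Tensors of floating point dtype can require gradients

# Gradients can only be enabled on the floating-point *output*:
out = emb(ids)
out.requires_grad_(True)            # works: float32 leaf tensor, since the layer is frozen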

@danielhanchen (Contributor)

Hopefully I finally fixed it! Would appreciate it if anyone could check - thanks a lot!

For local machines, please update Unsloth via:

pip install --upgrade --no-cache-dir --force-reinstall --no-deps unsloth unsloth_zoo

On Colab / Kaggle, you should just need to restart the runtime.

@danielhanchen added the "fixed - pending confirmation (Fixed, waiting for confirmation from poster)" label and removed the "currently fixing (Am fixing now!)" label on Jan 20, 2025
@danielhanchen changed the title from "Broken Qwen2 7B Vision SFT Colab?: 'Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'" to "Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'" on Jan 20, 2025
@danielhanchen changed the title from "Qwen2VLCausalLMOutputWithPast' has no attribute 'forward'" to "[FIXED] Qwen2VLCausalLMOutputWithPast error" on Jan 20, 2025
@danielhanchen pinned this issue on Jan 20, 2025
@danielhanchen changed the title from "[FIXED] Qwen2VLCausalLMOutputWithPast error" to "[FIXED] Qwen2VL finetuning broken" on Jan 20, 2025
@theflyingdutch789

@danielhanchen Hey there! Thanks so much for the update and the fix. After testing, I can confirm that the original issue is resolved: fine-tuning the Qwen family of models and running inference on them now works perfectly.

However, there's a new snag when trying to run vLLM inference on a Qwen model that was trained in 4-bit, whether loaded/exported in 16-bit or 4-bit. It throws an error that doesn't appear when running vLLM inference with a Qwen model trained in 16-bit. My guess is that it has something to do with dynamic quantization, since the 4-bit model is the only one causing this issue in vLLM.

I've included the error below for you to refer to.

INFO 01-20 16:11:05 llm_engine.py:234] Initializing an LLM engine (v0.6.6.post1) with config: model='/home/nabeel/Documents/go-test/temp/vllm_16bit_qwen2_2b_500_r96', speculative_config=None, tokenizer='/home/nabeel/Documents/go-test/temp/vllm_16bit_qwen2_2b_500_r96', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=9000, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/home/nabeel/Documents/go-test/temp/vllm_16bit_qwen2_2b_500_r96, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
INFO 01-20 16:11:05 selector.py:120] Using Flash Attention backend.
INFO 01-20 16:11:05 model_runner.py:1094] Starting to load model /home/nabeel/Documents/go-test/temp/vllm_16bit_qwen2_2b_500_r96...
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
2025-01-20 16:11:05,983 - ERROR - Failed to initialize LLM:
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/nabeel/Documents/go-test/temp/vllm_inference.py", line 36, in
[rank0]: raise e
[rank0]: File "/home/nabeel/Documents/go-test/temp/vllm_inference.py", line 27, in
[rank0]: llm = LLM(
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/utils.py", line 986, in inner
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 230, in init
[rank0]: self.llm_engine = self.engine_class.from_engine_args(
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 517, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 273, in init
[rank0]: self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 36, in init
[rank0]: self._init_executor()
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 35, in _init_executor
[rank0]: self.driver_worker.load_model()
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 155, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1096, in load_model
[rank0]: self.model = get_model(vllm_config=self.vllm_config)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/init.py", line 12, in get_model
[rank0]: return loader.load_model(vllm_config=vllm_config)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 366, in load_model
[rank0]: loaded_weights = model.load_weights(
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1200, in load_weights
[rank0]: return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 237, in load_weights
[rank0]: autoloaded_weights = set(self._load_module("", self.module, weights))
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 198, in _load_module
[rank0]: yield from self._load_module(prefix,
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 175, in _load_module
[rank0]: loaded_params = module_load_weights(weights)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 506, in load_weights
[rank0]: return loader.load_weights(weights)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 237, in load_weights
[rank0]: autoloaded_weights = set(self._load_module("", self.module, weights))
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 198, in _load_module
[rank0]: yield from self._load_module(prefix,
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 175, in _load_module
[rank0]: loaded_params = module_load_weights(weights)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/models/qwen2.py", line 396, in load_weights
[rank0]: weight_loader(param, loaded_weight)
[rank0]: File "/home/nabeel/.local/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 1087, in weight_loader
[rank0]: assert param_data.shape == loaded_weight.shape
[rank0]: AssertionError
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]

[rank0]:[W120 16:11:06.282234738 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())

@fzyzcjy commented Jan 23, 2025

Btw, for others seeing this issue: also try upgrading transformers to the latest version (not only unsloth).
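
For example:

pip install --upgrade transformers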
