[FIXED] Qwen2VL finetuning broken #1485
Never mind. It seems to work with the latest commit, but for some reason the cell fails the first time it is run and works after re-running it. Very weird, but it still works. |
I'm reopening the issue because I've tried to run it on Runpod and I get the same issue:
with the way to fix it being
i.e. have the model loading part run twice (see the sketch below), which is a bit of a hack - and while that's tolerable on Colab, for scripts running in the cloud it's more of a pain... |
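For reference, a minimal sketch of that "load twice" workaround, assuming the standard FastVisionModel API from Unsloth; the model name and the retry wrapper are illustrative, not an official fix.

```python
# Sketch of the double-load workaround described above; the model name and
# the retry wrapper are illustrative assumptions, not an official fix.
from unsloth import FastVisionModel

def load_with_retry(model_name="unsloth/Qwen2-VL-7B-Instruct"):
    try:
        # First attempt sometimes fails on a fresh runtime (see comments above).
        return FastVisionModel.from_pretrained(model_name, load_in_4bit=True)
    except Exception as err:
        print(f"First load failed ({err!r}); retrying once...")
        return FastVisionModel.from_pretrained(model_name, load_in_4bit=True)

model, tokenizer = load_with_retry()
```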
same issue here |
I'm facing the same issue here! It wasn't breaking like this before... maybe because of the latest commits/updates. |
In my case:
|
My previous environment |
I was able to roll back through previous commits and find the last working version of Unsloth:
After installation, go to: remove the import of "merge_and_overwrite_lora" and save. Your code will work again. |
use |
The latest version (=
|
You can train easily; the problem is being able to merge after training (see the sketch below). |
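For context, a minimal sketch of the merge step in question, assuming Unsloth's usual save helpers; the directory names and save_method value are illustrative.

```python
# Sketch of the post-training merge that was reportedly failing; directory
# names are placeholders, and save_pretrained_merged is assumed to be the
# usual Unsloth helper for folding LoRA weights back into the base model.
# After trainer.train() has finished on the LoRA-wrapped model:
model.save_pretrained_merged(
    "qwen2vl_merged_16bit",      # example output directory
    tokenizer,
    save_method="merged_16bit",  # merge LoRA adapters into 16-bit base weights
)

# Exporting only the LoRA adapters usually still works even when the full
# merge path is broken:
model.save_pretrained("qwen2vl_lora_adapters")
tokenizer.save_pretrained("qwen2vl_lora_adapters")
```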
@danielhanchen @shimmyshimmer Can we expect a fix for this soon? |
It should hopefully be fixed - sorry for the delay and the issue! Please update Unsloth via |
Hi @danielhanchen, still getting errors:
|
@danielhanchen @shimmyshimmer are there any temporary fixes at the moment? I am trying to train a Qwen2-VL-7B-Instruct and am facing a similar issue but with this error message instead when calling trainer.train():
|
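For reference, a rough sketch of the kind of vision fine-tuning setup being discussed, loosely following Unsloth's Qwen2-VL notebook; the tiny in-memory dataset and the hyperparameters are placeholders, not the commenter's exact script.

```python
# Rough sketch of a Qwen2-VL fine-tuning setup, loosely following Unsloth's
# vision notebook; the one-sample dataset and hyperparameters are placeholders.
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
from PIL import Image

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2-VL-7B-Instruct", load_in_4bit=True
)
model = FastVisionModel.get_peft_model(model, r=16, lora_alpha=16)
FastVisionModel.for_training(model)  # switch adapters into training mode

# One dummy image/text pair in the conversational format the collator expects.
blank = Image.new("RGB", (224, 224), "white")
train_dataset = [{
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "image": blank},
            {"type": "text", "text": "Describe this image."},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "A blank white square."},
        ]},
    ],
}]

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=train_dataset,
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        max_steps=10,
        learning_rate=2e-4,
        bf16=is_bf16_supported(),
        fp16=not is_bf16_supported(),
        output_dir="outputs",
        remove_unused_columns=False,              # required for vision collators
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)
trainer.train()  # the call where the error above was raised
```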
Will re-investigate today - apologies for the delay. |
I just printed out the input and the module. Hope this helps.
|
Hopefully I finally fixed it! Would appreciate it if anyone could check - thanks a lot! For local machines, please update Unsloth via:
pip install --upgrade --no-cache-dir --force-reinstall --no-deps unsloth unsloth_zoo
Colab / Kaggle should just restart. |
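A quick way to confirm the upgrade actually took effect after restarting the runtime (a small sanity-check sketch, not part of the official instructions):

```python
# Check that the reinstalled packages are the ones actually picked up after
# a runtime restart; importlib.metadata reads the installed versions.
from importlib.metadata import version

for pkg in ("unsloth", "unsloth_zoo"):
    print(pkg, version(pkg))
```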
@danielhanchen Hey there! Thanks so much for the update and the fix. After testing, I can confirm that the original issue is resolved: fine-tuning the Qwen family of models and running inference on them now works perfectly. However, there's a new snag when trying to do vLLM inference on a Qwen model that was trained in 4-bit, whether loaded/exported in 16-bit or 4-bit. It throws an error that doesn't appear when doing vLLM inference with a Qwen model trained in 16-bit. My guess is that it might have something to do with dynamic quantization, since the 4-bit model is the only one causing this issue in vLLM. I've included the error below for you to refer to.
INFO 01-20 16:11:05 llm_engine.py:234] Initializing an LLM engine (v0.6.6.post1) with config: model='/home/nabeel/Documents/go-test/temp/vllm_16bit_qwen2_2b_500_r96', speculative_config=None, tokenizer='/home/nabeel/Documents/go-test/temp/vllm_16bit_qwen2_2b_500_r96', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=9000, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/home/nabeel/Documents/go-test/temp/vllm_16bit_qwen2_2b_500_r96, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,
[rank0]:[W120 16:11:06.282234738 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator()) |
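For reference, a rough reconstruction of the kind of vLLM call implied by the config dump above; the model path is a placeholder and the flags simply mirror the logged engine config, so this is an assumption about the setup rather than a verified reproduction.

```python
# Approximate reconstruction of the vLLM setup implied by the log above;
# the model path is a placeholder and the flags mirror the logged config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/vllm_16bit_qwen2_2b_500_r96",  # merged export from Unsloth
    quantization="bitsandbytes",   # matches quantization=bitsandbytes in the log
    dtype="bfloat16",              # matches dtype=torch.bfloat16
    max_model_len=9000,            # matches max_seq_len=9000
    trust_remote_code=True,
    seed=0,
)
out = llm.generate(["Hello!"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```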
Btw, for others seeing this issue: also try to upgrade |
Installing the latest version of Unsloth:
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
seems to break the Qwen2 7B Vision Colab(?)
Leading to: