Model Merge reduces performance #1519

Open
mosama1994 opened this issue Jan 7, 2025 · 4 comments

Comments

@mosama1994
Contributor

I continually pretrained a Qwen2.5 0.5B model. When I load the adapter, merge it, and then run inference, the output looks good. But when I merge the adapter and save the model in 16-bit or 4-bit using the Unsloth saving methods, and then load the saved model for inference, the output is noticeably worse; performance degrades. What is the reason for that?
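Roughly the flow I'm comparing (just a sketch with placeholder names, not my exact script, and assuming the usual Unsloth API):

```python
from unsloth import FastLanguageModel

prompt = "Some evaluation prompt"

# Path A: load the base model + LoRA adapter and generate directly.
# The output here looks good.
model, tokenizer = FastLanguageModel.from_pretrained(
    "qwen2.5-0.5b-cpt-lora",     # placeholder: checkpoint dir containing the adapter
    max_seq_length = 2048,
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)
inputs = tokenizer(prompt, return_tensors = "pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens = 64)[0]))

# Path B: merge + save with Unsloth, then reload the saved model.
# The output from the reloaded model is noticeably worse.
model.save_pretrained_merged("qwen2.5-0.5b-merged", tokenizer, save_method = "merged_16bit")
model2, tokenizer2 = FastLanguageModel.from_pretrained("qwen2.5-0.5b-merged", load_in_4bit = False)
FastLanguageModel.for_inference(model2)
inputs2 = tokenizer2(prompt, return_tensors = "pt").to(model2.device)
print(tokenizer2.decode(model2.generate(**inputs2, max_new_tokens = 64)[0]))
```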

@TechieUser2517

TechieUser2517 commented Jan 8, 2025

I've observed this too while running continued pretraining tests on various models in the 7-8B range. The merged models obtained with:

model.save_pretrained_merged(modelname, tokenizer, save_method = "merged_16bit",)

perform very poorly and also hit tokenizer issues in llama.cpp when converting them to GGUF format. Replacing the tokenizer files with those of the base model makes the conversion work (sketch below), but I suspect that's where the problems originate.

On the other hand, adapter checkpoints merged separately using other methods appear to work as expected.
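For what it's worth, the tokenizer workaround above is just overwriting the tokenizer files in the merged output folder with those of the base model before running the GGUF conversion; a minimal sketch (paths are placeholders):

```python
from transformers import AutoTokenizer

base_model = "path/or/hub-id-of-base-model"   # the base the adapter was trained on
merged_dir = "./merged-model"                 # output of save_pretrained_merged

# Overwrite the merged folder's tokenizer files with the base model's tokenizer,
# then run llama.cpp's convert_hf_to_gguf.py on merged_dir as usual.
AutoTokenizer.from_pretrained(base_model).save_pretrained(merged_dir)
```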

@danielhanchen
Contributor

I'm currently working on making merging a much better experience. Unfortunately most issues come from chat template problems, but there do seem to be some issues with merging into 4-bit models as well - I'll update you guys!

@TechieUser2517

TechieUser2517 commented Jan 10, 2025

As far as I'm aware, adapter merging should preferably be done into the original (16-bit) weights rather than the 4-bit quantized model, so if the 4-bit weights are being used in that step, that might be the reason for the performance loss.
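In other words, something along these lines with the standard peft merge path (model names and paths are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in 16-bit, not the 4-bit quantized variant.
base = AutoModelForCausalLM.from_pretrained(
    "path/or/hub-id-of-base-model",
    torch_dtype = torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("path/or/hub-id-of-base-model")

# Apply the LoRA adapter and fold it into the 16-bit base weights.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
model = model.merge_and_unload()

# Save the merged 16-bit model together with the tokenizer.
model.save_pretrained("./merged-bf16")
tokenizer.save_pretrained("./merged-bf16")
```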

@mosama1994
Contributor Author

Merging at bf16 precision also causes the issue: when you merge into the bf16 model and save, the problem is still there. I used merged_16bit and also tried the peft library to merge and save. If you merge and then run inference directly, that is fine. But if you save the merged model and run inference from the saved model, the output is not good.
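A check that could help narrow this down (a rough sketch; names and paths are placeholders) is to compare the in-memory merged weights against what comes back after saving and reloading:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_id   = "Qwen/Qwen2.5-0.5B"     # base checkpoint (placeholder)
adapter   = "path/to/lora-adapter"  # placeholder adapter path
saved_dir = "./merged-bf16"

# Merge in memory (this is the variant whose outputs look fine).
merged = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(base_id, torch_dtype = torch.bfloat16),
    adapter,
).merge_and_unload()
merged.save_pretrained(saved_dir)

# Reload what was written to disk and compare tensor by tensor.
reloaded = AutoModelForCausalLM.from_pretrained(saved_dir, torch_dtype = torch.bfloat16)
ref = dict(merged.named_parameters())
for name, param in reloaded.named_parameters():
    diff = (param.detach() - ref[name].detach()).abs().max().item()
    if diff > 0:
        print(f"{name}: max abs diff {diff}")
```

If the weights match exactly but the generations still differ, that would point at the tokenizer or config being written out rather than at the merged weights themselves.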

Also, a separate question @danielhanchen: what tokenizer patching does Unsloth do? Would it be possible for you to list in a few points what happens there?
