model: Add granite GPTQ model #95

Merged (3 commits) on Oct 31, 2024

Conversation

@willmj (Collaborator) commented Oct 25, 2024

While trying to train a quantized version of PowerLM GPTQ, I encountered the following error:

INFO - Ignoring unknown parameter in the quantization configuration: is_marlin_format.
INFO - `checkpoint_format` is missing from the quantization configuration and is automatically inferred to gptq
ERROR:sft_trainer.py:Traceback (most recent call last):
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 644, in main
trainer, additional_train_info = train(
^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 217, in train
model = model_loader(
^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration/framework.py", line 183, in model_loader
return plugin.model_loader(model_name, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration_peft/framework_plugin_autogptq.py", line 194, in model_loader
model = GPTQModel.from_quantized(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration_peft/gptqmodel/models/auto.py", line 105, in from_quantized
model_type = check_and_get_model_type(model_name_or_path, trust_remote_code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration_peft/gptqmodel/utils/model.py", line 402, in check_and_get_model_type
raise TypeError(f"{config.model_type} isn't supported yet.")
TypeError: granite isn't supported yet.

I looked at the layers of the model I was trying to train (located at /fmaas-integration-tests/models/powerlm-3b-r240924a-gptq/):

>>> model.modules
<bound method Module.modules of GraniteForCausalLM(
  (model): GraniteModel(
    (embed_tokens): Embedding(49152, 2304, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x GraniteDecoderLayer(
        (self_attn): GraniteSdpaAttention(
          (k_proj): QuantLinear()
          (o_proj): QuantLinear()
          (q_proj): QuantLinear()
          (v_proj): QuantLinear()
        )
        (mlp): GraniteMLP(
          (act_fn): SiLU()
          (down_proj): QuantLinear()
          (gate_proj): QuantLinear()
          (up_proj): QuantLinear()
        )
        (input_layernorm): GraniteRMSNorm((2304,), eps=1e-05)
        (post_attention_layernorm): GraniteRMSNorm((2304,), eps=1e-05)
      )
    )
    (norm): GraniteRMSNorm((2304,), eps=1e-05)
    (rotary_emb): GraniteRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2304, out_features=49152, bias=False)
)>
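The inspection above came from a Python REPL; a minimal sketch to reproduce it, assuming optimum and auto-gptq are installed so that transformers can load the GPTQ checkpoint directly:

from transformers import AutoModelForCausalLM

# load the quantized checkpoint; the GPTQ layers are dispatched by optimum
model = AutoModelForCausalLM.from_pretrained(
    "/fmaas-integration-tests/models/powerlm-3b-r240924a-gptq/",
    device_map="auto",
)
print(model.modules)  # printing the bound method shows the module tree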

I added these layers in plugins/accelerated-peft/src/fms_acceleration_peft/gptqmodel/models/granite.py in a sleep pod.
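A rough sketch of what that definition looks like, modeled on the neighboring llama.py in the vendored gptqmodel package (the field names follow that convention; the exact class name, and the registration in models/auto.py that lets check_and_get_model_type accept the "granite" model type, should be checked against the merged file):

# plugins/accelerated-peft/src/fms_acceleration_peft/gptqmodel/models/granite.py
# Sketch only: mirrors the llama.py definition with Granite module names.
from .base import BaseGPTQModel

class GraniteGPTQ(BaseGPTQModel):
    # modules that live outside the repeated decoder layers
    base_modules = ["model.embed_tokens", "model.norm"]
    # attribute path to the list of decoder layers
    layers_node = "model.layers"
    layer_type = "GraniteDecoderLayer"
    # quantized Linear modules inside each GraniteDecoderLayer,
    # grouped in the order they are processed
    layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.up_proj", "mlp.gate_proj"],
        ["mlp.down_proj"],
    ]

With that change in place, I tried running tuning again: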

Generating train split: 50 examples [00:00, 2148.48 examples/s]
Map: 100%|████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 7209.19 examples/s]
/home/tuning/.local/lib/python3.11/site-packages/transformers/training_args.py:2027: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
/home/tuning/.local/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length, dataset_kwargs. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:327: UserWarning: You passed a `dataset_kwargs` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
Map: 100%|████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 7870.12 examples/s]
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:396: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
{'loss': 5.0915, 'grad_norm': 7.277552604675293, 'learning_rate': 9.080000000000001e-06, 'epoch': 1.0}                 
{'loss': 4.0155, 'grad_norm': 13.794990539550781, 'learning_rate': 8.08e-06, 'epoch': 2.0}                             
{'loss': 2.2562, 'grad_norm': 11.645210266113281, 'learning_rate': 7.08e-06, 'epoch': 3.0}                             
{'loss': 0.8507, 'grad_norm': 3.169497489929199, 'learning_rate': 6.08e-06, 'epoch': 4.0}                              
{'loss': 0.3599, 'grad_norm': 3.6142966747283936, 'learning_rate': 5.0800000000000005e-06, 'epoch': 5.0}               
{'loss': 0.2384, 'grad_norm': 10.101705551147461, 'learning_rate': 4.08e-06, 'epoch': 6.0}                             
{'loss': 0.2255, 'grad_norm': 8.865755081176758, 'learning_rate': 3.08e-06, 'epoch': 7.0}                              
{'loss': 0.1753, 'grad_norm': 6.977315425872803, 'learning_rate': 2.08e-06, 'epoch': 8.0}                              
{'loss': 0.1276, 'grad_norm': 9.803406715393066, 'learning_rate': 1.08e-06, 'epoch': 9.0}                              
{'loss': 0.1505, 'grad_norm': 1.8857629299163818, 'learning_rate': 8e-08, 'epoch': 10.0}                               
{'train_runtime': 140.1359, 'train_samples_per_second': 3.568, 'train_steps_per_second': 1.784, 'train_loss': 1.3490877084732056, 'epoch': 10.0}
100%|████████████████████████████████████████████████████████████████████████████████| 250/250 [02:20<00:00,  1.78it/s]

Signed-off-by: Will Johnson <[email protected]>
@fabianlim (Contributor) commented Oct 28, 2024

We should enable the last bench and update the benchmarks https://github.com/foundation-model-stack/fms-acceleration/blob/main/scripts/benchmarks/scenarios-granite.yaml#L97

Update: I ran a small bench on the internal checkpoint that @willmj provided me. The numbers looked OK, though the throughput was about 300 tokens/s slower than the previous bench on PowerLM3B.

Update: decided not to update the bench, as the GPTQ checkpoint is an internal checkpoint, as confirmed by @tharapalanivel.

[benchmark screenshots]

Attachments: raw_summary.csv · requirements.txt · benchmarks.csv

But we can't commit this as a new official bench because the checkpoint is not readily available, unless @willmj can provide the commands used to generate this GPTQ checkpoint.

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@fabianlim (Contributor) left a review

LGTM, but there is an outstanding question of whether we should have this as a bench.

@fabianlim (Contributor) commented Oct 29, 2024

Update: also interestingly, for transformers==4.46 there is no effect on the loss for the dense models. Below is a regression of the loss after updating, compared against the bench results; there is no difference.

Update: the reason is that the loss function was not refactored for the granite models in transformers==4.46.
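A quick way to check this, as a sketch: the 4.46 refactor routes loss through a shared self.loss_function hook in the refactored models' forward methods, so inspecting the source shows which models were migrated (module paths from transformers; the expected outputs reflect the observation above):

import inspect
from transformers.models.llama import modeling_llama
from transformers.models.granite import modeling_granite

# refactored models call the shared self.loss_function hook in forward();
# granite in 4.46 is expected to still compute the loss inline
print("self.loss_function" in inspect.getsource(modeling_llama.LlamaForCausalLM.forward))
print("self.loss_function" in inspect.getsource(modeling_granite.GraniteForCausalLM.forward))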

[loss regression plot]

@willmj (Collaborator, Author) commented Oct 29, 2024

The GPTQ checkpoint was produced by @tharapalanivel, requested and tracked in this issue. The model was quantized using AutoGPTQ; for more info, check out the documentation.
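The exact commands were not captured in this thread, but a minimal AutoGPTQ quantization run looks roughly like the sketch below (the base model path, calibration text, output directory, and quantization parameters are placeholders, not the settings actually used):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base = "ibm/PowerLM-3b"  # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # a common GPTQ group size
    desc_act=False,
)

# GPTQ needs a small calibration set of tokenized examples
examples = [tokenizer("Calibration text for GPTQ.", return_tensors="pt")]

model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)
model.quantize(examples)
model.save_quantized("powerlm-3b-gptq")
tokenizer.save_pretrained("powerlm-3b-gptq")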

@fabianlim (Contributor) commented

@willmj @tharapalanivel merging this PR. Decided not to commit the benches as this is an internal checkpoint. But note that it clocks in slower than the BNB version.

@fabianlim merged commit e8bc5dd into foundation-model-stack:main on Oct 31, 2024
6 checks passed