model: Add granite GPTQ model #95

Merged (3 commits) on Oct 31, 2024

Conversation

@willmj (Collaborator) commented Oct 25, 2024

While trying to train a quantized version of PowerLM GPTQ, I encountered the following error:

INFO - Ignoring unknown parameter in the quantization configuration: is_marlin_format.
INFO - `checkpoint_format` is missing from the quantization configuration and is automatically inferred to gptq
ERROR:sft_trainer.py:Traceback (most recent call last):
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 644, in main
trainer, additional_train_info = train(
^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 217, in train
model = model_loader(
^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration/framework.py", line 183, in model_loader
return plugin.model_loader(model_name, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration_peft/framework_plugin_autogptq.py", line 194, in model_loader
model = GPTQModel.from_quantized(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration_peft/gptqmodel/models/auto.py", line 105, in from_quantized
model_type = check_and_get_model_type(model_name_or_path, trust_remote_code)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tuning/.local/lib/python3.11/site-packages/fms_acceleration_peft/gptqmodel/utils/model.py", line 402, in check_and_get_model_type
raise TypeError(f"{config.model_type} isn't supported yet.")
TypeError: granite isn't supported yet.

I looked at the layers of the model I was trying to train (located at /fmaas-integration-tests/models/powerlm-3b-r240924a-gptq/):

>>> model.modules
<bound method Module.modules of GraniteForCausalLM(
  (model): GraniteModel(
    (embed_tokens): Embedding(49152, 2304, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x GraniteDecoderLayer(
        (self_attn): GraniteSdpaAttention(
          (k_proj): QuantLinear()
          (o_proj): QuantLinear()
          (q_proj): QuantLinear()
          (v_proj): QuantLinear()
        )
        (mlp): GraniteMLP(
          (act_fn): SiLU()
          (down_proj): QuantLinear()
          (gate_proj): QuantLinear()
          (up_proj): QuantLinear()
        )
        (input_layernorm): GraniteRMSNorm((2304,), eps=1e-05)
        (post_attention_layernorm): GraniteRMSNorm((2304,), eps=1e-05)
      )
    )
    (norm): GraniteRMSNorm((2304,), eps=1e-05)
    (rotary_emb): GraniteRotaryEmbedding()
  )
  (lm_head): Linear(in_features=2304, out_features=49152, bias=False)
)>
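The inspection above came from a Python REPL; a minimal sketch to reproduce it, assuming optimum and auto-gptq are installed so that transformers can load the GPTQ checkpoint directly:

from transformers import AutoModelForCausalLM

# load the quantized checkpoint; the GPTQ layers are dispatched by optimum
model = AutoModelForCausalLM.from_pretrained(
    "/fmaas-integration-tests/models/powerlm-3b-r240924a-gptq/",
    device_map="auto",
)
print(model.modules)  # printing the bound method shows the module tree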

I added these layers in plugins/accelerated-peft/src/fms_acceleration_peft/gptqmodel/models/granite.py in a sleep pod.
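A rough sketch of what that definition looks like, modeled on the neighboring llama.py in the vendored gptqmodel package (the field names follow that convention; the exact class name, and the registration in models/auto.py that lets check_and_get_model_type accept the "granite" model type, should be checked against the merged file):

# plugins/accelerated-peft/src/fms_acceleration_peft/gptqmodel/models/granite.py
# Sketch only: mirrors the llama.py definition with Granite module names.
from .base import BaseGPTQModel

class GraniteGPTQ(BaseGPTQModel):
    # modules that live outside the repeated decoder layers
    base_modules = ["model.embed_tokens", "model.norm"]
    # attribute path to the list of decoder layers
    layers_node = "model.layers"
    layer_type = "GraniteDecoderLayer"
    # quantized Linear modules inside each GraniteDecoderLayer,
    # grouped in the order they are processed
    layer_modules = [
        ["self_attn.k_proj", "self_attn.v_proj", "self_attn.q_proj"],
        ["self_attn.o_proj"],
        ["mlp.up_proj", "mlp.gate_proj"],
        ["mlp.down_proj"],
    ]

With that change in place, I tried running tuning again: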

Generating train split: 50 examples [00:00, 2148.48 examples/s]
Map: 100%|████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 7209.19 examples/s]
/home/tuning/.local/lib/python3.11/site-packages/transformers/training_args.py:2027: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
/home/tuning/.local/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length, dataset_kwargs. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:283: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:321: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:327: UserWarning: You passed a `dataset_kwargs` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
Map: 100%|████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 7870.12 examples/s]
/home/tuning/.local/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:396: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
{'loss': 5.0915, 'grad_norm': 7.277552604675293, 'learning_rate': 9.080000000000001e-06, 'epoch': 1.0}                 
{'loss': 4.0155, 'grad_norm': 13.794990539550781, 'learning_rate': 8.08e-06, 'epoch': 2.0}                             
{'loss': 2.2562, 'grad_norm': 11.645210266113281, 'learning_rate': 7.08e-06, 'epoch': 3.0}                             
{'loss': 0.8507, 'grad_norm': 3.169497489929199, 'learning_rate': 6.08e-06, 'epoch': 4.0}                              
{'loss': 0.3599, 'grad_norm': 3.6142966747283936, 'learning_rate': 5.0800000000000005e-06, 'epoch': 5.0}               
{'loss': 0.2384, 'grad_norm': 10.101705551147461, 'learning_rate': 4.08e-06, 'epoch': 6.0}                             
{'loss': 0.2255, 'grad_norm': 8.865755081176758, 'learning_rate': 3.08e-06, 'epoch': 7.0}                              
{'loss': 0.1753, 'grad_norm': 6.977315425872803, 'learning_rate': 2.08e-06, 'epoch': 8.0}                              
{'loss': 0.1276, 'grad_norm': 9.803406715393066, 'learning_rate': 1.08e-06, 'epoch': 9.0}                              
{'loss': 0.1505, 'grad_norm': 1.8857629299163818, 'learning_rate': 8e-08, 'epoch': 10.0}                               
{'train_runtime': 140.1359, 'train_samples_per_second': 3.568, 'train_steps_per_second': 1.784, 'train_loss': 1.3490877084732056, 'epoch': 10.0}
100%|████████████████████████████████████████████████████████████████████████████████| 250/250 [02:20<00:00,  1.78it/s]

Signed-off-by: Will Johnson <[email protected]>
@fabianlim (Contributor) commented Oct 28, 2024

We should enable the last bench and update the benchmarks https://github.com/foundation-model-stack/fms-acceleration/blob/main/scripts/benchmarks/scenarios-granite.yaml#L97

Update: I ran a small bench on the internal checkpoint that @willmj provided me. The numbers looked OK, though the throughput was about 300 tokens/s slower than the previous bench on PowerLM3B.

Update: decided not to update the bench, as the GPTQ checkpoint is an internal checkpoint, as confirmed by @tharapalanivel.

[benchmark screenshots]

Attachments: raw_summary.csv · requirements.txt · benchmarks.csv

But we can't commit this as a new official bench because the checkpoint is not readily available, unless @willmj can provide the commands used to generate this GPTQ checkpoint.

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
@fabianlim (Contributor) left a review

LGTM, but there is an outstanding question of whether we should have this as a bench.

@fabianlim (Contributor) commented Oct 29, 2024

Update: also interestingly, for transformers==4.46 there is no effect on the loss for the dense models. Below is a regression of the loss after updating, compared against the bench results; there is no difference.

Update: the reason is that the loss function was not refactored for the granite models in transformers==4.46.
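A quick way to check this, as a sketch: the 4.46 refactor routes loss through a shared self.loss_function hook in the refactored models' forward methods, so inspecting the source shows which models were migrated (module paths from transformers; the expected outputs reflect the observation above):

import inspect
from transformers.models.llama import modeling_llama
from transformers.models.granite import modeling_granite

# refactored models call the shared self.loss_function hook in forward();
# granite in 4.46 is expected to still compute the loss inline
print("self.loss_function" in inspect.getsource(modeling_llama.LlamaForCausalLM.forward))
print("self.loss_function" in inspect.getsource(modeling_granite.GraniteForCausalLM.forward))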

[loss regression plot]

@willmj (Collaborator, Author) commented Oct 29, 2024

The GPTQ checkpoint was produced by @tharapalanivel, requested and tracked in this issue. The model was quantized using AutoGPTQ; for more info, check out the documentation.
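The exact commands were not captured in this thread, but a minimal AutoGPTQ quantization run looks roughly like the sketch below (the base model path, calibration text, output directory, and quantization parameters are placeholders, not the settings actually used):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base = "ibm/PowerLM-3b"  # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # a common GPTQ group size
    desc_act=False,
)

# GPTQ needs a small calibration set of tokenized examples
examples = [tokenizer("Calibration text for GPTQ.", return_tensors="pt")]

model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)
model.quantize(examples)
model.save_quantized("powerlm-3b-gptq")
tokenizer.save_pretrained("powerlm-3b-gptq")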

@fabianlim (Contributor) commented

@willmj @tharapalanivel merging this PR. Decided not to commit the benches as this is an internal checkpoint. But note that it clocks in slower than the BNB version.

@fabianlim merged commit e8bc5dd into foundation-model-stack:main on Oct 31, 2024
6 checks passed