
Local autogptq fix #53

Merged (7 commits) on Jul 16, 2024

Conversation

@achew010 (Contributor) commented Jul 15, 2024

Description

This PR applies additional fixes on top of #48.

Fixed items:

  • Removed the direct dependency from the plugin
  • Edited the plugin README with instructions to install the direct dependencies when creating a legacy GPTQ-LoRA model
  • Shifted the use_external_lib argument from a plugin argument to a configuration object field
  • Added unit tests to ensure that
    • the new field is read
    • the correct package is imported
    • appropriate errors are thrown
  • Regenerated the sample configuration YAMLs to include the new field
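
To picture what moving use_external_lib into the configuration object implies, here is a minimal sketch. The function name (select_gptq_backend) and the peft.quantization nesting are hypothetical illustrations for this PR's described behavior, not the plugin's actual code.

```python
def select_gptq_backend(config: dict) -> str:
    """Pick the GPTQ implementation based on a config field (hypothetical layout)."""
    # Read the field from a nested configuration object; default to the
    # local implementation when the field is absent.
    use_external = (
        config.get("peft", {})
        .get("quantization", {})
        .get("use_external_lib", False)
    )
    if use_external:
        # Legacy path: the external auto_gptq package must be installed
        # separately, since the direct dependency was removed from the plugin.
        return "auto_gptq"
    return "gptqmodel"  # local implementation shipped with the plugin

# Example configurations mirroring the regenerated sample YAMLs
legacy_cfg = {"peft": {"quantization": {"use_external_lib": True}}}
default_cfg = {"peft": {"quantization": {}}}

print(select_gptq_backend(legacy_cfg))   # auto_gptq
print(select_gptq_backend(default_cfg))  # gptqmodel
```

With this shape, the choice of backend travels with the configuration object rather than being threaded through plugin arguments, which is what makes it testable from the sample YAMLs alone.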

FMS-Acceleration-Peft Unit Test

=========================== test session starts ===========================
platform linux -- Python 3.10.12, pytest-8.2.2, pluggy-1.5.0
rootdir: /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft
configfile: pyproject.toml
collected 8 items

tests/test_gptqmodel.py ..                                          [ 25%]
tests/test_peft_plugins.py ...                                      [ 62%]
tests/test_q4_triton.py ..                                          [ 87%]
tests/test_triton.py .                                              [100%]

============================ warnings summary =============================
.tox/py/lib/python3.10/site-packages/transformers/utils/hub.py:124
  /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft/.tox/py/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(

tests/test_gptqmodel.py::test_pre_quantized_model_outputs_match
tests/test_gptqmodel.py::test_quantizing_pretrained_model_outputs_match
tests/test_gptqmodel.py::test_quantizing_pretrained_model_outputs_match
tests/test_peft_plugins.py::test_autogptq_loading
tests/test_q4_triton.py::TestsQ4Triton::test_generation_desc_act_false
tests/test_q4_triton.py::TestsQ4Triton::test_generation_desc_act_true
tests/test_triton.py::TestTriton::test_triton_qlinear
  /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft/.tox/py/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
    warnings.warn(

tests/test_gptqmodel.py::test_pre_quantized_model_outputs_match
  /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft/.tox/py/lib/python3.10/site-packages/auto_gptq/utils/peft_utils.py:360: UserWarning: You can just ignore this warning if the peft type you use isn't in ['LORA', 'ADALORA'].
  LlamaGPTQForCausalLM supports injecting fused attention but not enables this time. If you are training adapters, you must also disable fused attention injection when loading quantized base model at inference time, otherwise adapters may not be added to base model properly. If you are loading adapters to do inference, you can reference to adapter's config file to check whether the adapters are trained using base model that not enable fused attention injection.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================ 8 passed, 9 warnings in 140.29s (0:02:20) ================

FMS-HF-Tuning Acceleration Unit Test

============================ warnings summary =============================
../fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/transformers/utils/hub.py:124
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_raises_due_to_invalid_arguments[triton_v2 requires fp16]
tests/acceleration/test_acceleration_framework.py::test_framework_raises_due_to_invalid_arguments[accelerated peft requires peft config]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/transformers/training_args.py:1847: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length. Will not be supported from version '1.0.0'.
  
  Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
    warnings.warn(message, FutureWarning)

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:269: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:307: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:397: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/accelerate/accelerator.py:447: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
  dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-hf-tuning/tuning/config/acceleration_configs/acceleration_framework_config.py:244: UserWarning: An experimental acceleration feature is requested by specifying the '--fused_lora' argument. Please note this feature may not support certain edge cases at this juncture. When the feature matures this message will be turned off.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-hf-tuning/tuning/config/acceleration_configs/acceleration_framework_config.py:244: UserWarning: An experimental acceleration feature is requested by specifying the '--fast_kernels' argument. Please note this feature may not support certain edge cases at this juncture. When the feature matures this message will be turned off.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================= short test summary info =========================
SKIPPED [1] tests/acceleration/test_acceleration_framework.py:164:  NOTE: this scenario will actually never happen, since in the code we always
    provide at least one dataclass (can consider to remove this test).
================ 13 passed, 1 skipped, 27 warnings in 47.64s ==============

@fabianlim (Contributor) left a comment

Looks in the right direction, but see comments

@achew010 achew010 marked this pull request as ready for review July 16, 2024 05:48
@fabianlim fabianlim merged commit f4cf311 into foundation-model-stack:main Jul 16, 2024
4 checks passed
@achew010 achew010 deleted the local-autogptq-fix branch July 26, 2024 04:02