
Local autogptq fix #53

Merged (7 commits) on Jul 16, 2024

Conversation

@achew010 (Contributor) commented Jul 15, 2024

Description

This PR applies additional fixes on top of #48.

Fixed items:

  • Removed the direct dependency from the plugin
  • Edited the plugin README with instructions to install the direct dependencies when creating a legacy GPTQ-LoRA model
  • Shifted the use_external_lib argument from a plugin argument to a configuration object field
  • Added unit tests to ensure that
    • the new field is read
    • the correct package is imported
    • appropriate errors are thrown
  • Regenerated the sample configuration YAMLs to include the new field
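
To picture what moving use_external_lib into the configuration object implies, here is a minimal sketch. The function name (select_gptq_backend) and the peft.quantization nesting are hypothetical illustrations for this PR's described behavior, not the plugin's actual code.

```python
def select_gptq_backend(config: dict) -> str:
    """Pick the GPTQ implementation based on a config field (hypothetical layout)."""
    # Read the field from a nested configuration object; default to the
    # local implementation when the field is absent.
    use_external = (
        config.get("peft", {})
        .get("quantization", {})
        .get("use_external_lib", False)
    )
    if use_external:
        # Legacy path: the external auto_gptq package must be installed
        # separately, since the direct dependency was removed from the plugin.
        return "auto_gptq"
    return "gptqmodel"  # local implementation shipped with the plugin

# Example configurations mirroring the regenerated sample YAMLs
legacy_cfg = {"peft": {"quantization": {"use_external_lib": True}}}
default_cfg = {"peft": {"quantization": {}}}

print(select_gptq_backend(legacy_cfg))   # auto_gptq
print(select_gptq_backend(default_cfg))  # gptqmodel
```

With this shape, the choice of backend travels with the configuration object rather than being threaded through plugin arguments, which is what makes it testable from the sample YAMLs alone.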

FMS-Acceleration-Peft Unit Test

=========================== test session starts ===========================
platform linux -- Python 3.10.12, pytest-8.2.2, pluggy-1.5.0
rootdir: /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft
configfile: pyproject.toml
collected 8 items

tests/test_gptqmodel.py ..                                          [ 25%]
tests/test_peft_plugins.py ...                                      [ 62%]
tests/test_q4_triton.py ..                                          [ 87%]
tests/test_triton.py .                                              [100%]

============================ warnings summary =============================
.tox/py/lib/python3.10/site-packages/transformers/utils/hub.py:124
  /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft/.tox/py/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(

tests/test_gptqmodel.py::test_pre_quantized_model_outputs_match
tests/test_gptqmodel.py::test_quantizing_pretrained_model_outputs_match
tests/test_gptqmodel.py::test_quantizing_pretrained_model_outputs_match
tests/test_peft_plugins.py::test_autogptq_loading
tests/test_q4_triton.py::TestsQ4Triton::test_generation_desc_act_false
tests/test_q4_triton.py::TestsQ4Triton::test_generation_desc_act_true
tests/test_triton.py::TestTriton::test_triton_qlinear
  /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft/.tox/py/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
    warnings.warn(

tests/test_gptqmodel.py::test_pre_quantized_model_outputs_match
  /data/aaron/experimental/fms-acceleration/plugins/accelerated-peft/.tox/py/lib/python3.10/site-packages/auto_gptq/utils/peft_utils.py:360: UserWarning: You can just ignore this warning if the peft type you use isn't in ['LORA', 'ADALORA'].
  LlamaGPTQForCausalLM supports injecting fused attention but not enables this time. If you are training adapters, you must also disable fused attention injection when loading quantized base model at inference time, otherwise adapters may not be added to base model properly. If you are loading adapters to do inference, you can reference to adapter's config file to check whether the adapters are trained using base model that not enable fused attention injection.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================ 8 passed, 9 warnings in 140.29s (0:02:20) ================

FMS-HF-Tuning Acceleration Unit Test

============================ warnings summary =============================
../fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/transformers/utils/hub.py:124
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_raises_due_to_invalid_arguments[triton_v2 requires fp16]
tests/acceleration/test_acceleration_framework.py::test_framework_raises_due_to_invalid_arguments[accelerated peft requires peft config]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/transformers/training_args.py:1847: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length. Will not be supported from version '1.0.0'.
  
  Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
    warnings.warn(message, FutureWarning)

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:269: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:307: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:397: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[bitsandbytes]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_peft[auto_gptq]
tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-acceleration/.tox/run-benches/lib/python3.10/site-packages/accelerate/accelerator.py:447: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
  dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-hf-tuning/tuning/config/acceleration_configs/acceleration_framework_config.py:244: UserWarning: An experimental acceleration feature is requested by specifying the '--fused_lora' argument. Please note this feature may not support certain edge cases at this juncture. When the feature matures this message will be turned off.
    warnings.warn(

tests/acceleration/test_acceleration_framework.py::test_framework_intialized_properly_foak
  /data/aaron/experimental/fms-hf-tuning/tuning/config/acceleration_configs/acceleration_framework_config.py:244: UserWarning: An experimental acceleration feature is requested by specifying the '--fast_kernels' argument. Please note this feature may not support certain edge cases at this juncture. When the feature matures this message will be turned off.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================= short test summary info =========================
SKIPPED [1] tests/acceleration/test_acceleration_framework.py:164:  NOTE: this scenario will actually never happen, since in the code we always
    provide at least one dataclass (can consider to remove this test).
================ 13 passed, 1 skipped, 27 warnings in 47.64s ==============

@fabianlim (Contributor) left a comment

Looks in the right direction, but see comments

@achew010 achew010 marked this pull request as ready for review July 16, 2024 05:48
@fabianlim fabianlim merged commit f4cf311 into foundation-model-stack:main Jul 16, 2024
4 checks passed
@achew010 achew010 deleted the local-autogptq-fix branch July 26, 2024 04:02