Added support for running official HF baseline FSDP-QLoRA benchmark #16

Merged (3 commits) on May 21, 2024

Conversation

achew010 (Contributor)

This PR addresses issue #10 by adding support for an FSDP-compatible HF QLoRA baseline to our benchmarks.

Feature

This allows users to specify a no_peft_model field in the plugin config bnb.yaml. Setting this field bypasses the plugin.augmentation function and lets SFTTrainer manage the PEFT preparation of the model instead.
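
For illustration, a minimal sketch of the branch this field introduces is shown below; the no_peft_model key comes from this PR, while the helper function, the config layout, and the plugin.augmentation call signature are assumptions made for the sketch, not the repository's actual code.

# Illustrative sketch only: the helper name, config layout, and
# plugin.augmentation signature are assumptions, not the repo's code.
from peft import LoraConfig
from trl import SFTTrainer

def build_trainer(model, train_dataset, train_args, plugin, plugin_config):
    # LoRA hyperparameters and target modules here are placeholders
    peft_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

    if plugin_config.get("no_peft_model", False):
        # Baseline path added by this PR: skip plugin.augmentation and hand the
        # LoraConfig to SFTTrainer, which performs the PEFT preparation itself.
        return SFTTrainer(model=model, args=train_args,
                          train_dataset=train_dataset, peft_config=peft_config)

    # Default path: the acceleration plugin augments the quantized model first.
    model, _ = plugin.augmentation(model, train_args, modifiable_args=(peft_config,))
    return SFTTrainer(model=model, args=train_args, train_dataset=train_dataset)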

NOTE:

  • While the open-source approach to FSDP-compatible QLoRA removes the extraneous dtype casting in prepare_model_for_kbit_training, it only does so when the model is sharded. When running on a single device, SFTTrainer still calls prepare_model_for_kbit_training, so users will continue to experience a slowdown due to the extraneous casting (see the sketch below).
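
To make the note concrete, below is a minimal sketch of the single-device path, assuming a placeholder model and that the extraneous casting is the float32 upcast peft applies to the remaining fp16/bf16 parameters.

# Minimal sketch of the single-device path; the model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # placeholder model
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
    ),
    torch_dtype=torch.float16,
)

# On a single device SFTTrainer still runs this step, which upcasts the
# remaining fp16/bf16 (non-quantized) parameters to float32 -- the extra
# casting that causes the slowdown noted above.
model = prepare_model_for_kbit_training(model)
print({p.dtype for p in model.parameters()})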

achew010 requested a review from fabianlim as a code owner on May 21, 2024 03:40
fabianlim (Contributor) left a comment


LGTM save for that linting issue, but I'm OK to merge this first and then handle #7 later.

require_packages_check=False,
):
# check flags and callbacks
assert (not correct_value)==framework.requires_agumentation
Contributor


I can see some linting issues, but we can take care of them in #9.

fabianlim merged commit d510ceb into foundation-model-stack:dev on May 21, 2024
2 checks passed
achew010 deleted the fsdp-qlora-baseline branch on May 26, 2024 16:00
fabianlim added a commit that referenced this pull request May 27, 2024
…or GPTQ-LoRA (#20)

* Add GitHub Workflow for Linting, Formatting and Test. Activate Workflow for Framework (#7)

* add lint workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add pylintrc, update .tox fix files

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* activate test and minor fix

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* lint benchmarks.py and add workflow to dev

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Improvements to Benchmark Scripts and Config Generation Workflow (#13)

* fix benches and add verify configs

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update readme and add workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add packaging dep

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update torch dep in framework and run-benches

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* take host env in run-benches

* add display bench results script

* rename summary.csv to raw_summary.csv and update run_benchmarks.sh

* export environment variables in shell command

* dump out pip requirements for repro, and add default FHT_branch

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Added support for running official HF baseline FSDP-QLoRA benchmark (#16)

* new baseline scenario

* rename variables

* added warning when plugin allows SFTTrainer to handle PEFT on single device

* Fix FSDP when performing GPTQ-LoRA with Triton V2  (#15)

* wrap in parameters and torch view to correct dtype

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* refactor to apply patch only on FSDP and simplify

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* Provide Memory Benchmarking Feature to Benchmarking Code (#14)

* add gpu memory logging support

* made improvements to GPU reference and result collation

* Renamed memory logging argument to reflect its readings as reserved memory using nvidia-smi and changed aggregation function in result collation

* variable renames

* manual linting

* added memory logging functionality via HFTrainer

* added support to benchmark memory using HFTrainer and updated README with explanation of the 2 memory benchmarking options

* addressed changes requested in PR #14

* fix bug and simplify gpu logs aggregation logic

* fixes to calculation of HFTrainer Mem Logging values

* fix calculations

* more fixes

* fix to ignore including stage inside max calculation of alloc memory

* more comments and README updates

* added fix to KeyError due to empty output dict from OOM

* manual linting

* added benchmark results to refs

* remove unnecessary columns in results gathering

* made changes to results gathering

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Co-authored-by: achew010 <[email protected]>