Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream Main: Linting, Benchmarking, HF QLoRA baseline, FSDP fixes for GPTQ-LoRA #20

Merged
merged 5 commits into from
May 27, 2024

Conversation

fabianlim
Copy link
Contributor

@fabianlim fabianlim commented May 27, 2024

Upstreaming to main, see commit notes.

fabianlim and others added 5 commits May 17, 2024 10:33
…low for Framework (#7)

* add lint workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add pylintrc, update .tox fix files

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* activate test and minor fix

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* lint benchmarks.py and add workflow to dev

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* fix benches and add verify configs

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update readme and add workflow

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* add packaging dep

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* update torch dep in framework and run-benches

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* take host env in run-benches

* add display bench results script

* rename summary.csv to raw_summary.csv and update run_benchmarks.sh

* export environment variables in shell command

* dump out pip requirements for repro, and add default FHT_branch

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
)

* new baseline scenario

* rename variables

* added warning when plugin allows SFTTrainer to handle PEFT on single device
* wrap in parameters and torch view to correct dtype

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

* refactor to apply patch only on FSDP and simplify

Signed-off-by: Yu Chin Fabian Lim <[email protected]>

---------

Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* add gpu memory logging support

* made improvements to GPU reference and result collation

* Renamed memory logging argument to reflect its readings as reserved me
mory using nvidia-smi and changed aggregation function in result collation

* variable renames

* manual linting

* added memory logging functionality via HFTrainer

* added support to benchmark memory using HFTrainer and updated READMEwith explanation of the 2 memory benchmarking options

* addressed changes requested in PR #14

* fix bug and smplify gpu logs aggregation logic

* fixes to calculation of HFTrainer Mem Logging values

* fix calculations

* more fixes

* fix to ignore including  stage inside max calculation of alloc memory

* more comments and README updates

* added fix to keyerror due to empty output dict from OOM

* manual linting

* added benchmark results to refs

* remove unnecessary columns in results gathering

* made changes to results gathering
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants