Fix FSDP when performing GPTQ-LoRA with Triton V2 #15
Merged
Conversation
achew010 approved these changes May 21, 2024
fabianlim added a commit that referenced this pull request May 27, 2024
…or GPTQ-LoRA (#20)

* Add GitHub Workflow for Linting, Formatting and Test. Activate Workflow for Framework (#7)
* add lint workflow Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* add pylintrc, update .tox fix files Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* activate test and minor fix Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* lint benchmarks.py and add workflow to dev Signed-off-by: Yu Chin Fabian Lim <[email protected]>
---------
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* Improvements to Benchmark Scripts and Config Generation Workflow (#13)
* fix benches and add verify configs Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* update readme and add workflow Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* add packaging dep Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* update torch dep in framework and run-benches Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* take host env in run-benches
* add display bench results script
* rename summary.csv to raw_summary.csv and update run_benchmarks.sh
* export environment variables in shell command
* dump out pip requirements for repro, and add default FHT_branch
---------
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* Added support for running official HF baseline FSDP-QLoRA benchmark (#16)
* new baseline scenario
* rename variables
* added warning when plugin allows SFTTrainer to handle PEFT on single device
* Fix FSDP when performing GPTQ-LoRA with Triton V2 (#15)
* wrap in parameters and torch view to correct dtype Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* refactor to apply patch only on FSDP and simplify Signed-off-by: Yu Chin Fabian Lim <[email protected]>
---------
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
* Provide Memory Benchmarking Feature to Benchmarking Code (#14)
* add gpu memory logging support
* made improvements to GPU reference and result collation
* renamed memory logging argument to reflect its readings as reserved memory using nvidia-smi and changed aggregation function in result collation
* variable renames
* manual linting
* added memory logging functionality via HFTrainer
* added support to benchmark memory using HFTrainer and updated README with explanation of the 2 memory benchmarking options
* addressed changes requested in PR #14
* fix bug and simplify gpu logs aggregation logic
* fixes to calculation of HFTrainer Mem Logging values
* fix calculations
* more fixes
* fix to ignore including stage inside max calculation of alloc memory
* more comments and README updates
* added fix to keyerror due to empty output dict from OOM
* manual linting
* added benchmark results to refs
* remove unnecessary columns in results gathering
* made changes to results gathering
---------
Signed-off-by: Yu Chin Fabian Lim <[email protected]>
Co-authored-by: achew010 <[email protected]>
So, because of the casting, we are now facing this error in #25.
This was referenced May 29, 2024
GPTQ-LoRA depends on the AutoGPTQ package, but there are issues that prevent the base GPTQ model from being FSDPed. The issue comes from the fact that the QuantLinear class stores its quantized tensors (i.e. qweight, qzeros) in torch.int32; because of this they are not held as torch.nn.Parameter but remain plain torch.Tensor attributes, which FSDP does not shard.
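For illustration, here is a minimal, hypothetical sketch (not the AutoGPTQ code itself) of why an int32 buffer such as qweight is awkward for FSDP: it cannot become a gradient-capable parameter, and left as a plain tensor it is ignored by FSDP's parameter sharding.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a packed int32 buffer like QuantLinear.qweight.
qweight = torch.zeros(128, 64, dtype=torch.int32)

try:
    # nn.Parameter defaults to requires_grad=True, which integer dtypes do not support.
    nn.Parameter(qweight)
except RuntimeError as err:
    print(err)  # "Only Tensors of floating point and complex dtype can require gradients"

# Kept only as a plain tensor (e.g. via register_buffer), qweight is not a
# parameter at all, so FSDP's parameter flattening and sharding never touch it.
```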
The fix is to use torch.Tensor.view, which performs a C++-style reinterpret cast, in QuantLinear.forward before calling the QuantLinearFunction autograd function. We create the nn.Parameter in the same vein, by doing a qweight.view(torch_type) to force the parameter to be of torch_type (which is going to be a float type).
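The idea can be sketched roughly as follows; this is illustrative only, with assumed shapes and an assumed float dtype, not the actual patch applied by the plugin.

```python
import torch

# Illustrative stand-in for QuantLinear.qweight packed into int32.
qweight = torch.arange(128 * 64, dtype=torch.int32).reshape(128, 64)

# Reinterpret the int32 storage as a float dtype (a bitwise re-view, not a value
# cast) so it can be wrapped in an nn.Parameter that FSDP is able to shard.
torch_type = torch.float16  # assumed; the real dtype comes from the loaded model
qweight_param = torch.nn.Parameter(qweight.view(torch_type), requires_grad=False)

# In forward, view back to int32 before handing the tensor to the
# QuantLinearFunction autograd function; the round trip is bit-exact.
restored = qweight_param.data.view(torch.int32)
assert torch.equal(restored, qweight)
```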
Reproduce

To reproduce the fix, consider the command below.
Losses and Throughputs AFTER FIX
TODO:
- register g_idx and scales as parameters so they can be sharded. Update: the code is quite flexible now and it is easy to add more parameters.
- handle low_cpu_mem_usage properly. The model is currently unnecessarily loading the full model into GPU memory before prepare, which should be avoided.
- support other QuantLinear implementations like marlin, etc. (may not do).