
Quantized Peft Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout #50

Open
achew010 opened this issue Jul 12, 2024 · 1 comment
Labels
question Further information is requested

Comments

@achew010 (Contributor) commented on Jul 12, 2024

Description

Update: the OOM was previously reported only for BNB, but it is now observed for quantized PEFT in general, including GPTQ. See #106

(Figure: "Outliers" — image attached in the original issue, omitted here.)

The previous description below covers the issue as originally observed for BNB only.

BNB experiments run out of memory in new benchmarks that set lora_dropout=0.1.

| Benchmark | framework_config | peft_method | model_name_or_path | num_gpus | per_device_train_batch_size | lora_dropout | Peak Memory in Bytes |
|---|---|---|---|---|---|---|---|
| Reference | accelerated-peft-bnb | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0. | 72.39 |
| New | accelerated-peft-bnb | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0.1 | 0. (run goes OOM) |

By comparison, we do not observe this issue with AutoGPTQ:

| Benchmark | framework_config | peft_method | model_name_or_path | num_gpus | per_device_train_batch_size | lora_dropout | Peak Memory in Bytes |
|---|---|---|---|---|---|---|---|
| Reference | accelerated-peft-autogptq | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0. | 70.14 |
| New | accelerated-peft-autogptq | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0.1 | 71.7 |

There may be a slight memory overhead in the dropout implementation that pushes the experiment out of memory for large models.
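One possible mechanism (a sketch, not a confirmed root cause): with a non-zero lora_dropout the adapter applies an nn.Dropout to the full hidden-state input before the lora_A projection, so a dropped copy of that activation is kept for backward, whereas with lora_dropout=0. the input typically passes through an nn.Identity and no extra copy is created. A minimal illustration of a LoRA forward in that style — the `LoraAdapter` class below is hypothetical, not the code used in these benchmarks:

```python
# Hypothetical sketch of a PEFT-style LoRA layer, illustrating where a
# non-zero lora_dropout can add an extra activation-sized tensor.
import torch
import torch.nn as nn

class LoraAdapter(nn.Module):
    def __init__(self, in_features, out_features, r=16, lora_dropout=0.1):
        super().__init__()
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        # PEFT-style layers typically use nn.Identity when lora_dropout == 0
        self.dropout = nn.Dropout(lora_dropout) if lora_dropout > 0 else nn.Identity()

    def forward(self, x, base_out):
        # With dropout > 0, `self.dropout(x)` is an extra tensor the size of `x`
        # that autograd saves for the backward pass of lora_A.
        return base_out + self.lora_B(self.lora_A(self.dropout(x)))

# Toy usage: the dropped copy of `x` only exists when lora_dropout > 0.
adapter = LoraAdapter(4096, 4096, r=16, lora_dropout=0.1)
x = torch.randn(2, 16, 4096)
out = adapter(x, base_out=torch.zeros(2, 16, 4096))
```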

Reproduce Issue

With lora_dropout=0. the run enters training:

```sh
export CUDA_VISIBLE_DEVICES=0,1
export ACCELERATION_FRAMEWORK_CONFIG_FILE=/workspace/fms-acceleration/scripts/benchmarks/../../sample-configurations/baseline-peft-bnb-nf4-sample-configuration.yaml
accelerate launch --config_file scripts/benchmarks/accelerate.yaml --num_processes=2 --main_process_port=29500 \
  -m tuning.sft_trainer --model_name_or_path NousResearch/Llama-2-70b-hf --packing True --max_seq_len 4096 \
  --fp16 True --learning_rate 2e-4 --torch_dtype float16 \
  --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0. --target_modules q_proj k_proj v_proj o_proj \
  --use_flash_attn True --response_template '
### Response:' --dataset_text_field output --include_tokens_per_second True \
  --num_train_epochs 1 --gradient_accumulation_steps 1 --gradient_checkpointing True \
  --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 \
  --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --max_steps 100 \
  --training_data_path benchmark_outputs/data/cache.json --per_device_train_batch_size 4 \
  --output_dir benchmark_outputs/exp_35/hf --skip_memory_metrics False
```

With lora_dropout=0.1 the run goes out of memory:

```sh
export CUDA_VISIBLE_DEVICES=0,1
export ACCELERATION_FRAMEWORK_CONFIG_FILE=/workspace/fms-acceleration/scripts/benchmarks/../../sample-configurations/baseline-peft-bnb-nf4-sample-configuration.yaml
accelerate launch --config_file scripts/benchmarks/accelerate.yaml --num_processes=2 --main_process_port=29500 \
  -m tuning.sft_trainer --model_name_or_path NousResearch/Llama-2-70b-hf --packing True --max_seq_len 4096 \
  --fp16 True --learning_rate 2e-4 --torch_dtype float16 \
  --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0.1 --target_modules q_proj k_proj v_proj o_proj \
  --use_flash_attn True --response_template '
### Response:' --dataset_text_field output --include_tokens_per_second True \
  --num_train_epochs 1 --gradient_accumulation_steps 1 --gradient_checkpointing True \
  --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 \
  --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --max_steps 100 \
  --training_data_path benchmark_outputs/data/cache.json --per_device_train_batch_size 4 \
  --output_dir benchmark_outputs/exp_35/hf --skip_memory_metrics False
```
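As a quick sanity check outside the benchmark harness (which already records memory via `--skip_memory_metrics False`), peak allocator memory per rank could be compared between the two settings with a hypothetical snippet like the following; it is not part of the benchmark scripts:

```python
# Hypothetical spot-check of peak GPU memory per rank, to compare the
# lora_dropout=0. and lora_dropout=0.1 runs directly.
import torch
import torch.distributed as dist

torch.cuda.reset_peak_memory_stats()
# ... run a few training steps here ...
rank = dist.get_rank() if dist.is_initialized() else 0
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"rank {rank}: peak allocated {peak_gib:.2f} GiB")
```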
@fabianlim added the `question` (Further information is requested) label on Nov 4, 2024
@fabianlim changed the title from "BNB Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout" to "Quantized Peft Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout" on Nov 13, 2024
@fabianlim (Contributor) commented:

While this issue was originally reported for BNB, we have now seen it also for Quantized Peft in general in #106 . Updating the issue to reflect the general case.
