
Quantized Peft Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout #50

Open
achew010 opened this issue Jul 12, 2024 · 1 comment
Labels
question Further information is requested

Comments

@achew010 (Contributor) commented on Jul 12, 2024

Description

Update: the OOM was previously reported only for BNB, but it is now observed for quantized PEFT in general, including GPTQ. See #106

(Figure: "Outliers" — image attached in the original issue, omitted here.)

The previous description below covers the issue as originally observed for BNB only.

BNB experiments run out of memory in new benchmarks that set lora_dropout=0.1.

| Benchmark | framework_config | peft_method | model_name_or_path | num_gpus | per_device_train_batch_size | lora_dropout | Peak Memory in Bytes |
|---|---|---|---|---|---|---|---|
| Reference | accelerated-peft-bnb | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0. | 72.39 |
| New | accelerated-peft-bnb | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0.1 | 0. (run goes OOM) |

By comparison, we do not observe this issue with AutoGPTQ:

| Benchmark | framework_config | peft_method | model_name_or_path | num_gpus | per_device_train_batch_size | lora_dropout | Peak Memory in Bytes |
|---|---|---|---|---|---|---|---|
| Reference | accelerated-peft-autogptq | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0. | 70.14 |
| New | accelerated-peft-autogptq | lora | NousResearch/Llama-2-70b-hf | 2 | 4 | 0.1 | 71.7 |

There may be a slight memory overhead in the dropout implementation that pushes the experiment out of memory for large models.
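One possible mechanism (a sketch, not a confirmed root cause): with a non-zero lora_dropout the adapter applies an nn.Dropout to the full hidden-state input before the lora_A projection, so a dropped copy of that activation is kept for backward, whereas with lora_dropout=0. the input typically passes through an nn.Identity and no extra copy is created. A minimal illustration of a LoRA forward in that style — the `LoraAdapter` class below is hypothetical, not the code used in these benchmarks:

```python
# Hypothetical sketch of a PEFT-style LoRA layer, illustrating where a
# non-zero lora_dropout can add an extra activation-sized tensor.
import torch
import torch.nn as nn

class LoraAdapter(nn.Module):
    def __init__(self, in_features, out_features, r=16, lora_dropout=0.1):
        super().__init__()
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        # PEFT-style layers typically use nn.Identity when lora_dropout == 0
        self.dropout = nn.Dropout(lora_dropout) if lora_dropout > 0 else nn.Identity()

    def forward(self, x, base_out):
        # With dropout > 0, `self.dropout(x)` is an extra tensor the size of `x`
        # that autograd saves for the backward pass of lora_A.
        return base_out + self.lora_B(self.lora_A(self.dropout(x)))

# Toy usage: the dropped copy of `x` only exists when lora_dropout > 0.
adapter = LoraAdapter(4096, 4096, r=16, lora_dropout=0.1)
x = torch.randn(2, 16, 4096)
out = adapter(x, base_out=torch.zeros(2, 16, 4096))
```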

Reproduce Issue

With lora_dropout=0. the run enters training:

```sh
export CUDA_VISIBLE_DEVICES=0,1
export ACCELERATION_FRAMEWORK_CONFIG_FILE=/workspace/fms-acceleration/scripts/benchmarks/../../sample-configurations/baseline-peft-bnb-nf4-sample-configuration.yaml
accelerate launch --config_file scripts/benchmarks/accelerate.yaml --num_processes=2 --main_process_port=29500 \
  -m tuning.sft_trainer --model_name_or_path NousResearch/Llama-2-70b-hf --packing True --max_seq_len 4096 \
  --fp16 True --learning_rate 2e-4 --torch_dtype float16 \
  --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0. --target_modules q_proj k_proj v_proj o_proj \
  --use_flash_attn True --response_template '
### Response:' --dataset_text_field output --include_tokens_per_second True \
  --num_train_epochs 1 --gradient_accumulation_steps 1 --gradient_checkpointing True \
  --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 \
  --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --max_steps 100 \
  --training_data_path benchmark_outputs/data/cache.json --per_device_train_batch_size 4 \
  --output_dir benchmark_outputs/exp_35/hf --skip_memory_metrics False
```

With lora_dropout=0.1 the run goes out of memory:

```sh
export CUDA_VISIBLE_DEVICES=0,1
export ACCELERATION_FRAMEWORK_CONFIG_FILE=/workspace/fms-acceleration/scripts/benchmarks/../../sample-configurations/baseline-peft-bnb-nf4-sample-configuration.yaml
accelerate launch --config_file scripts/benchmarks/accelerate.yaml --num_processes=2 --main_process_port=29500 \
  -m tuning.sft_trainer --model_name_or_path NousResearch/Llama-2-70b-hf --packing True --max_seq_len 4096 \
  --fp16 True --learning_rate 2e-4 --torch_dtype float16 \
  --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0.1 --target_modules q_proj k_proj v_proj o_proj \
  --use_flash_attn True --response_template '
### Response:' --dataset_text_field output --include_tokens_per_second True \
  --num_train_epochs 1 --gradient_accumulation_steps 1 --gradient_checkpointing True \
  --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 \
  --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --max_steps 100 \
  --training_data_path benchmark_outputs/data/cache.json --per_device_train_batch_size 4 \
  --output_dir benchmark_outputs/exp_35/hf --skip_memory_metrics False
```
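As a quick sanity check outside the benchmark harness (which already records memory via `--skip_memory_metrics False`), peak allocator memory per rank could be compared between the two settings with a hypothetical snippet like the following; it is not part of the benchmark scripts:

```python
# Hypothetical spot-check of peak GPU memory per rank, to compare the
# lora_dropout=0. and lora_dropout=0.1 runs directly.
import torch
import torch.distributed as dist

torch.cuda.reset_peak_memory_stats()
# ... run a few training steps here ...
rank = dist.get_rank() if dist.is_initialized() else 0
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"rank {rank}: peak allocated {peak_gib:.2f} GiB")
```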
@fabianlim added the `question` (Further information is requested) label on Nov 4, 2024
@fabianlim changed the title from "BNB Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout" to "Quantized Peft Benchmark Experiments Run Out of Memory with Non-Zero Lora Dropout" on Nov 13, 2024
@fabianlim (Contributor) commented:

While this issue was originally reported for BNB, we have now seen it also for Quantized Peft in general in #106 . Updating the issue to reflect the general case.
