
Running baichuan2-13b on a Windows machine with four 3090s: the model does not seem to be distributed across the GPUs, VRAM fills up immediately and it OOMs. How can this be fixed? #410

Open
Ruiruiz30 opened this issue Aug 27, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Ruiruiz30

Hello Teacher Xu, I am using a Windows machine with four 3090 GPUs, each with 24 GB of VRAM. Since it is a Windows platform, I wrote a .bat script to run the training:
```bat
@echo off
set CUDA_VISIBLE_DEVICES=0,1,2,3

call python supervised_finetuning.py ^
--model_type baichuan ^
--model_name_or_path .\pretrainModel ^
--train_file_dir .\data\finetune\dataformedical ^
--validation_file_dir .\data\finetune\dataformedical ^
--per_device_train_batch_size 4 ^
--per_device_eval_batch_size 4 ^
--max_train_samples -1 ^
--max_eval_samples 50000 ^
--do_train ^
--do_eval ^
--template_name baichuan2 ^
--use_peft True ^
--model_max_length 6144 ^
--num_train_epochs 50 ^
--learning_rate 2e-5 ^
--warmup_ratio 0.05 ^
--weight_decay 0.05 ^
--logging_strategy steps ^
--logging_steps 10 ^
--eval_steps 10 ^
--evaluation_strategy steps ^
--save_steps 500 ^
--save_strategy steps ^
--save_total_limit 13 ^
--gradient_accumulation_steps 1 ^
--preprocessing_num_workers 20 ^
--output_dir outputs-sft-MedicalBaichuan-v1 ^
--overwrite_output_dir ^
--ddp_timeout 30000 ^
--logging_first_step True ^
--target_modules all ^
--lora_rank 8 ^
--lora_alpha 16 ^
--lora_dropout 0.05 ^
--torch_dtype float16 ^
--fp16 ^
--device_map auto ^
--report_to tensorboard ^
--ddp_find_unused_parameters False ^
--gradient_checkpointing True ^
--cache_dir .\cache ^
--load_in_8bit True

pause
```
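To see whether `--device_map auto` is actually sharding the model across the four cards (the question in the title), a quick standalone check is to load the checkpoint with transformers and print `hf_device_map`. This is only a minimal sketch, separate from `supervised_finetuning.py`; the `max_memory` caps are assumed values, not something from this thread.

```python
# Minimal sketch (assumes transformers, accelerate and bitsandbytes are installed):
# load the checkpoint the same way the script does and inspect how layers are placed.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    r".\pretrainModel",                        # same local checkpoint as in the .bat script
    trust_remote_code=True,                    # Baichuan2 ships custom modeling code
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    # Assumed per-card caps to leave headroom for activations; adjust as needed.
    max_memory={0: "20GiB", 1: "20GiB", 2: "20GiB", 3: "20GiB"},
)

# If sharding works, the layers should be spread over cuda:0..cuda:3, not a single device.
print(model.hf_device_map)
```

If everything lands on `cuda:0`, the accelerate/bitsandbytes setup would be the first thing to check.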
Running it OOMs right away; the error is:

```
CUDA out of memory. Tried to allocate 4.07 GiB. GPU has a total capacity of 24.00 GiB of which 1.31 GiB is free. Of the allocated memory 16.15 GiB is allocated by PyTorch, and 2.43 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
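As a side note, the error message itself suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to avoid fragmentation. A minimal sketch of setting it from Python before CUDA is initialized (it can just as well be `set` in the .bat script before the `python` call):

```python
import os

# Per the hint in the OOM message: enable expandable segments to reduce fragmentation.
# The variable must be set before CUDA is initialized, hence before importing torch here.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # noqa: E402  (imported after setting the env var on purpose)

print(torch.cuda.is_available())
```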

Teacher Xu, how can I fix this? And how much VRAM is needed to do SFT on baichuan2-13b?

@Ruiruiz30 Ruiruiz30 added the bug Something isn't working label Aug 27, 2024
@shibing624
Owner

Not enough VRAM. Change a few parameters so it runs first, then gradually increase them again:

```bat
--per_device_train_batch_size 2 ^
--per_device_eval_batch_size 2 ^
--max_train_samples -1 ^
--max_eval_samples 50 ^
--do_train ^
--do_eval ^
--template_name baichuan2 ^
--use_peft True ^
--model_max_length 1144 ^
--num_train_epochs 1 ^
```
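For what it's worth, halving the per-device batch size can be compensated with gradient accumulation so the effective batch size stays the same; the `gradient_accumulation_steps` value below is a hypothetical choice, not part of the suggestion above. A quick arithmetic check:

```python
# Illustrative arithmetic only: effective batch = GPUs * per-device batch * accumulation steps.
num_gpus = 4
per_device_train_batch_size = 2   # reduced from 4, per the suggestion above
gradient_accumulation_steps = 2   # hypothetical: compensates for the smaller per-device batch

effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)       # 16 == the original 4 GPUs * 4 per device * 1 step
```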

@Ruiruiz30
Author

> Not enough VRAM. Change a few parameters so it runs first, then gradually increase them again:
>
> --per_device_train_batch_size 2 ^ --per_device_eval_batch_size 2 ^ --max_train_samples -1 ^ --max_eval_samples 50 ^ --do_train ^ --do_eval ^ --template_name baichuan2 ^ --use_peft True ^ --model_max_length 1144 ^ --num_train_epochs 1 ^

Got it, thank you Teacher Xu. One more question: the dataset I plan to use contains 2.4 million Chinese medical records. With my amount of VRAM, is that dataset too large for SFT? If I train MedicalGPT on baichuan2 with only SFT + DPO, how large a dataset would you recommend?

@shibing624
Owner

10k samples is enough.
