
Running baichuan2-13b on a Windows machine with four 3090s: the model does not seem to be distributed across the GPUs, VRAM fills up immediately and it OOMs. How can this be fixed? #410

Open
Ruiruiz30 opened this issue Aug 27, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Ruiruiz30

Hello Teacher Xu, I am using a Windows machine with four 3090 GPUs, each with 24 GB of VRAM. Since it is a Windows platform, I wrote a .bat script to run the training:
```bat
@echo off
set CUDA_VISIBLE_DEVICES=0,1,2,3

call python supervised_finetuning.py ^
--model_type baichuan ^
--model_name_or_path .\pretrainModel ^
--train_file_dir .\data\finetune\dataformedical ^
--validation_file_dir .\data\finetune\dataformedical ^
--per_device_train_batch_size 4 ^
--per_device_eval_batch_size 4 ^
--max_train_samples -1 ^
--max_eval_samples 50000 ^
--do_train ^
--do_eval ^
--template_name baichuan2 ^
--use_peft True ^
--model_max_length 6144 ^
--num_train_epochs 50 ^
--learning_rate 2e-5 ^
--warmup_ratio 0.05 ^
--weight_decay 0.05 ^
--logging_strategy steps ^
--logging_steps 10 ^
--eval_steps 10 ^
--evaluation_strategy steps ^
--save_steps 500 ^
--save_strategy steps ^
--save_total_limit 13 ^
--gradient_accumulation_steps 1 ^
--preprocessing_num_workers 20 ^
--output_dir outputs-sft-MedicalBaichuan-v1 ^
--overwrite_output_dir ^
--ddp_timeout 30000 ^
--logging_first_step True ^
--target_modules all ^
--lora_rank 8 ^
--lora_alpha 16 ^
--lora_dropout 0.05 ^
--torch_dtype float16 ^
--fp16 ^
--device_map auto ^
--report_to tensorboard ^
--ddp_find_unused_parameters False ^
--gradient_checkpointing True ^
--cache_dir .\cache ^
--load_in_8bit True

pause
```
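To see whether `--device_map auto` is actually sharding the model across the four cards (the question in the title), a quick standalone check is to load the checkpoint with transformers and print `hf_device_map`. This is only a minimal sketch, separate from `supervised_finetuning.py`; the `max_memory` caps are assumed values, not something from this thread.

```python
# Minimal sketch (assumes transformers, accelerate and bitsandbytes are installed):
# load the checkpoint the same way the script does and inspect how layers are placed.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    r".\pretrainModel",                        # same local checkpoint as in the .bat script
    trust_remote_code=True,                    # Baichuan2 ships custom modeling code
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    # Assumed per-card caps to leave headroom for activations; adjust as needed.
    max_memory={0: "20GiB", 1: "20GiB", 2: "20GiB", 3: "20GiB"},
)

# If sharding works, the layers should be spread over cuda:0..cuda:3, not a single device.
print(model.hf_device_map)
```

If everything lands on `cuda:0`, the accelerate/bitsandbytes setup would be the first thing to check.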
Running it OOMs right away; the error is:

```
CUDA out of memory. Tried to allocate 4.07 GiB. GPU has a total capacity of 24.00 GiB of which 1.31 GiB is free. Of the allocated memory 16.15 GiB is allocated by PyTorch, and 2.43 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
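As a side note, the error message itself suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to avoid fragmentation. A minimal sketch of setting it from Python before CUDA is initialized (it can just as well be `set` in the .bat script before the `python` call):

```python
import os

# Per the hint in the OOM message: enable expandable segments to reduce fragmentation.
# The variable must be set before CUDA is initialized, hence before importing torch here.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # noqa: E402  (imported after setting the env var on purpose)

print(torch.cuda.is_available())
```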

Teacher Xu, how can I fix this? And how much VRAM is needed to do SFT on baichuan2-13b?

@Ruiruiz30 Ruiruiz30 added the bug Something isn't working label Aug 27, 2024
@shibing624
Owner

Not enough VRAM. Change a few parameters so it runs first, then gradually increase them again:

```bat
--per_device_train_batch_size 2 ^
--per_device_eval_batch_size 2 ^
--max_train_samples -1 ^
--max_eval_samples 50 ^
--do_train ^
--do_eval ^
--template_name baichuan2 ^
--use_peft True ^
--model_max_length 1144 ^
--num_train_epochs 1 ^
```
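For what it's worth, halving the per-device batch size can be compensated with gradient accumulation so the effective batch size stays the same; the `gradient_accumulation_steps` value below is a hypothetical choice, not part of the suggestion above. A quick arithmetic check:

```python
# Illustrative arithmetic only: effective batch = GPUs * per-device batch * accumulation steps.
num_gpus = 4
per_device_train_batch_size = 2   # reduced from 4, per the suggestion above
gradient_accumulation_steps = 2   # hypothetical: compensates for the smaller per-device batch

effective_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)       # 16 == the original 4 GPUs * 4 per device * 1 step
```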

@Ruiruiz30
Author

> Not enough VRAM. Change a few parameters so it runs first, then gradually increase them again:
>
> --per_device_train_batch_size 2 ^ --per_device_eval_batch_size 2 ^ --max_train_samples -1 ^ --max_eval_samples 50 ^ --do_train ^ --do_eval ^ --template_name baichuan2 ^ --use_peft True ^ --model_max_length 1144 ^ --num_train_epochs 1 ^

Got it, thank you Teacher Xu. One more question: the dataset I plan to use contains 2.4 million Chinese medical records. With my amount of VRAM, is that dataset too large for SFT? If I train MedicalGPT on baichuan2 with only SFT + DPO, how large a dataset would you recommend?

@shibing624
Owner

10k samples is enough.
