Slow Training Speed #21
Hi @s13kman, thanks for your interest! Since you have access to multiple GPUs, I would suggest using multi-GPU training to speed things up. Multi-GPU training is actually supported through Horovod (https://github.com/horovod/horovod).
Given that the dataset is relatively big, I usually train the models for only a single epoch.
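In case it helps, here is a minimal sketch of how Horovod data-parallel training is usually wired into a PyTorch-style loop. This is not the training script from this repo; the model, dataset, batch size, and learning rate are toy placeholders, just to show the Horovod pieces (pinning each process to a GPU, sharding data with a DistributedSampler, wrapping the optimizer, and broadcasting the initial state).

```python
# Minimal Horovod data-parallel sketch (PyTorch). Not the repo's actual training
# code; the model and data below are toy placeholders to show the wiring only.
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()                                    # one process per GPU
torch.cuda.set_device(hvd.local_rank())       # pin this process to its own GPU

model = nn.Linear(512, 512).cuda()            # placeholder for the real model
data = torch.utils.data.TensorDataset(
    torch.randn(10_000, 512), torch.randn(10_000, 512))  # placeholder dataset

# Shard the data so every worker sees a different slice each epoch.
sampler = torch.utils.data.distributed.DistributedSampler(
    data, num_replicas=hvd.size(), rank=hvd.rank())
loader = torch.utils.data.DataLoader(data, batch_size=64, sampler=sampler)

# Wrap the optimizer so gradients are all-reduced across workers every step.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4 * hvd.size())
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Start every worker from identical weights and optimizer state.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

for epoch in range(1):                        # single epoch, as mentioned above
    sampler.set_epoch(epoch)                  # reshuffle shards each epoch
    for x, y in loader:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x.cuda()), y.cuda())  # placeholder loss
        loss.backward()
        optimizer.step()
```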
How long did it take you to train a single epoch?
Hi @CrossLee1, sorry for the late reply. It takes around 6 hours, but I train on 64 A100 GPUs (data parallel with Horovod) to speed up the process. I am quite sure there are a lot of things to optimize here in terms of hardware usage; I was mostly going for fast experiments (wall time) to figure out what works best (in terms of architecture, data augmentation, losses, etc.) rather than optimizing training speed.
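For reference on launching, a Horovod data-parallel job like this is typically started with `horovodrun`, one process per GPU, e.g. `horovodrun -np 8 python train.py` on a single 8-GPU machine, or `horovodrun -np 64 -H host1:8,host2:8,... python train.py` across several hosts (the hostnames and script name here are placeholders, not the actual setup described above).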
Hi,
First of all, great work! I really loved it. To replicate the results, I tried training on the Conceptual 12M dataset with the same depth and dims as the pretrained models, but training was too slow: even after 4 days it was still going through the first (0th) epoch. I'm training on an NVIDIA Quadro RTX A6000, which I don't think is that slow.
Any suggestions to improve training speed? I have multi-GPU access, but it seems that isn't supported right now.
Thanks!