Confusion about accelerator.num_processes in get_scheduler #9633

Open
hj13-mtlab opened this issue Oct 10, 2024 · 3 comments

Comments

@hj13-mtlab

In the example code from train_text_to_image_sdxl.py:

num_warmup_steps = args.lr_warmup_steps * args.gradient_accumulation_steps

But in train_text_to_image.py:

num_warmup_steps_for_scheduler = args.lr_warmup_steps * accelerator.num_processes

Why is there such a difference in these two cases?
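
For reference, a minimal self-contained sketch of the two conventions (placeholder values; argument names follow the scripts, but this is a paraphrase rather than a verbatim excerpt from either file):

```python
import torch
from types import SimpleNamespace
from diffusers.optimization import get_scheduler

# Placeholder values standing in for the parsed CLI args and the Accelerator.
args = SimpleNamespace(
    lr_scheduler="constant_with_warmup",
    lr_warmup_steps=500,
    gradient_accumulation_steps=4,
    max_train_steps=10_000,
)
num_processes = 2  # stand-in for accelerator.num_processes
optimizer = torch.optim.AdamW(torch.nn.Linear(4, 4).parameters(), lr=1e-4)

# train_text_to_image_sdxl.py scales the warmup by gradient_accumulation_steps:
scheduler_sdxl = get_scheduler(
    args.lr_scheduler,
    optimizer=optimizer,
    num_warmup_steps=args.lr_warmup_steps * args.gradient_accumulation_steps,
)

# train_text_to_image.py scales it by the number of processes instead:
scheduler_t2i = get_scheduler(
    args.lr_scheduler,
    optimizer=optimizer,
    num_warmup_steps=args.lr_warmup_steps * num_processes,
)
```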

@a-r-r-o-w
Member

Pinging @sayakpaul for the training scripts. I don't think args.lr_warmup_steps * args.gradient_accumulation_steps is correct, because you are already doing fewer gradient updates when using accumulation, so increasing the time it takes to reach the true/peak LR does not make sense. I think lr_warmup_steps * num_processes is correct, so that each rank gets a roughly equal number of learning steps going from the low LR to the true/peak LR.
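
To make the num_processes scaling concrete, here is a minimal sketch (not taken from the scripts) that imitates what accelerate does, as I understand it, after accelerator.prepare(lr_scheduler): with split_batches left at its default, the prepared scheduler advances the wrapped scheduler once per process for every real optimizer update, so the warmup passed to get_scheduler has to be pre-multiplied by num_processes for the warmup to last args.lr_warmup_steps actual updates:

```python
import torch
from diffusers.optimization import get_scheduler

num_processes = 4     # stand-in for accelerator.num_processes
lr_warmup_steps = 10  # desired warmup measured in real optimizer updates
peak_lr = 1e-4

optimizer = torch.optim.AdamW(torch.nn.Linear(8, 8).parameters(), lr=peak_lr)
lr_scheduler = get_scheduler(
    "constant_with_warmup",
    optimizer=optimizer,
    num_warmup_steps=lr_warmup_steps * num_processes,
)

for update in range(1, 13):
    optimizer.step()
    # Roughly what accelerate's prepared scheduler does per update when
    # split_batches is False: one scheduler tick per process.
    for _ in range(num_processes):
        lr_scheduler.step()
    print(f"update {update:2d}: lr = {optimizer.param_groups[0]['lr']:.2e}")

# The printed LR reaches peak_lr after 10 real updates, i.e. after exactly
# lr_warmup_steps optimizer steps, which is the intended behaviour.
```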

@Zephyrose

I have the same question. Why multiply by num_processes?

@a-r-r-o-w
Member

As for why we scale the learning rate, there are many papers and recipes. For a quick read, you could look at the accelerate docs and the linked references: https://huggingface.co/docs/accelerate/concept_guides/performance#learning-rates. There is also some older wisdom that scaling the learning rate by sqrt(X), where X is the increase in batch size, performs better, but in other sets of experiments linear scaling worked well too.
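
A tiny sketch of the two scaling rules mentioned above (base_lr and scale_factor are illustrative names here, not script arguments):

```python
import math

base_lr = 1e-4    # learning rate tuned for the original (single-process) batch size
scale_factor = 8  # e.g. num_processes, or the overall increase in effective batch size

linear_scaled_lr = base_lr * scale_factor           # linear scaling rule
sqrt_scaled_lr = base_lr * math.sqrt(scale_factor)  # the older sqrt(X) rule

print(f"linear: {linear_scaled_lr:.2e}, sqrt: {sqrt_scaled_lr:.2e}")
```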

As for why we multiply the lr_scheduler warmup steps, there have been past discussions, so I'll reference them here. Feel free to drop a comment if you don't find an explanation sufficiently reasonable: this, this, this and this.
