Pytorch parallelism #5916
base: main
Conversation
Hey @andersooi! Please correct the file path. This is the correct path:
@@ -0,0 +1,104 @@
---
Title: 'Distributed Data Parallelism'
Description: 'An overview of distributed data parallelism in PyTorch.'
The description is too generic. Explain what distributed data parallelism is within 1-2 lines.
https://github.com/Codecademy/docs/blob/main/documentation/style-guide.md
- 'PyTorch'
- 'Data'
- 'Data Parallelism'
Suggested change (alphabetical order):
- 'Data'
- 'Data Parallelism'
- 'PyTorch'
## Introduction to Distributed Data Parallelism
Distributed Data Parallelism (DDP) in PyTorch is a module that enables users to efficiently train models across multiple GPUs and machines. By splitting the training process across multiple machines, DDP helps reduce training time and facilitates scaling to larger models and datasets. It achieves parallelism by splitting the input data into smaller chunks, processing them on different GPUs, and aggregating results for updates. Compared to `DataParallel`, DDP offers better performance and scalability by minimising device communication overhead.
Suggested change (split into two paragraphs):

Distributed Data Parallelism (DDP) in PyTorch is a module that enables users to efficiently train models across multiple GPUs and machines. By splitting the training process across multiple machines, DDP helps reduce training time and facilitates scaling to larger models and datasets.

It achieves parallelism by splitting the input data into smaller chunks, processing them on different GPUs, and aggregating results for updates. Compared to `DataParallel`, DDP offers better performance and scalability by minimizing device communication overhead.
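To make the suggested explanation concrete, here is a minimal sketch (not part of the entry under review) of how a model is typically wrapped with `torch.nn.parallel.DistributedDataParallel` and how `DistributedSampler` hands each process its own shard of the data. The toy model, dummy dataset, and hyperparameters are placeholders, and the sketch assumes the process group has already been initialised for each rank.

```py
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def train(rank, world_size):
    # Assumes dist.init_process_group() has already been called for this rank.
    model = nn.Linear(10, 1).to(rank)          # placeholder model
    ddp_model = DDP(model, device_ids=[rank])  # gradients are synchronised across ranks

    dataset = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))  # dummy data
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for features, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(features.to(rank)), targets.to(rank))
        loss.backward()  # DDP all-reduces the gradients during backward
        optimizer.step()
```

Because each rank only sees its own shard of the dataset, the per-step gradient all-reduce performed by DDP is what keeps the model replicas in sync.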
To use DDP, a distributed process group needs to be initialised and the model wrapped with `torch.nn.parallel.DistributedDataParallel`.
```py
def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '8000'
    dist.init_process("nccl", rank=rank, world_size=world_size)
```

Suggested change for the code fence: use `pseudo` instead of `py`.
dist.init_process("nccl", rank=rank, world_size=world_size) | |
dist.init_process_group("nccl", rank=rank, world_size=world_size) |
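For context (not part of the entry under review), here is a minimal runnable sketch of how the corrected `setup` call is usually paired with a cleanup step and launched with one process per GPU. The address, port, and `world_size` handling are assumptions carried over from the entry, and the `worker` function is a hypothetical placeholder.

```py
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '8000'
    # init_process_group (not init_process) creates the distributed process group
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

def cleanup():
    # Release the process group's resources once training is done
    dist.destroy_process_group()

def worker(rank, world_size):
    setup(rank, world_size)
    # ... training with the DDP-wrapped model would go here ...
    cleanup()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # one process per available GPU
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```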
Hey @andersooi! Thank you for contributing to Codecademy Docs. I have made a few suggestions; please go through them and make the necessary changes.
Description
Issues Solved
Closes #5871
Type of Change
Checklist
- The PR targets the `main` branch.
- The related issue is linked in the Issues Solved section.