
Expose all torch.distributed.init_process_group parameters in the DistributedManager #690

Open: wants to merge 3 commits into base: main

@akshaysubr (Collaborator) commented Oct 12, 2024

Modulus Pull Request

Description

Adds keyword arguments (kwargs) to DistributedManager.initialize, which are passed down to torch.distributed.init_process_group. Also adds a test that specifically checks that the timeout parameter is forwarded to torch.
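
The forwarding pattern described above can be sketched as follows. This is a minimal illustration, not the code in this PR: initialize, its explicit parameters, and the init_fn argument are hypothetical stand-ins, and the fake init function plays the role of torch.distributed.init_process_group so the example runs without a process group. The timeout value is a datetime.timedelta, which is the type init_process_group expects.

```python
import datetime
from typing import Any, Callable, Dict


def initialize(
    backend: str,
    rank: int,
    world_size: int,
    init_fn: Callable[..., Any],
    **kwargs: Any,
) -> Any:
    """Hypothetical stand-in for DistributedManager.initialize.

    The explicit arguments are passed by name; every extra keyword argument
    (for example timeout) is forwarded unchanged to init_fn.
    """
    return init_fn(backend, rank=rank, world_size=world_size, **kwargs)


# Hypothetical fake for torch.distributed.init_process_group that records
# what it was called with, in the spirit of the PR's timeout test.
received: Dict[str, Any] = {}


def fake_init_process_group(backend: str, **kwargs: Any) -> None:
    received["backend"] = backend
    received.update(kwargs)


initialize(
    "gloo",
    rank=0,
    world_size=1,
    init_fn=fake_init_process_group,
    timeout=datetime.timedelta(minutes=5),
)
```

Injecting init_fn keeps the sketch testable without a GPU or launcher; in real code the torch function would be called directly.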

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

None

@azrael417 (Collaborator) left a comment

Looks good to me. Allowing keyword arguments is a good idea because it is more flexible. Thanks for the quick turnaround on this.

@azrael417 (Collaborator) commented

How does it behave if you pass a kwarg that has already been passed explicitly, for example rank or world_size? Will it overwrite the explicit value?

@akshaysubr (Collaborator, Author) commented

That's a good point. Maybe we should pop the explicitly specified kwargs out before passing them down?

@mnabian (Collaborator) commented Nov 6, 2024

@akshaysubr do we want to merge this PR before the release?

@akshaysubr (Collaborator, Author) commented

@mnabian Yes, we should merge this before the release. I think this is a fairly low-risk PR, but it exposes certain mechanisms for more advanced usage. I think we can merge it as is and add any other functionality that comes up in subsequent PRs.

3 participants