Context:
We are following the FSDP example and trying to understand how different microbatches are assigned to each rank during training, and in particular the role of the global_rank variable in this process.
In the code, it appears that global_rank is used as a seed for dataset shuffling, as shown below:
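Roughly, the pattern is the following (a paraphrased sketch rather than the exact code; shuffle_for_rank is a placeholder name):

```python
import random

def shuffle_for_rank(samples, global_rank):
    # Seed an RNG with global_rank so each rank iterates over the
    # dataset in a different random order. Note: this only reorders
    # the data; every rank still sees ALL samples.
    rng = random.Random(global_rank)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled
```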
However, we encountered a few uncertainties regarding the initialization of global_rank and how it ensures non-overlapping data across ranks.
Questions:
Initialization of global_rank:
Is global_rank meant to be passed as an argument, or is it inferred from the environment (e.g., the rank in distributed training)?
Shuffling and Data Partitioning:
How does shuffling with global_rank ensure that different ranks receive different, non-overlapping samples? The shuffling function seeds its RNG with global_rank, but it is unclear how this alone guarantees distinct data across ranks, since each rank still iterates over the same underlying dataset, just in a different order.
Use of DistributedSampler:
In the current example, the DataLoader does not use a DistributedSampler, which is typically used to partition a dataset across ranks. The DataLoader setup looks like this:
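In essence (again a paraphrased sketch, not the exact code; build_dataloader is a placeholder name), it is a plain DataLoader constructed without any sampler argument:

```python
from torch.utils.data import DataLoader

def build_dataloader(dataset, batch_size):
    # A plain DataLoader with no sampler argument: nothing here
    # partitions the dataset across ranks, so each rank loads from
    # the full dataset it was handed.
    return DataLoader(dataset, batch_size=batch_size, shuffle=False)
```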
Is there any additional mechanism beyond shuffling (e.g., use of a DistributedSampler) that ensures non-overlapping data across ranks? Should we consider adding a DistributedSampler in this case?
Request:
Could you provide clarification on:
The intended role and correct initialization of global_rank.
How microbatches are distributed across ranks, especially in the absence of a DistributedSampler.
Any guidance on how to avoid potential overlap in samples across different ranks would be greatly appreciated.
@purefall thanks for reporting the issue; we are working on improving this example. Currently, the dataloading code in def create_streaming_dataloader is a mock that is not designed for production use. To answer your questions:
In general, an FSDP dataloading setup should look like this:
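Below is a minimal sketch of the standard pattern for a map-style dataset, assuming the torch.distributed process group is already initialized (create_dataloader is an illustrative helper name, not necessarily the one from the example):

```python
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

def create_dataloader(dataset, batch_size, seed=0):
    # DistributedSampler partitions the dataset into world_size disjoint
    # shards, one per rank, so ranks never receive overlapping samples.
    sampler = DistributedSampler(
        dataset,
        num_replicas=dist.get_world_size(),
        rank=dist.get_rank(),
        shuffle=True,    # shuffles within the epoch, same seed on every rank
        seed=seed,
        drop_last=True,  # keep per-rank batch counts equal for FSDP collectives
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler), sampler

# Each epoch, call sampler.set_epoch(epoch) before iterating the dataloader
# so the shuffle order changes across epochs but stays consistent across ranks.
```

With a DistributedSampler in place, each rank draws from a disjoint 1/world_size shard of the dataset, so non-overlap comes from the sampler itself rather than from a global_rank-seeded shuffle.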