Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 414: ordering & method of specifying procs in collectives clarified #490

Merged
merged 2 commits into from
Mar 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 13 additions & 7 deletions Chap_API_Proc_Mgmt.tex
Original file line number Diff line number Diff line change
Expand Up @@ -827,9 +827,12 @@ \subsection{\code{PMIx_Connect}}

As in the case of the \refapi{PMIx_Fence} operation, the \refarg{info} array can be used to pass user-level directives regarding timeout constraints and other options available from the host \ac{RM}.

\adviceuserstart
All processes engaged in a given \refapi{PMIx_Connect} operation must provide the identical \refarg{procs} array as ordering of entries in the array and the method by which those processes are identified (e.g., use of \refconst{PMIX_RANK_WILDCARD} versus listing the individual processes) \textit{may} impact the host environment's algorithm for uniquely identifying an operation.
\adviceuserend
Each provided \refstruct{pmix_proc_t} struct can pass \refconst{PMIX_RANK_WILDCARD} to indicate that all processes in the given namespace are participating.
The ordering of the entries in the \refarg{procs} has no significance. However, all processes engaged in a given
\refapi{PMIx_Connect}
operation must use the same method to identify processes. Callers which describe
the target set of processes using PMIX_RANK_WILDCARD will not be matched with
callers which list the individual processes of a namespace explicitly.

\adviceimplstart
\refapi{PMIx_Connect} and its non-blocking form are both \emph{collective} operations. Accordingly, the \ac{PMIx} server library is required to aggregate participation by local clients, passing the request to the host environment once all local participants have executed the \ac{API}.
Expand Down Expand Up @@ -896,7 +899,7 @@ \subsection{\code{PMIx_Connect_nb}}
%%%%
\descr

Nonblocking version of \refapi{PMIx_Connect}. The callback function is called once all processes identified in \refarg{procs} have called either \refapi{PMIx_Connect} or its non-blocking version, \textit{and} the host environment has completed any supporting operations required to meet the terms of the \ac{PMIx} definition of \textit{connected} processes. See the advice provided in the description for \refapi{PMIx_Connect} for more information.
Nonblocking version of \refapi{PMIx_Connect}. The callback function is called once all processes identified in \refarg{procs} have called either \refapi{PMIx_Connect} or its non-blocking version, \textit{and} the host environment has completed any supporting operations required to meet the terms of the \ac{PMIx} definition of \textit{connected} processes. See the description of \refapi{PMIx_Connect} for more information.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Expand Down Expand Up @@ -957,9 +960,12 @@ \subsection{\code{PMIx_Disconnect}}

As in the case of the \refapi{PMIx_Fence} operation, the \refarg{info} array can be used to pass user-level directives regarding the algorithm to be used for any collective operation involved in the operation, timeout constraints, and other options available from the host \ac{RM}.

\adviceuserstart
All processes engaged in a given \refapi{PMIx_Disconnect} operation must provide the identical \refarg{procs} array as ordering of entries in the array and the method by which those processes are identified (e.g., use of \refconst{PMIX_RANK_WILDCARD} versus listing the individual processes) \textit{may} impact the host environment's algorithm for uniquely identifying an operation.
\adviceuserend
Each provided \refstruct{pmix_proc_t} struct can pass \refconst{PMIX_RANK_WILDCARD} to indicate that all processes in the given namespace are participating.
The ordering of the entries in the \refarg{procs} has no significance. However, all processes engaged in a given
\refapi{PMIx_Disconnect}
operation must use the same method to identify processes. Callers which describe
the target set of processes using PMIX_RANK_WILDCARD will not be matched with
callers which list the individual processes of a namespace explicitly.

\adviceimplstart
\refapi{PMIx_Disconnect} and its non-blocking form are both \emph{collective} operations. Accordingly, the \ac{PMIx} server library is required to aggregate participation by local clients, passing the request to the host environment once all local participants have executed the \ac{API}.
Expand Down
10 changes: 8 additions & 2 deletions Chap_API_Sets_Groups.tex
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ \subsection{Construction procedure}

\refapi{PMIx_Connect} calls require that every process call the \ac{API} before completing – i.e., it is modeled upon the bulk synchronous traditional \ac{MPI} connect/accept methodology. Thus, a given application thread can only be involved in one connect/accept operation at a time, and is blocked in that operation until all specified processes participate. In addition, there is no provision for replacing processes in the assemblage due to failure to participate, nor a mechanism by which a process might decline participation.

In contrast, \ac{PMIx} Groups are designed to be more flexible in their construction procedure by relaxing these constraints. While a standard blocking form of constructing groups is provided, the event notification system is utilized to provide a designated \emph{group leader} with the ability to replace participants that fail to participate within a given timeout period. This provides a mechanism by which the application can, if desired, replace members on-the-fly or allow the group to proceed with partial membership. In such cases, the final group membership is returned to all participants upon completion of the operation.
In contrast, \ac{PMIx} Groups are designed to be more flexible in their construction procedure by relaxing these constraints. While a standard collective, blocking form of constructing groups is provided, the event notification system is utilized to provide a designated \emph{group leader} with the ability to replace participants that fail to participate within a given timeout period. This provides a mechanism by which the application can, if desired, replace members on-the-fly or allow the group to proceed with partial membership. In such cases, the final group membership is returned to all participants upon completion of the operation.

Additionally, \ac{PMIx} supports dynamic definition of group membership based on an invite/join model. A process can asynchronously initiate construction of a group of any processes via the \refapi{PMIx_Group_invite} function call. Invitations are delivered via a \ac{PMIx} event (using the \refconst{PMIX_GROUP_INVITED} event) to the invited processes which can then either accept or decline the invitation using the \refapi{PMIx_Group_join} \ac{API}. The initiating process tracks responses by registering for the events generated by the call to \refapi{PMIx_Group_join}, timeouts, or process terminations, optionally replacing processes that decline the invitation, fail to respond in time, or terminate without responding. Upon completion of the operation, the final list of participants is communicated to each member of the new group.

Expand Down Expand Up @@ -361,6 +361,12 @@ \subsection{\code{PMIx_Group_construct}}

Construct a new group composed of the specified processes and identified with the provided group identifier. The group identifier is a user-defined, \code{NULL}-terminated character array of length less than or equal to \refconst{PMIX_MAX_NSLEN}. Only characters accepted by standard string comparison functions (e.g., \emph{strncmp}) are supported. Processes may engage in multiple simultaneous group construct operations so long as each is provided with a unique group ID. The \refarg{directives} array can be used to pass user-level directives regarding timeout constraints and other options available from the \ac{PMIx} server.

Each provided \refstruct{pmix_proc_t} struct can pass \refconst{PMIX_RANK_WILDCARD} to indicate that all processes in the given namespace are participating.
The ordering of the entries in the \refarg{procs} has no significance, neither is the method used for identifying processes.
Some callers may describe the target set of processes using PMIX_RANK_WILDCARD while other
callers may list the individual processes of a namespace explicitly.
This is unlike \refapi{PMIx_Connect}, because in this group operation, the group name \refarg{grp} is used to determine the uniqueness of the collective.

If the \refattr{PMIX_GROUP_NOTIFY_TERMINATION} attribute is provided and has a value of \code{true}, then either the construct leader (if \refattr{PMIX_GROUP_LEADER} is provided) or all participants who register for the \refconst{PMIX_GROUP_MEMBER_FAILED} event will receive events whenever a process fails or terminates prior to calling \refapi{PMIx_Group_construct} – i.e. if a \emph{group leader} is declared, \textit{only} that process will receive the event. In the absence of a declared leader, \textit{all} specified group members will receive the event.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a brief comment that "Because the name is used to determine the uniqueness of the collective...."


The event will contain the identifier of the process that failed to join plus any other information that the host \ac{RM} provided. This provides an opportunity for the leader or the collective members to react to the event – e.g., to decide to proceed with a smaller group or to abort the operation. The decision is communicated to the \ac{PMIx} library in the results array at the end of the event handler. This allows \ac{PMIx} to properly adjust accounting for procedure completion. When construct is complete, the participating \ac{PMIx} servers will be alerted to any change in participants and each group member will receive an updated group membership (marked with the \refattr{PMIX_GROUP_MEMBERSHIP} attribute) as part of the \refarg{results} array returned by this \ac{API}.
Expand All @@ -380,7 +386,7 @@ \subsection{\code{PMIx_Group_construct}}
\adviceimplend

\advicermstart
The collective nature of this \ac{API} generally results in use of a fence-like operation by the backend host environment. Host environments that utilize the array of process participants as a \emph{signature} for such operations may experience potential conflicts should both a \refapi{PMIx_Group_construct} and a \refapi{PMIx_Fence} operation involving the same participants be simultaneously executed. As \ac{PMIx} allows for such use-cases, it is therefore the responsibility of the host environment to resolve any potential conflicts.
The collective nature of this \ac{API} generally results in use of a fence-like operation by the backend host environment. Host environments that utilize the array of process participants as a \emph{signature} for such operations may experience potential conflicts should both a \refapi{PMIx_Group_construct} and a \refapi{PMIx_Fence} operation involving the same participants be simultaneously executed. As \ac{PMIx} allows for such use-cases, it is therefore the responsibility of the host environment to resolve any potential conflicts. Using the group name as part of the \emph{signature} may provide a better unique point of coordination. Also be aware that the process participants can be specified in different ways by each caller (e.g. some callers may include all members of a namespace using \refconst{PMIX_RANK_WILDCARD} while others may list each process individually).
\advicermend

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Expand Down
5 changes: 5 additions & 0 deletions Chap_API_Sync.tex
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,11 @@ \section{\code{PMIx_Fence}}

Passing a \code{NULL} pointer as the \refarg{procs} parameter indicates that the fence is to span all processes in the client's namespace.
Each provided \refstruct{pmix_proc_t} struct can pass \refconst{PMIX_RANK_WILDCARD} to indicate that all processes in the given namespace are participating.
The ordering of the entries in the \refarg{procs} has no significance. However, all processes engaged in a given
\refapi{PMIx_Fence}
operation must use the same method to identify processes. Callers which describe
the target set of processes using PMIX_RANK_WILDCARD will not be matched with
callers which list the individual processes of a namespace explicitly.

The \refarg{info} array is used to pass user directives regarding the behavior of the fence operation. Note that for scalability reasons, the default behavior for \refapi{PMIx_Fence} is to not collect data posted by the operation's participants.

Expand Down