Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMIx_Connect/PMIx_Disconnect process array ordering #414

Closed
jjhursey opened this issue Jul 11, 2022 · 19 comments · Fixed by #490
Closed

PMIx_Connect/PMIx_Disconnect process array ordering #414

jjhursey opened this issue Jul 11, 2022 · 19 comments · Fixed by #490

Comments

@jjhursey
Copy link
Member

The PMIx_Connect and PMIx_Disconnect operations take an array of procs. Does the procs array need to be:

  • A unique set of processes,
  • An identical list of processes with wildcard matching a full list of processes,
  • An identical list of processes with wildcard being different than a full list of processes,
pmix_status_t
PMIx_Connect(const pmix_proc_t procs[], size_t nprocs,
             const pmix_info_t info[], size_t ninfo)

PMIx Standard v4.1 Section 11.3.1 (page 183 lines 14-16)

A process can only engage in one connect operation involving the identical procs array at a time. However, a process can be simultaneously engaged in multiple connect operations, each involving a different procs array.

References

@jjhursey
Copy link
Member Author

Summary of the discussion during the July 11, 2022 IAWG teleconf

  • The identical procs array qualifier seems to indicate that the ordering of the array matters.
  • Looking only at the API, the instinct is that this is a unique set of unordered processes.
  • Would a namespace wildcard given by one process match with another process passing an enumerated list of all of the processes in that namespace?

@rhc54
Copy link
Member

rhc54 commented Jul 11, 2022

Identical means identical - wildcard matches wildcard and order counts. That has always been our definition - can/should add verbiage if required. You will probably find it elsewhere where arrays of procs are passed - I know we were explicit somewhere

@rhc54
Copy link
Member

rhc54 commented Jul 12, 2022

Here it is - from the PMIx_Connect section itself:

All processes engaged in a given PMIx_Connect operation must 
provide the identical procs array as ordering of entries in the array 
and the method by which those processes are identified (e.g., use of
PMIX_RANK_WILDCARD versus listing the individual processes) may
impact the host environment's algorithm for uniquely identifying an
operation.

It is provided as an advice to users section.

@jjhursey
Copy link
Member Author

Yeah, I think that's sufficiently clear as is.

Does this same note apply to some of the other places that take process arrays? PMIx_Fence was the one that came to mind, but maybe there are others in the API (in process sets/groups maybe).

@rhc54
Copy link
Member

rhc54 commented Jul 12, 2022

Yes - probably worth a look. Might also need to consider moving from advice to something stronger

@jjhursey
Copy link
Member Author

Some high-level notes from the July 14, 2022 PMIx Standard Meeting

  • Make the PMIX_RANK_WILDCARD text normative instead of as advice to users.
  • We need to enumerate all of the APIs that use process arrays
  • For each API, clarify that the process array is a unique ordered set (or if it isn't).
  • Consider lifting the ordering requirement
    • This would make it a unique unordered set of processes.
    • Dynamic workflows might need more flexibility in the ordering semantics.
    • This will likely have implications for RMs.
  • Consider matching PMIX_RANK_WILDCARD with a full list.
    • This will likely have implications for RMs.

@jjhursey
Copy link
Member Author

List of APIs that take a process array:

  • PMIx_Abort, pmix_server_abort_fn_t
  • PMIx_Fence, PMIx_Fence_nb, pmix_server_fencenb_fn_t
  • PMIx_Connect, PMIx_Connect_nb, pmix_server_connect_fn_t
  • PMIx_Disconnect, PMIx_Disconnect_nb, pmix_server_disconnect_fn_t
  • PMIx_Job_control, PMIx_Job_control_nb, pmix_server_job_control_fn_t
  • PMIx_Group_construct, PMIx_Group_construct_nb, pmix_server_grp_fn_t
  • PMIx_Group_invite, PMIx_Group_invite_nb
  • PMIx_IOF_pull, pmix_server_iof_fn_t
  • PMIx_IOF_push, pmix_server_stdin_fn_t

IAWG was going to go through this list on Monday (July 25).

@rhc54
Copy link
Member

rhc54 commented Jul 22, 2022

Of those, only the "fence", "connect/disconnect", and "group" functions involve collectives and are therefore currently order sensitive. The rest pay no attention to order as they are not collective operations. Probably should distinguish those somehow so that is clear.

@jjhursey
Copy link
Member Author

Some notes on the APIs from above. Document references from 'master' built today.

PMIx Abort

  • Section 12.1 : PMIx_Abort
    • Non-collective operation. Clear from text
    • Order does not form a signature that must match anyone else.
    • Already contains discussion of how the proc array is interpreted in comparison with NULL and PMIX_RANK_WILDCARD.
  • Section 17.3.5 : pmix_server_abort_fn_t
    • Identifies that the operation is the result of the client calling PMIx_Abort locally.
    • Already contains discussion of how the proc array is interpreted in comparison with NULL and PMIX_RANK_WILDCARD.
  • Summary I don't think any clarifications are needed here.

PMIx Fence

  • Section 8.1, 8.2 : PMIx_Fence, PMIx_Fence_nb
    • Collective
    • Allows passing NULL to indicate all processes in the namespace
    • Can pass PMIX_RANK_WILDCARD in one or more pmix_proc_t to indicate all processes in the namespace defined in that same structure.
    • No discussion of process ordering
  • Section 17.3.6: pmix_server_fencenb_fn_t
    • Discusses meaning of NULL array
    • No discussion of process ordering
  • Summary The Fence section on both the client and server sides needs language on ordering.

PMIx Connect

  • Section 12.3.1, 12.3.2 : PMIx_Connect, PMIx_Connect_nb
    • Collective
    • Clear (per discussion in this Issue) with involving the identical procs array at a time and Advice to users.
  • Section 17.3.12 : pmix_server_connect_fn_t
    • No mention of the unique ordering of the process array.
  • Summary Clarify proc array ordering in the server side function.

PMIx Disconnect

  • Section 12.3.3, 12.3.4 : PMIx_Disconnect, PMIx_Disconnect_nb
    • Collective
    • Clear with involving the identical procs array at a time and Advice to users.
    • Language is similar to Connect
  • Section 17.3.13 : pmix_server_disconnect_fn_t
    • No mention of the unique ordering of the process array.
  • Summary Clarify proc array ordering in the server side function.

PMIx Job Control

  • Section 13.2.1, 13.2.2 : PMIx_Job_control, PMIx_Job_control_nb
    • Non-collective operation. That might need clarification.
    • Notes the role of NULL and PMIX_RANK_WILDCARD in the operation.
    • The const pmix_proc_t targets[] is "the processes to which the requested job control action is to be applied". For example, checkpoint this list of processes. Those targets need not be involved in making this call.
  • Section 17.3.22 : pmix_server_job_control_fn_t
    • Notes the role of NULL and PMIX_RANK_WILDCARD in the operation.
    • Non-collective behavior is implied.
  • Summary Maybe look to clarify that this operation is non-collective on the client side.

PMIx Group Construct

  • Section 14.2.6, 14.2.7 : PMIx_Group_construct, PMIx_Group_construct_nb
    • Collective
    • Advice to PMIx server hosts notes: Host environments that utilize the array of process participants as a signature ..."
    • Otherwise no language on the ordering or structure of the process array.
  • Section 17.3.29 : pmix_server_grp_fn_t
    • No language on the ordering or structure of the process array.
  • Summary The group construct section on both the client and server sides needs language on ordering.

PMIx Group Invite

  • Section 14.2.10,14.2.11 : PMIx_Group_invite, PMIx_Group_invite_nb
    • Non-collective.
    • procs array is the list of processes to invite to the group.
    • The caller is identified as the leader of the new group. The set of processes listed receives an event indicating that they have been invited. Those processes call PMIx_Group_join to accept or decline the invitation.
  • No server-side function
    • The client-side calls PMIx_Notify_event to send out the notifications.
  • Summary Does the ordering of the procs have a meeting to the group (meaning is the group ordered according to the procs array, or does a group have no explicit order)?

PMIx IO Forwarding Pull

  • Section 18.5.7 : PMIx_IOF_pull
    • Non-collective
    • Used by a tool to identify the set of processes from which to pull stdout/stderr.
    • Use PMIX_RANK_WILDCARD noted in an advice to users
    • No mention of the meaning of NULL. I assume that NULL is not allowed given its meaning elsewhere. But we should clarify.
  • Section 17.3.27 : pmix_server_iof_fn_t
    • Implied that this non-collective.
  • Summary Could use a statement on the client side that the proc order does not matter. On the server side it may be good to note that this is the server-side callback from the client-side call to PMIx_IOF_pull. Note that NULL cannot be used in the procs array

PMIx IO Forwarding Push

  • Section 18.5.9 : PMIx_IOF_push
    • Non-collective
    • Used by a tool to identify the set of processes to target for stdin
    • Use PMIX_RANK_WILDCARD noted in an advice to users
    • No mention of the meaning of NULL. I assume that NULL is not allowed given its meaning elsewhere. But we should clarify.
  • Section : pmix_server_stdin_fn_t
    • Implied that this non-collective.
  • Summary Could use a statement on the client side that the proc order does not matter. On the server side it may be good to note that this is the server-side callback from the client-side call to PMIx_IOF_push. Note that NULL cannot be used in the procs array

@jjhursey
Copy link
Member Author

Per IAWG July 25, 2022:

  • We will work on a PR to clarify the text in the identified items.
  • Thinking that this would be an Errata type ticket.

@jjhursey
Copy link
Member Author

jjhursey commented Aug 8, 2022

Per OpenPMIx ticket below, they now sort the array before the collective operation.

@rhc54
Copy link
Member

rhc54 commented Aug 8, 2022

Yeah, I made the change after checking with Cray/HPE (who sorts the arrays prior to use) and Slurm (who does not sort, but only supports the "fence" operation across the primary job - which means that there is only one entry in the array anyway). It seems that others have been encountering the problem, and so a scenario is emerging where libraries are sorting their arrays prior to calling PMIx collectives, and then backend support is also sorting the arrays prior to using them - both doing so to avoid the problem of mistakenly thinking that two separate collectives were being executed.

Seems that the common element is use of a PMIx collective operation, so it is probably more efficient for PMIx to do the sort and simply guarantee order independence to both sides (the client and the host).

@jjhursey jjhursey added this to the PMIx v4.2 Standard milestone Aug 9, 2022
@jjhursey
Copy link
Member Author

jjhursey commented Aug 9, 2022

Per discussion in 3Q 2022 meeting notes link

  • The general feeling was: We should address this soon with an errata making the order independent.

@dsolt
Copy link
Contributor

dsolt commented Jun 5, 2023

Should the restriction that processes be specified in the same way (RANK_WILDCARD vs listing individual procs) be associated with PMIx_Group_construct? It seems that for this operation, the grp name is the ultimate determination of matching, not the list of processes? Perhaps the standard should say what should happen if two or more processes call PMIX_Group_construct[_nb] with the same grp name, but incompatible proc arrays, but that is another issue. However, the argument that the server needs the processes specified in exactly the same way seems to lack motivation for PMIX_Group_construct where the grp name can be used to determine matching. Not sure though... any thoughts?

@rhc54
Copy link
Member

rhc54 commented Jun 8, 2023

You don't need to specify them in the same way, but the list needs to include the same procs. A participant can add procs that are not in the list (e.g., that only this participant knows about) using the PMIX_GROUP_ADD_MEMBERS attribute. Final group membership is determined solely from the returned PMIX_GROUP_MEMBERSHIP attribute at the conclusion of the "construct" operation, which is ordered by the assigned group rank.

@dsolt
Copy link
Contributor

dsolt commented Jun 26, 2023

Will create PR for this shortly based on @rhc54 comment above (the rest of the PR has been finished). Will not make next quarterly. I recommend that we do not hold up 4.2 specifically for this issue though.

@dsolt
Copy link
Contributor

dsolt commented Jul 31, 2023

Getting off topic here, but do the group identifiers need to be globally unique across the whole PMIx session or do they only need to be unique for the given set of processes included. I.e. if my namespace FOO decided to create a group including all members of FOO and BAR and calls it "mygroup" and then another user has two namespaces BIZ and BAZ and they decide to create a group called "mygroup", is that allowed?

@dsolt
Copy link
Contributor

dsolt commented Jul 31, 2023

See pr #490

@rhc54
Copy link
Member

rhc54 commented Jul 31, 2023

It's a tough question. We "hold" group membership at the client library level - i.e., the client translates any group references to the corresponding namespace/rank before passing the ID on to whatever API has been called. So one could argue that so long as a given client sees a unique groupID-to-membership correlation, then all is good.

The problem comes when a given process belongs to two different groups, both named "mygroup" - now there is no clear way to translate that membership when someone refers to [mygroup, grprank].

This is why we originally required the groupID be unique across a session. However, this presumed that "groups" could not be formed across sessions - which people were quick to point out as an undesirable limitation! So we actually broadened the requirement to state that the ID must be unique in the "universe" - i.e., within the view of the host environment. Kinda tricky as you don't know (and cannot control) what some other user in some other session is doing. An org could set a policy that everyone prefix (or suffix) a group name with their sessionID(s) to gain some protection - but that requires a policy, which obviously lies outside the province of PMIx. Also requires that apps doing cross-session connections agree on how the final groupID gets constructed (e.g., which sessionID goes first).

Not perfect, but at least something. To be sure, we haven't hit any cases where that strategy wasn't sufficient as we aren't seeing cross-session group formation (at least, so far as anyone has reported). I'm sure one could imagine a corner case that would encounter a problem, but it is hard to come up with something that is (a) absolutely safe and (b) not overly burdensome. One possibility was to use the group membership (expressed as a string) as the ID, but that's somewhat klunky and not all that attractive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants