forked from pmix/pmix-standard
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathChap_API_Event.tex
283 lines (205 loc) · 20.1 KB
/
Chap_API_Event.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Chapter: Events
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Event Notification}
\label{chap:api_event}
This chapter defines the \ac{PMIx} event notification system.
These interfaces are designed to support the reporting of events to/from clients and servers, and between library layers within a single process.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Notification and Management}
\label{chap:api_event:notify}
\ac{PMIx} event notification provides an asynchronous out-of-band mechanism for communicating events between application processes and/or elements of the \ac{SMS}. Its uses span a wide range that includes fault notification, coordination between multiple programming libraries within a single process, and workflow orchestration for non-synchronous programming models. Events can be divided into two distinct classes:
\begin{itemize}
\item \textit{Job-specific events} directly relate to a job executing within the session, such as a debugger attachment, process failure within a related job, or events generated by an application process. Events in this category are to be immediately delivered to the \ac{PMIx} server library for relay to the related local processes.
\item \textit{Environment events} indirectly relate to a job but do not specifically target the job itself. This category includes \ac{SMS}-generated events such as \ac{ECC} errors, temperature excursions, and other non-job conditions that might directly affect a session's resources, but would never include an event generated by an application process. Note that although these do potentially impact the session's jobs, they are not directly tied to those jobs. Thus, events in this category are to be delivered to the \ac{PMIx} server library only upon request.
\end{itemize}
Both \ac{SMS} elements and applications can register for events of either type.
\adviceimplstart
Race conditions can cause the registration to come after events of possible interest (e.g., a memory \ac{ECC} event that occurs after start of execution but prior to registration, or an application process generating an event prior to another process registering to receive it). \ac{SMS} vendors are \textit{requested} to cache environment events for some time to mitigate this situation, but are not \textit{required} to do so. However, \ac{PMIx} implementers are \textit{required} to cache all events received by the \ac{PMIx} server library and to deliver them to registering clients in the same order in which they were received
\adviceimplend
\adviceuserstart
Applications must be aware that they may not receive environment events that occur prior to registration, depending upon the capabilities of the host \ac{SMS}.
\adviceuserend
The generator of an event can specify the \textit{target range} for delivery of that event. Thus, the generator can choose to limit notification to processes on the local node, processes within the same job as the generator, processes within the same allocation, other threads within the same process, only the \ac{SMS} (i.e., not to any application processes), all application processes, or to a custom range based on specific process identifiers. Only processes within the given range that register for the provided event code will be notified. In addition, the generator can use attributes to direct that the event not be delivered to any default event handlers, or to any multi-code handler (as defined below).
Event notifications provide the process identifier of the source of the event plus the event code and any additional information provided by the generator. When an event notification is received by a process, the registered handlers are scanned for their event code(s), with matching handlers assembled into an \textit{event chain} for servicing. Note that users can also specify a \textit{source range} when registering an event (using the same range designators described above) to further limit when they are to be invoked. When assembled, PMIx event chains are ordered based on both the specificity of the event handler and user directives at time of handler registration. By default, handlers are grouped into three categories based on the number of event codes that can trigger the callback:
\begin{itemize}
\item \textit{single-code} handlers are serviced first as they are the most specific. These are handlers that are registered against one specific event code.
\item \textit{multi-code} handlers are serviced once all single-code handlers have completed. The handler will be included in the chain upon receipt of an event matching any of the provided codes.
\item \textit{default} handlers are serviced once all multi-code handlers have completed. These handlers are always included in the chain unless the generator specifically excludes them.
\end{itemize}
Users can specify the callback order of a handler within its category at the time of registration. Ordering can be specified either by providing the relevant returned event handler registration ID or using event handler names, if the user specified an event handler name when registering the corresponding event. Thus, users can specify that a given handler be executed before or after another handler should both handlers appear in an event chain (the ordering is ignored if the other handler isn't included). Note that ordering does not imply immediate relationships. For example, multiple handlers registered to be serviced after event handler \textit{A} will all be executed after \textit{A}, but are not guaranteed to be executed in any particular order amongst themselves.
In addition, one event handler can be declared as the \textit{first} handler to be executed in the chain. This handler will \textit{always} be called prior to any other handler, regardless of category, provided the incoming event matches both the specified range and event code. Only one handler can be so designated --- attempts to designate additional handlers as \textit{first} will return an error. Deregistration of the declared \textit{first} handler will re-open the position for subsequent assignment.
Similarly, one event handler can be declared as the \textit{last} handler to be executed in the chain. This handler will \textit{always} be called after all other handlers have executed, regardless of category, provided the incoming event matches both the specified range and event code. Note that this handler will not be called if the chain is terminated by an earlier handler. Only one handler can be designated as \textit{last} --- attempts to designate additional handlers as \textit{last} will return an error. Deregistration of the declared \textit{last} handler will re-open the position for subsequent assignment.
\adviceuserstart
Note that the \textit{last} handler is called \textit{after} all registered default handlers that match the specified range of the incoming event unless a handler prior to it terminates the chain. Thus, if the application intends to define a \textit{last} handler, it should ensure that no default handler aborts the process before it.
\adviceuserend
Upon completing its work and prior to returning, each handler \textit{must} call the event handler completion function provided when it was invoked (including a status code plus any information to be passed to later handlers) so that the chain can continue being progressed. \ac{PMIx} automatically aggregates the status and any results of each handler (as provided in the completion callback) with status from all prior handlers so that each step in the chain has full knowledge of what preceded it. An event handler can terminate all further progress along the chain by passing the \refconst{PMIX_EVENT_ACTION_COMPLETE} status to the completion callback function.
%%%%%%%%%%%
\subsection{\code{PMIx_Register_event_handler}}
\declareapi{PMIx_Register_event_handler}
%%%%
\summary
Register an event handler
%%%%
\format
\versionMarker{2.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Register_event_handler(pmix_status_t codes[], size_t ncodes,
pmix_info_t info[], size_t ninfo,
pmix_notification_fn_t evhdlr,
pmix_evhdlr_reg_cbfunc_t cbfunc,
void *cbdata);
\end{codepar}
\cspecificend
\begin{arglist}
\argin{codes}{Array of status codes (array of \refstruct{pmix_status_t})}
\argin{ncodes}{Number of elements in the \refarg{codes} array (\code{size_t})}
\argin{info}{Array of info structures (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (\code{size_t})}
\argin{evhdlr}{Event handler to be called \refapi{pmix_notification_fn_t} (function reference)}
\argin{cbfunc}{Callback function \refapi{pmix_evhdlr_reg_cbfunc_t} (function reference)}
\argin{cbdata}{Data to be passed to the cbfunc callback function (memory reference)}
\end{arglist}
If \refarg{cbfunc} is \code{NULL}, the function call will be treated as a \emph{blocking} call. In this case, the returned status will be either (a) the event handler reference identifier if the value is greater than or equal to zero, or (b) a negative error code indicative of the reason for the failure.
If the \refarg{cbfunc} is non-\code{NULL}, the function call will be treated as a \emph{non-blocking} call and will return the following:
\begin{constantdesc}
\item \refconst{PMIX_SUCCESS} indicating that the request has been accepted for processing and the provided callback function will be executed upon completion of the operation. Note that the library must not invoke the callback function prior to returning from the \ac{API}. The event handler identifier will be returned in the callback
\item a non-zero \ac{PMIx} error constant indicating a reason for the request to have been rejected. In this case, the provided callback function will not be executed.
\end{constantdesc}
The callback function must not be executed prior to returning from the \ac{API}, and no events corresponding to this registration may be delivered prior to the completion of the registration callback function (\refarg{cbfunc}).
\reqattrstart
The following attributes are required to be supported by all \ac{PMIx} libraries:
\pastePRIAttributeItem{PMIX_EVENT_HDLR_NAME}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_FIRST}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_LAST}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_FIRST_IN_CATEGORY}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_LAST_IN_CATEGORY}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_BEFORE}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_AFTER}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_PREPEND}
\pastePRIAttributeItem{PMIX_EVENT_HDLR_APPEND}
\pastePRIAttributeItem{PMIX_EVENT_CUSTOM_RANGE}
\pastePRIAttributeItem{PMIX_RANGE}
\pastePRIAttributeItem{PMIX_EVENT_RETURN_OBJECT}
\divider
Host environments that implement support for \ac{PMIx} event notification are required to support the following attributes:
\pastePRRTEAttributeItem{PMIX_EVENT_AFFECTED_PROC}
\pastePRRTEAttributeItem{PMIX_EVENT_AFFECTED_PROCS}
\reqattrend
\optattrstart
Host environments that support \ac{PMIx} event notification \textit{may} offer notifications for environmental events impacting the job and for \ac{SMS} events relating to the job. The following attributes are optional for host environments that suppport this operation:
\pastePRRTEAttributeItem{PMIX_EVENT_TERMINATE_SESSION}
\pastePRRTEAttributeItem{PMIX_EVENT_TERMINATE_JOB}
\pastePRRTEAttributeItem{PMIX_EVENT_TERMINATE_NODE}
\pasteAttributeItem{PMIX_EVENT_TERMINATE_PROC}
\pastePRRTEAttributeItem{PMIX_EVENT_ACTION_TIMEOUT}
\pastePRRTEAttributeItem{PMIX_EVENT_SILENT_TERMINATION}
\optattrend
%%%%
\descr
Register an event handler to report events. Note that the codes being registered do \textit{not} need to be \ac{PMIx} error constants --- any integer value can be registered. This allows for registration of non-PMIx events such as those defined by a particular \ac{SMS} vendor or by an application itself.
\adviceuserstart
In order to avoid potential conflicts, users are advised to only define codes that lie outside the range of the \ac{PMIx} standard's error codes. Thus, \ac{SMS} vendors and application developers should constrain their definitions to positive values or negative values beyond the \refconst{PMIX_EXTERNAL_ERR_BASE} boundary.
\adviceuserend
\adviceuserstart
As previously stated, upon completing its work, and prior to returning, each handler \textit{must} call the event handler completion function provided when it was invoked (including a status code plus any information to be passed to later handlers) so that the chain can continue being progressed. An event handler can terminate all further progress along the chain by passing the \refconst{PMIX_EVENT_ACTION_COMPLETE} status to the completion callback function. Note that the parameters passed to the event handler (e.g., the \refarg{info} and \refarg{results} arrays) will cease to be valid once the completion function has been called - thus, any information in the incoming parameters that will be referenced following the call to the completion function must be copied.
\adviceuserend
%%%%%%%%%%%
\subsection{\code{PMIx_Deregister_event_handler}}
\declareapi{PMIx_Deregister_event_handler}
%%%%
\summary
Deregister an event handler.
%%%%
\format
\versionMarker{2.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Deregister_event_handler(size_t evhdlr_ref,
pmix_op_cbfunc_t cbfunc,
void *cbdata);
\end{codepar}
\cspecificend
\begin{arglist}
\argin{evhdlr_ref}{Event handler ID returned by registration (\code{size_t})}
\argin{cbfunc}{Callback function to be executed upon completion of operation \refapi{pmix_op_cbfunc_t} (function reference)}
\argin{cbdata}{Data to be passed to the cbfunc callback function (memory reference)}
\end{arglist}
If \refarg{cbfunc} is \code{NULL}, the function will be treated as a \emph{blocking} call and the result of the operation returned in the status code.
If \refarg{cbfunc} is non-\code{NULL}, the function will be treated as a \emph{non-blocking} call and return one of the following:
\begin{itemize}
\item \refconst{PMIX_SUCCESS}, indicating that the request is being processed - result will be returned in the provided \refarg{cbfunc}. Note that the library must not invoke the callback function prior to returning from the \ac{API}.
\item \refconst{PMIX_OPERATION_SUCCEEDED}, indicating that the request was immediately processed and returned \textit{success} - the \refarg{cbfunc} will \textit{not} be called
\item a PMIx error constant indicating either an error in the input or that the request was immediately processed and failed - the \refarg{cbfunc} will \textit{not} be called
\end{itemize}
The returned status code will be one of the following:
\begin{constantdesc}
\item \refconst{PMIX_SUCCESS} The event handler was successfully deregistered.
\item \refconst{PMIX_ERR_BAD_PARAM} The provided \refarg{evhdlr_ref} was unrecognized.
\item \refconst{PMIX_ERR_NOT_SUPPORTED} The \ac{PMIx} implementation does not support event notification.
\end{constantdesc}
%%%%
\descr
Deregister an event handler. Note that no events corresponding to the referenced registration may be delivered following completion of the deregistration operation (either return from the \ac{API} with \refconst{PMIX_OPERATION_SUCCEEDED} or execution of the \refarg{cbfunc}).
%%%%%%%%%%%
\subsection{\code{PMIx_Notify_event}}
\declareapi{PMIx_Notify_event}
%%%%
\summary
Report an event for notification via any
registered event handler.
%%%%
\format
\versionMarker{2.0}
\cspecificstart
\begin{codepar}
pmix_status_t
PMIx_Notify_event(pmix_status_t status,
const pmix_proc_t *source,
pmix_data_range_t range,
pmix_info_t info[], size_t ninfo,
pmix_op_cbfunc_t cbfunc, void *cbdata);
\end{codepar}
\cspecificend
\begin{arglist}
\argin{status}{Status code of the event (\refstruct{pmix_status_t})}
\argin{source}{Pointer to a \refstruct{pmix_proc_t} identifying the original reporter of the event (handle)}
\argin{range}{Range across which this notification shall be delivered (\refstruct{pmix_data_range_t})}
\argin{info}{Array of \refstruct{pmix_info_t} structures containing any further info provided by the originator of the event (array of handles)}
\argin{ninfo}{Number of elements in the \refarg{info} array (\code{size_t})}
\argin{cbfunc}{Callback function to be executed upon completion of operation \refapi{pmix_op_cbfunc_t} (function reference)}
\argin{cbdata}{Data to be passed to the cbfunc callback function (memory reference)}
\end{arglist}
If \refarg{cbfunc} is \code{NULL}, the function will be treated as a \emph{blocking} call and the result of the operation returned in the status code.
If \refarg{cbfunc} is non-\code{NULL}, the function will be treated as a \emph{non-blocking} call and return one of the following:
\begin{constantdesc}
\item \refconst{PMIX_SUCCESS} The notification request is valid and is being processed. The callback function will be called when the process-local operation is complete and will provide the resulting status of that operation. Note that this does \textit{not} reflect the success or failure of delivering the event to any recipients. The callback function must not be executed prior to returning from the \ac{API}.
\item \refconst{PMIX_OPERATION_SUCCEEDED}, indicating that the request was immediately processed and returned \textit{success} - the \refarg{cbfunc} will \textit{not} be called
\item \refconst{PMIX_ERR_BAD_PARAM} The request contains at least one incorrect entry that prevents it from being processed. The callback function will \textit{not} be called.
\item \refconst{PMIX_ERR_NOT_SUPPORTED} The \ac{PMIx} implementation does not support event notification, or in the case of a \ac{PMIx} server calling the API, the range extended beyond the local node and the host \ac{SMS} environment does not support event notification. The callback function will \textit{not} be called.
\end{constantdesc}
\reqattrstart
The following attributes are required to be supported by all \ac{PMIx} libraries:
\pastePRIAttributeItem{PMIX_EVENT_NON_DEFAULT}
\pastePRIAttributeItem{PMIX_EVENT_CUSTOM_RANGE}
\divider
Host environments that implement support for \ac{PMIx} event notification are required to provide the following attributes for all events generated by the environment:
\pastePRRTEAttributeItem{PMIX_EVENT_AFFECTED_PROC}
\pastePRRTEAttributeItem{PMIX_EVENT_AFFECTED_PROCS}
\reqattrend
%%%%
\descr
Report an event for notification via any registered event handler. This function can be called by any \ac{PMIx} process, including application processes, \ac{PMIx} servers, and \ac{SMS} elements. The \ac{PMIx} server calls this \ac{API} to report events it detected itself so that the host \ac{SMS} daemon distribute and handle them, and to pass events given to it by its host down to any attached client processes for processing. Examples might include notification of the failure of another process, detection of an impending node failure due to rising temperatures, or an intent to preempt the application. Events may be locally generated or come from anywhere in the system.
Host \ac{SMS} daemons call the API to pass events down to its embedded \ac{PMIx} server both for transmittal to local client processes and for the server's own internal processing.
Client application processes can call this function to notify the \ac{SMS} and/or other application processes of an event it encountered. Note that processes are not constrained to report status values defined in the official \ac{PMIx} standard --- any integer value can be used. Thus, applications are free to define their own internal events and use the notification system for their own internal purposes.
\adviceuserstart
The callback function will be called upon completion of the
\code{notify_event} function's actions. At that time, any messages required for executing the operation (e.g., to send the notification to the local \ac{PMIx} server) will
have been queued, but may not yet have been transmitted. The caller is required to maintain the input
data until the callback function has been executed --- the sole purpose of the callback function is to indicate when the input data is no longer required.
\adviceuserend
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%