\documentclass{article}
\usepackage{graphicx}
\DeclareGraphicsExtensions{.pdf}
\usepackage[final]{pdfpages}
\headheight 0in
\oddsidemargin 0in
\evensidemargin 0in
\topmargin -.25in
\textwidth 6.5in
\textheight 9in
\title{OMPT: An OpenMP\textsuperscript{\textregistered} Tools Application Programming Interface for Performance Analysis}
\author{Alexandre Eichenberger\thanks{IBM T.J. Watson Research Center},
John Mellor-Crummey\thanks{Rice University},
Martin Schulz\thanks{Lawrence Livermore National Laboratory},
\\~\\
Nawal Copty\thanks{Oracle},
Jim Cownie\thanks{Intel},
Tim Cramer\thanks{RWTH Aachen University},
Robert Dietrich\thanks{TU Dresden, ZIH},
Xu Liu\hbox to 0in{$^\dagger$\hss},
Eugene Loh\hbox to 0in{$^\S$\hss},
Daniel Lorenz\thanks{J\"{u}lich Supercomputer Center},
\\
and other members of the OpenMP Tools Working Group}
\date{Revised June 22, 2016}
\usepackage{comment}
\usepackage{needspace}
\usepackage[colorlinks=true,citecolor=blue]{hyperref}
\usepackage{url}
\usepackage{xcolor}
\RequirePackage[normalem]{ulem}
\usepackage{listings}
\lstdefinelanguage{diff}{
% morecomment=[f][\color{red!80!black}]-, % deleted lines
% morecomment=[f][\color{green!60!black}]+, % added lines
% moredelim=[is][\color{red!80!black}]{-*}{*-},
% moredelim=[is][\color{green!60!black}]{+*}{*+},
morecomment=[f][\color{red}\sout]-, % deleted lines
morecomment=[f][\color{blue}\uwave]+, % added lines
moredelim=[is][\color{red}\sout]{-*-}{-*-},
moredelim=[is][\color{blue}\uwave]{+*+}{+*+},
}
\lstdefinestyle{cstyle}{
belowcaptionskip=.25\baselineskip,
language=diff,
showstringspaces=false,
basicstyle=\ttfamily,
keywordstyle=\bfseries\color{green!60!black},
commentstyle=\itshape\color{purple!40!black},
columns=fullflexible,
keepspaces=true,
float,
floatplacement=H,
belowskip=\smallskipamount,
aboveskip=\smallskipamount,
% belowskip=-\baselineskip,
}
\lstset{style=cstyle}
\newcommand{\descheader}[1]{{\needspace{3\baselineskip}\vspace{1em}\noindent \fbox{#1}}}
\begin{document}
\begin{comment}
\pagestyle{empty}
\includepdf[
pages={-},
pagecommand={},
]{OMPT_TR_header}
\setcounter{page}{1}
\pagestyle{plain}
\end{comment}
\maketitle
\section{Introduction}
Today, it is difficult to produce high-quality tools that support
% debugging and/or
performance analysis of OpenMP programs without tightly integrating them with a specific OpenMP runtime implementation. To address this problem, this document defines OMPT---an application programming interface (API) for first-party performance tools.\footnote{A {\em first-party} tool runs within the address space of an application process. This differs from a {\em third-party} tool, e.g., a debugger, which runs as a separate process.}
Extending the OpenMP standard with this API will make it possible to construct powerful tools that will support any standard-compliant OpenMP implementation.
\subsection{OMPT}
The design of OMPT is based on experience with two prior efforts to define a standard OpenMP tools API: the POMP API~\cite{Mohr:EWOMP02} and the Sun/Oracle Collector API~\cite{SunCollector,Jost:2005:AND:1892830.1892858}.
The POMP API provides support for instrumentation-based measurement. A drawback of this approach is that its overhead can be significant because an operation, e.g., an iteration of an OpenMP worksharing loop, may take less time than tool callbacks monitoring its execution.
In contrast,
the Sun/Oracle Collector API was designed primarily to support performance measurement
using asynchronous sampling. This design enables the construction of tools that attribute costs without the overhead and intrusion of pervasive instrumentation. With the Collector API, tools
can use low-overhead asynchronous sampling of application call stacks to record compact call path profiles. However, the Collector API does not offer enough instrumentation hooks to provide full tool support for statically-linked executables.
OMPT builds upon ideas from both the POMP and Collector APIs. The core of OMPT is a minimal set of features to support tools that employ asynchronous sampling to measure application performance. In addition, OMPT defines interfaces to support {\em blame shifting}~\cite{Tallent:PPoPP09,Tallent:PPoPP10}---a technique that shifts attribution of costs from symptoms to causes.
Finally, OMPT defines callbacks suitable for instrumentation-based monitoring of runtime events.
OMPT can be implemented entirely by a compiler, entirely by an OpenMP runtime system, or with a hybrid strategy that employs a mixture of compiler and runtime support.
With the exception of one routine for tool control, all functions in the OMPT API are intended for use only by tools rather than by applications. All OMPT API functions have a C binding. A Fortran binding is provided only for the single application-facing tool control function described in Section~\ref{sec:app-facing}.
In some cases, the OMPT API may enable a tool to infer details about the implementation chosen by an OpenMP compiler and runtime and to observe its performance implications. An OpenMP implementation may differ from the abstract execution model described by the OpenMP standard. The ability of tools using OMPT to observe such differences does not affect the language implementation's ability to optimize using the ``as if'' rule described in the OpenMP standard.
\subsubsection{Design Objectives}
OMPT tries to satisfy several design objectives for a performance tool interface for OpenMP. These objectives are listed in decreasing order of importance.
\begin{itemize}
\item The API should allow tools to gather sufficient information about an OpenMP program execution to associate costs with both the program and the OpenMP runtime.
\begin{itemize}
\item The API should provide an interface sufficient to construct low-overhead performance tools based on asynchronous sampling.
\item The API should enable a profiler that uses call stack unwinding to identify which frames in its call stack are present on behalf of the OpenMP runtime.
\item The OpenMP runtime should associate the activity of a thread at any point in time with a {\em state}, e.g., idle, which will enable a performance tool to interpret program behavior.
\item Certain API routines must be defined as {\em async signal safe} so that they can be invoked in a profiler's signal handler as it processes interrupts generated by asynchronous sampling.
\end{itemize}
\item Incorporating support for the API in an OpenMP runtime should add negligible overhead to the runtime system if the interface is not in use by a tool.
\item The API should define interfaces suitable for constructing instrumentation-based performance tools.
\item Adding the API to an OpenMP runtime should not impose an unreasonable development burden on the runtime developer.
\item The API should not impose an unreasonable development burden on tool implementers.
\end{itemize}
To support the OMPT interface for tools, an OpenMP runtime must maintain information about the state of each OpenMP thread and provide a set of API calls that tools can use to interrogate the OpenMP runtime. Maintaining information about the state of each thread in the runtime system is not free and thus an OpenMP runtime need not maintain state information unless a tool has registered its interest in this information.
Without any explicit request to enable tool support, an OpenMP runtime need not maintain any state for the benefit of tools.
\subsubsection{Minimally Compliant Implementation}
OMPT has a small set of mandatory features that provide a common foundation for all performance tools. A runtime may also implement additional, optional, OMPT features used by some tools to gather extra information about a program execution.
The features required by a minimally compliant implementation are summarized below.
\begin{itemize}
\item Maintain a unique identifier per OpenMP thread, parallel region, task region, target region, and target operation. In addition, each thread maintains a wait identifier.
\item Maintain pointers into the stack for each OpenMP thread to distinguish frames for user procedures from frames for OpenMP runtime routines.
\item Maintain a state and a wait condition for each OpenMP thread. Mandatory states are idle, work serial, work parallel, and undefined.
\item Provide callbacks to tools when encountering the following events: thread begin/end, parallel region begin/end, task create, task schedule, implicit task begin/end, target region begin/end, target data operation, target submission, a user-level tool control call, and runtime shutdown.
\item Implement several async signal safe inquiry functions to retrieve information from the OpenMP runtime.
\item Have the OpenMP runtime initiate a callback to a tool initialization routine
as directed by the value of a new OpenMP environment variable (\verb|OMP_TOOL|) and provide a function to register tool callbacks with the runtime.
\end{itemize}
\subsection{Document Roadmap}
This document first outlines various aspects of the OMPT tools API.
Section~\ref{sec:states} describes the state information maintained by the OpenMP runtime on behalf of OMPT for use by tools.
Section~\ref{sec:events} describes the OMPT callbacks to notify a tool of various OpenMP runtime events during an execution.
Section~\ref{sec:data} describes the data structures used by the OMPT interface.
Section~\ref{sec:inquiry} describes the runtime system inquiry operations supported by OMPT for the benefit of tools.
Section~\ref{sec:target-device-records} describes an API for tracing activities on target devices.
Section~\ref{sec:enabling} describes the OMPT API operations for tool initialization.
Section~\ref{sec:app-facing} describes the tool control interface available to applications.
Appendix~\ref{appendix:ompt-types} provides a definition of the complete OMPT interface in C.
Appendix~\ref{app:frame} illustrates the information that OMPT maintains about call stacks and the use of OMPT API routines to inspect it; this support enables tools to associate code executed in OpenMP parallel regions with application-level calling contexts.
% Appendix~\ref{app:registration} outlines some considerations that impact the design of the interface for tool registration.
\section{Runtime States}
\label{sec:states}
To enable a tool to understand what an OpenMP thread is doing, once a tool registers itself with an OpenMP runtime, the runtime
maintains state information for each OpenMP thread that the tool can query.
The state maintained for each thread by the OpenMP runtime is an
approximation of the thread's instantaneous state.
OMPT uses the enumeration type \verb|omp_state_t| for states;
Appendix~\ref{appendix:ompt-types:states} defines this type.
When the state of a thread not associated with the OpenMP runtime is queried, the runtime returns
\verb|omp_state_undefined|.
\begin{comment}
For each OpenMP thread the runtime maintains not only a state but also an \verb|ompt_wait_id_t|
identifier. When a thread is waiting for a lock, critical region,
ordered, or atomic, and the thread is in a wait
state, then
the thread's \verb|wait_id| field identifies the lock, critical construct, ordered construct, atomic construct, or internal variable
upon which the
thread is waiting. The semantics of the values used for a \verb|wait_id| are implementation defined.
A thread's \verb|wait_id| is undefined if the thread
is not in a wait state.
\end{comment}
Some states must be supported by any compliant implementation, e.g., those indicating that a thread is executing parallel or serial work. In other cases, alternatives exist. For instance, one may use a single state to represent all waiting at barriers or use a pair of states to differentiate between waiting at implicit and explicit barriers.
For some states, OpenMP runtimes have flexibility about whether to report the state early or late.
For example, consider when a thread acquires a lock. One
compliant runtime may transition a thread's state to
\verb|omp_state_wait_lock| early before the thread attempts to acquire a
lock. Another compliant runtime may transition a thread's state to
\verb|omp_state_wait_lock| late, only if the thread begins to spin or
block to wait for an unavailable lock. A third compliant runtime
may transition a thread's state to \verb|omp_state_wait_lock| even later, e.g., only
after the thread waits for a significant amount of time.
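The difference between early and late reporting can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code of any particular runtime: every \verb|sketch_| type and function, the spin lock, and the enumerator values are hypothetical names introduced for this example; only the notion of a lock wait state comes from this document.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the real omp_state_t is defined in the appendix,
   and its numeric values are not reproduced here. */
typedef enum {
  sketch_state_work_parallel,
  sketch_state_wait_lock
} sketch_state_t;

typedef struct {
  sketch_state_t state;   /* state most recently published for this thread */
  int wait_transitions;   /* how often the lock wait state was entered */
} sketch_thread_t;

typedef struct {
  atomic_flag held;
} sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *l) {
  atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool sketch_try_lock(sketch_lock_t *l) {
  return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void sketch_unlock(sketch_lock_t *l) {
  atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Early reporting: publish the wait state before the first attempt. */
static void sketch_lock_early(sketch_thread_t *t, sketch_lock_t *l) {
  sketch_state_t saved = t->state;
  t->state = sketch_state_wait_lock;
  t->wait_transitions++;
  while (!sketch_try_lock(l)) {
    /* spin until the lock becomes available */
  }
  t->state = saved;
}

/* Late reporting: publish the wait state only if the first attempt fails. */
static void sketch_lock_late(sketch_thread_t *t, sketch_lock_t *l) {
  if (sketch_try_lock(l)) {
    return; /* uncontended: the published state never changes */
  }
  sketch_state_t saved = t->state;
  t->state = sketch_state_wait_lock;
  t->wait_transitions++;
  while (!sketch_try_lock(l)) {
    /* spin until the lock becomes available */
  }
  t->state = saved;
}
```

In this sketch the late variant leaves the published state untouched on the uncontended path, so a sampling tool would likely never observe a lock wait state for locks that are always free, whereas the early variant may expose it briefly on every acquisition.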
State values 0 to 127 are reserved for current OMPT states and future extensions.
\descheader{Idle State}
\begin{description}
\item \verb|omp_state_idle|
The thread is idle, waiting for work.
\end{description}
\descheader{Work States}
\begin{description}
\item \verb|omp_state_work_serial|
The thread is executing code outside all parallel regions.
\item \verb|omp_state_work_parallel|
The thread is executing code within the scope of a parallel region construct.
\sloppy
\item \verb|omp_state_work_reduction|
The thread is combining partial reduction results from threads in its team. A compliant
runtime might never report a thread in this state; a thread
combining partial reduction results may report its state as
\verb|omp_state_work_parallel| or \verb|omp_state_overhead|.
\end{description}
\descheader{Overhead State}
\begin{description}
\item \verb|omp_state_overhead|
A thread may be reported as being in the overhead state at any point while executing within an OpenMP runtime, e.g., while
preparing a parallel region,
preparing a new explicit task,
preparing a worksharing region, or
preparing to execute iterations of a parallel loop.
It is compliant to report some or all OpenMP runtime overhead
as work.
\end{description}
\descheader{Barrier Wait States}
\begin{description}
\item \verb|omp_state_wait_barrier|
\sloppy
The thread is waiting at either an implicit or explicit barrier.
A compliant implementation may have a thread enter this state
early, when the thread encounters a barrier, or late, when the
thread begins to wait at the barrier. A compliant implementation may never report a thread in this state; instead, a thread might report its state as \verb|omp_state_wait_barrier_implicit| or \verb|omp_state_wait_barrier_explicit|, as appropriate.
\item \verb|omp_state_wait_barrier_implicit|
\sloppy
The thread is waiting at an implicit barrier in a parallel region.
A compliant implementation may have a thread enter this state
early, when the thread encounters a barrier, or late, when the
thread begins to wait at the barrier.
A compliant runtime implementation may report \verb|omp_state_wait_barrier| for implicit barriers.
\item \verb|omp_state_wait_barrier_explicit|
The thread is waiting at an explicit barrier in a parallel region.
A compliant implementation may have a thread enter this state
early, when the thread encounters a barrier, or late, when the
thread begins to wait at the barrier.
A compliant runtime implementation may report \verb|omp_state_wait_barrier| for explicit barriers.
\end{description}
\descheader{Task Wait States}
\begin{description}
\item \verb|omp_state_wait_taskwait|
The thread is waiting at a taskwait construct. A compliant
implementation may have a thread enter this state early, when the
thread encounters a taskwait construct, or late, when the thread
begins to wait for an uncompleted task.
\item \verb|omp_state_wait_taskgroup|
The thread is waiting at the end of a taskgroup construct. A compliant
implementation may have a thread enter this state early, when the
thread encounters the end of a taskgroup construct, or late, when the thread
begins to wait for an uncompleted task.
\end{description}
\descheader{Mutex Wait States}
OpenMP provides several mechanisms that enforce mutual exclusion: locks, critical, atomic, and ordered.
A runtime implementation may report a thread waiting for any type of mutual exclusion using either a state that precisely identifies the type of mutual exclusion, or a more generic state such as \verb|omp_state_wait_mutex| or \verb|omp_state_wait_lock|.
This flexibility may significantly simplify the maintenance of states associated with mutual exclusion in the runtime when various mechanisms for mutual exclusion rely on a common implementation, e.g., locks.
% Section~\ref{sec:wait-identifier} describes how each thread maintains a wait identifier to identify what a thread is awaiting. Before a thread enters any state indicating that it is awaiting mutual exclusion, the OpenMP runtime will update the thread's wait identifier to indicate what the thread is awaiting.
\begin{description}
\item \verb|omp_state_wait_mutex|
The thread is waiting for a mutex of an unspecified type. A compliant implementation
may have a thread enter this state early, when a thread encounters a lock acquisition or a region that requires mutual exclusion, or late, when the thread begins to wait.
\item \verb|omp_state_wait_lock|
The thread is waiting for a lock or nest lock. A compliant implementation
may have a thread enter this state early, when a thread
encounters a lock \verb|set| routine, or late, when the thread
begins to wait for a lock.
\item \verb|omp_state_wait_critical|
The thread is waiting to enter a critical region. A compliant
implementation may have a thread enter this state early, when the
thread encounters a critical construct, or late, when the thread
begins to wait to enter the critical region.
\item \verb|omp_state_wait_atomic|
The thread is waiting to enter an atomic region. A compliant
implementation may have a thread enter this state early, when the thread
encounters an atomic construct, or late, when the thread begins
to wait to enter the atomic region.
A compliant implementation may opt not to report
this state, for example, when using atomic hardware instructions that support non-blocking atomic implementations.
\item \verb|omp_state_wait_ordered|
The thread is waiting to enter an ordered region. A compliant
implementation may have a thread enter this state early, when the thread encounters
an ordered construct, or late, when the thread begins
to wait to enter the ordered region.
\end{description}
\descheader{Target Wait States}
\begin{description}
\item \verb|omp_state_wait_target|
The thread is waiting for a target region to complete.
\item \verb|omp_state_wait_target_data|
The thread is waiting for a target data mapping operation to complete.
A compliant runtime implementation may report \verb|omp_state_wait_target| for target data constructs.
\item \verb|omp_state_wait_target_update|
The thread is waiting for a target update operation to complete.
A compliant runtime implementation may report \verb|omp_state_wait_target| for target update constructs.
\end{description}
\descheader{Undefined}
\begin{description}
\item \verb|omp_state_undefined|
This state is reserved for threads that are not user threads,
initial threads, threads currently in an OpenMP team, or threads
waiting to become part of an OpenMP team.
\end{description}
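Because a compliant runtime may report either a precise wait state or its more generic counterpart, a tool that compares measurements across runtimes may want to fold precise states into generic ones before aggregating. The following C sketch shows one way to do so. The enumeration \verb|sketch_omp_state_t| is a hypothetical mirror of the state names in this section with placeholder values, and \verb|sketch_coarsen| is a helper invented for this example; neither is part of the OMPT interface.

```c
/* Hypothetical mirror of the states described in this section.
   The enumerator values are placeholders, not the values of omp_state_t. */
typedef enum {
  sk_state_idle,
  sk_state_work_serial,
  sk_state_work_parallel,
  sk_state_work_reduction,
  sk_state_overhead,
  sk_state_wait_barrier,
  sk_state_wait_barrier_implicit,
  sk_state_wait_barrier_explicit,
  sk_state_wait_taskwait,
  sk_state_wait_taskgroup,
  sk_state_wait_mutex,
  sk_state_wait_lock,
  sk_state_wait_critical,
  sk_state_wait_atomic,
  sk_state_wait_ordered,
  sk_state_wait_target,
  sk_state_wait_target_data,
  sk_state_wait_target_update,
  sk_state_undefined
} sketch_omp_state_t;

/* Fold each precise wait state into the generic state that a compliant
   runtime is allowed to report in its place; other states pass through. */
static sketch_omp_state_t sketch_coarsen(sketch_omp_state_t s) {
  switch (s) {
  case sk_state_wait_barrier_implicit:
  case sk_state_wait_barrier_explicit:
    return sk_state_wait_barrier;
  case sk_state_wait_lock:
  case sk_state_wait_critical:
  case sk_state_wait_atomic:
  case sk_state_wait_ordered:
    return sk_state_wait_mutex;
  case sk_state_wait_target_data:
  case sk_state_wait_target_update:
    return sk_state_wait_target;
  default:
    return s;
  }
}
```

The reduction state is deliberately left alone: a runtime may report reduction work as either parallel work or overhead, so there is no single generic state to fold it into.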
\section{Events}
\label{sec:events}
This section describes callback events that an OpenMP runtime
may provide for use by a tool. OMPT uses the enumeration type \verb|ompt_event_t| for events;
Appendix~\ref{appendix:ompt-types:events} defines this type.
A tool need not register a callback for any particular event.
All callbacks are synchronous and will run to completion before another callback will occur on the same thread.
In most cases, an OpenMP runtime will not make any callback unless a tool has registered to receive it. The exception to this rule is begin/end event pairs.
To implement event notifications efficiently, for certain begin/end event pairs a runtime may assume that if one event of the pair has a callback registered, the other will have a callback registered as well. When this exception applies, it will be noted for affected events.
Callbacks for different events may have different type signatures.
The type signature for an event's callback is noted with the event definition. Appendix~\ref{appendix:ompt-types:callbacks} defines type signatures for callback events.
There are two classes of events: mandatory events and optional events.
Mandatory events must be implemented in any compliant OpenMP runtime implementation.
Optional events are grouped in sets of related events. Support for any particular optional event can be included or omitted at the
discretion of a runtime system implementer.
\subsection{Mandatory Events}
\label{sec:mandatory-events}
The following callback events must be supported by a compliant OpenMP
runtime system.
\descheader{Threads}
\begin{description}
\item \verb|ompt_event_thread_begin|
The OpenMP runtime invokes this callback in the context of an initial thread just after it initializes the runtime, or in the context of a new thread created by the runtime just after the thread initializes itself. In either case, this callback must be the first callback for a thread
and must occur before the thread executes any OpenMP tasks. This callback has type signature \verb|ompt_thread_begin_callback_t|.
The callback argument \verb|thread_type| indicates the type of the thread: initial, worker, or other.
\item \verb|ompt_event_thread_end|
The OpenMP runtime invokes this callback
after an OpenMP thread completes all of
its tasks but before the thread is destroyed. The callback
executes in the context of the OpenMP thread. This callback must be the last callback event for any worker thread; it is optional for other types of threads.
This callback has type signature \verb|ompt_thread_end_callback_t|.
\end{description}
\descheader{Parallel Regions}
\begin{description}
\item \verb|ompt_event_parallel_begin|
\sloppy
The OpenMP runtime invokes this callback
after a task encounters a parallel construct
but before any implicit task starts to execute the
parallel region's work. The callback executes in the context of the task that encountered the parallel construct.
This callback has type signature \verb|ompt_parallel_begin_callback_t|, and includes a parameter that indicates the number of threads requested by the user.
A tool may use this value as an upper bound on the number of threads that will participate in the team.
\item \verb|ompt_event_parallel_end|
The OpenMP runtime invokes this callback
after a parallel
region executes its closing synchronization barrier but before
resuming execution of the parent task. The callback executes in
the context of the task that encountered the parallel construct.
This callback has type signature \verb|ompt_parallel_end_callback_t|.
\end{description}
\noindent
{\em Note to implementers}: For a degenerate parallel region executed by a single thread, e.g.,
a nested region encountered when nested parallelism is disabled or at a nesting depth greater than the
maximum number of nested active parallel regions supported on a device,
it is implementation dependent whether or not an OpenMP runtime will perform
\verb|ompt_event_parallel_begin| and \verb|ompt_event_parallel_end| callbacks.
\descheader{Tasks}
\begin{description}
\item \verb|ompt_event_task_create|
The OpenMP runtime invokes this callback
upon encountering a task construct or a target construct that causes a task to be created.
The callback executes in the context of
the task that encountered the task or target construct.
This callback has type signature \verb|ompt_task_create_callback_t|.
The callback argument \verb|type| may indicate either an explicit task or one of the varieties of target tasks.
The callback argument \verb|has_dependences| is true if the task has dependences with respect to data objects.
\item \verb|ompt_event_task_schedule|
The OpenMP runtime invokes this callback after it
completes or suspends one task and before it schedules another task. This
callback executes in the context of the newly-scheduled task.
This callback has type signature \verb|ompt_task_schedule_callback_t|.
The callback argument \verb|prior_task_data| indicates the prior task.
The callback argument \verb|prior_completed| is set if the prior task completed.
The callback argument \verb|next_task_data| indicates the task being scheduled.
\item \verb|ompt_event_implicit_task|
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_begin| after an
implicit task is fully initialized but before the task begins to work.
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_end| after the implicit
task executes its closing synchronization barrier but before
the task is destroyed.
This callback executes in the context of the implicit task.
This callback has type signature \verb|ompt_scoped_implicit_callback_t|.
\end{description}
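As an illustration of how a tool might consume the task schedule arguments described above, the following C sketch tracks which task a thread is running and counts completed and suspended tasks. It is a sketch under stated assumptions: the actual signature is the one given by \verb|ompt_task_schedule_callback_t| in the appendix, whereas \verb|sketch_task_data_t|, \verb|sketch_thread_tasks_t|, and the explicit \verb|self| parameter are hypothetical. A real tool would more likely locate its per-thread record through thread-local storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tool-owned data attached to a task. */
typedef struct {
  uint64_t value;
} sketch_task_data_t;

/* Hypothetical per-thread bookkeeping kept by the tool. */
typedef struct {
  const sketch_task_data_t *current; /* task now scheduled on this thread */
  uint64_t completed;                /* prior tasks that had completed */
  uint64_t suspended;                /* prior tasks that were only suspended */
} sketch_thread_tasks_t;

/* Handler mirroring the three arguments named in the text:
   prior_task_data, prior_completed, and next_task_data. */
static void sketch_on_task_schedule(sketch_thread_tasks_t *self,
                                    const sketch_task_data_t *prior_task_data,
                                    bool prior_completed,
                                    const sketch_task_data_t *next_task_data) {
  if (prior_task_data != NULL) {
    if (prior_completed) {
      self->completed++;
    } else {
      self->suspended++;
    }
  }
  self->current = next_task_data;
}
```

Keeping the counters in a per-thread record avoids synchronization inside the handler, which matters because the callback runs synchronously on the thread that performs the task switch.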
\descheader{Target Regions}
\begin{description}
\item \verb|ompt_event_target|
The OpenMP runtime invokes this callback with argument \verb|endpoint=ompt_scope_begin| after a task encounters any target construct.
The OpenMP runtime invokes this callback with argument \verb|endpoint=ompt_scope_end| when the execution of this construct completes on the host.
This callback executes in the context of the task that encounters the target construct.
This callback has type signature \verb|ompt_scoped_target_callback_t|.
The callback argument \verb|kind| indicates the kind of target construct.
The callback argument \verb|task_data| indicates the encountering task.
The callback argument \verb|device_id| indicates the device associated with the target construct.
The callback argument \verb|target_id| uniquely identifies a target construct instance.
The \verb|codeptr_ra| callback argument contains the return address of the call to the OpenMP runtime routine, which relates the target construct to the user program.
\item \verb|ompt_event_target_data|
The OpenMP runtime invokes this callback prior to a transfer or delete operation and after an allocate operation.
This callback occurs only if the operation will result in activity on the target device.
This callback has type signature \verb|ompt_target_data_callback_t|.
The callback argument \verb|optype| indicates whether the data operation is allocate, transfer to device, transfer from device, or delete.
The callback arguments \verb|host_addr| and \verb|device_addr| indicate the locations of the data on the host and device, respectively.
The callback argument \verb|size| indicates the number of data bytes.
The callback argument \verb|target_id| indicates the instance of the target construct associated with this operation.
The callback argument \verb|host_op_id| provides a unique host-side identifier that represents the activity on the device.
\item \verb|ompt_event_target_submit|
The OpenMP runtime invokes this callback prior to submitting a kernel for execution on a target device.
This callback has type signature \verb|ompt_target_submit_callback_t|.
The callback argument \verb|target_id| indicates the instance of the target construct associated with this operation.
The callback argument \verb|host_op_id| provides a unique host-side identifier that represents the activity on the device.
The callback arguments \verb|requested_num_teams| and \verb|granted_num_teams| indicate, respectively, the number of teams requested by the user and granted by the runtime.
\end{description}
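A tool might use the target data callback to account for data movement between host and device. The C sketch below accumulates byte counts by operation type. The enumeration \verb|sketch_target_data_op_t|, the totals record, and the handler signature are assumptions made for this example; the actual types are those given by \verb|ompt_target_data_callback_t| in the appendix, and the \verb|target_id| and \verb|host_op_id| arguments are omitted here for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the four operation types named in the text. */
typedef enum {
  sketch_op_alloc,
  sketch_op_transfer_to_device,
  sketch_op_transfer_from_device,
  sketch_op_delete
} sketch_target_data_op_t;

/* Hypothetical running totals kept by the tool. */
typedef struct {
  uint64_t bytes_allocated;
  uint64_t bytes_to_device;
  uint64_t bytes_from_device;
  uint64_t deletes;
} sketch_target_totals_t;

/* Accumulate the size of one data operation into the matching total. */
static void sketch_on_target_data(sketch_target_totals_t *totals,
                                  sketch_target_data_op_t optype,
                                  const void *host_addr,
                                  const void *device_addr,
                                  size_t size) {
  (void)host_addr;   /* addresses are not needed for byte accounting */
  (void)device_addr;
  switch (optype) {
  case sketch_op_alloc:
    totals->bytes_allocated += size;
    break;
  case sketch_op_transfer_to_device:
    totals->bytes_to_device += size;
    break;
  case sketch_op_transfer_from_device:
    totals->bytes_from_device += size;
    break;
  case sketch_op_delete:
    totals->deletes++;
    break;
  }
}
```

A tool that needs per-construct totals could key the same record by \verb|target_id| instead of keeping a single global record.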
\descheader{Application Tool Control}
\begin{description}
\item \verb|ompt_event_control|
If the user program calls \verb|ompt_control|, the
OpenMP runtime invokes this callback.
The callback executes in the context in which the call occurs in the user program.
This callback has type signature \verb|ompt_control_callback_t|.
Arguments passed to the callback are those passed by the user to \verb|ompt_control|.
\end{description}
\descheader{Termination}
\begin{description}
\item \verb|ompt_event_runtime_shutdown|
The OpenMP runtime invokes this callback before it shuts down the
runtime system. This callback enables a tool to clean up its
state and record or report information gathered. A runtime may later restart and reinitialize the tool by
calling the tool initializer
function (described in Section~\ref{sec:init}) again.
This callback has type signature \verb|ompt_callback_t|.
\end{description}
\subsection{Optional Events}
This section describes two sets of events.
Section~\ref{sec:blame} describes a set of events intended primarily for use by sampling-based performance tools. These events enable a sampling-based
performance tool to
employ a strategy known as {\em blame shifting} to attribute waiting to activities that cause other threads to wait
rather than to contexts in which waiting is observed.
Section~\ref{sec:trace-events} describes additional events
that, when used in conjunction with other events described in Section~\ref{sec:events}, enable a tool to receive notifications for all OpenMP runtime events.
Support for these events is optional. The OpenMP runtime remains compliant even if it supports none of the events in this section.
\subsubsection{Events for Blame Shifting (Optional)}
\label{sec:blame}
This section describes callback events designed for use by sampling-based performance tools
that employ {\em blame shifting} to transfer blame for waiting from contexts
where waiting is observed to activities responsible for the waiting.\footnote{The utility of blame shifting has previously been demonstrated for attributing the cost of waiting to steal work
in a work-stealing runtime~\cite{Tallent:PPoPP09} or waiting to acquire a lock~\cite{Tallent:PPoPP10}.}
The time a thread spends waiting for work can be blamed on active tasks on other threads that aren't shedding enough parallelism to keep all threads busy.
The time a task spends waiting for other tasks to arrive or complete in barrier, taskwait, or taskgroup regions can be blamed on tasks late to arrive or complete.
The time a task $t$ spends waiting for mutual exclusion can be blamed on any task holding the mutex while $t$ waits.
Since waiting indicates the absence of any activity, a thread will not receive any event notification between the begin and end notifications for waiting.
\begin{description}
\item \verb|ompt_event_idle|
\sloppy
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_begin| when a thread waits for work outside a parallel region.
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_end| before the thread begins to execute an implicit task for
a parallel region or terminates. The callback executes in the environment of the waiting thread.
This callback has type signature \verb|ompt_idle_callback_t|.
\end{description}
\begin{description}
\item \verb|ompt_event_sync_region_wait|
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_begin| when a task starts waiting in a barrier region, taskwait region, or taskgroup region.
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_end| when the task stops waiting in the region.
This callback has type signature \verb|ompt_scoped_sync_region_callback_t|.
The argument \verb|kind| indicates the kind of region causing the wait.
One region may generate multiple pairs of begin/end callbacks if another task is scheduled on the thread while the task awaiting completion of the region is stalled.
The callback argument \verb|codeptr_ra| may be NULL.
This callback executes in the context of the task that encountered the barrier, taskwait, or taskgroup construct.
\end{description}
\begin{description}
\item \verb|ompt_event_mutex_release|
The OpenMP runtime invokes this callback after a task releases a lock, performs the outermost release of a nest lock, or exits a critical, ordered,
or atomic region.
This callback has type signature \verb|ompt_mutex_callback_t|.
The argument \verb|kind| indicates the kind of release. In some runtime implementations, it may be inconvenient to distinguish the kind of mutex (lock, nest lock,
critical region, or atomic region) being released. If so, the runtime may simply report \verb|kind=|\verb|ompt_mutex|. If there is a matching
\verb|ompt_event_mutex_acquire| callback, it should report the same \verb|kind| value.
The \verb|wait_id| parameter identifies the lock or synchronization variable
associated with the critical region, atomic region, or ordered section released. This callback executes in the context of the task that performed the release.
If an atomic region is implemented using a hardware instruction, then an OpenMP runtime may choose never to report a release for the atomic region.
However, if an atomic region is implemented using any mechanism that involves a software protocol that spin waits for a lock or retries hardware primitives
that can fail, then an OpenMP
runtime developer should consider reporting this event so that a task can accept blame for any spin waiting or retries that occur while the task has
exclusive access to the atomic region.
Examples of hardware primitives that could fail and require explicit retries include transactions,
load-linked/store-conditional, or compare-and-swap.
\end{description}
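As a rough illustration of how a sampling-based tool might use \verb|ompt_event_mutex_release| for blame shifting, the following C sketch accumulates the cost of samples taken while a thread waits on a given wait identifier and lets the releasing task collect that cost. All names with the \verb|tool_| prefix, the table size, and the use of a 64-bit integer as a stand-in for \verb|ompt_wait_id_t| are assumptions made for this example and are not part of OMPT. The sketch also assumes that 64-bit atomic operations are lock-free on the target platform, so that they are safe to use in a signal handler.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOOL_BLAME_SLOTS 1024u          /* hypothetical table size */

typedef uint64_t tool_wait_id_t;        /* assumed stand-in for ompt_wait_id_t */

/* Blame accumulated per wait identifier; zero-initialized at program start. */
static _Atomic uint64_t tool_blame[TOOL_BLAME_SLOTS];

static unsigned tool_slot(tool_wait_id_t wait_id)
{
  /* Wait identifiers are often addresses, so drop low alignment bits.
     Distinct identifiers may collide and then share one slot. */
  return (unsigned)((wait_id >> 4) % TOOL_BLAME_SLOTS);
}

/* Called from the sample handler of a thread that is waiting on wait_id. */
static void tool_sample_while_waiting(tool_wait_id_t wait_id,
                                      uint64_t sample_cost)
{
  atomic_fetch_add(&tool_blame[tool_slot(wait_id)], sample_cost);
}

/* Called from the mutex release callback: the releasing task accepts
   all blame accumulated for this wait identifier and resets the slot. */
static uint64_t tool_accept_blame(tool_wait_id_t wait_id)
{
  return atomic_exchange(&tool_blame[tool_slot(wait_id)], 0);
}
```

A tool would typically attribute the value returned by \verb|tool_accept_blame| to the calling context of the task that performed the release, rather than to the contexts in which the waiting was observed.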
\subsubsection{Events for Instrumentation-based Measurement Tools (Optional)}
\label{sec:trace-events}
The following events, designed for instrumentation-based tools, enable tools to receive notifications for additional OpenMP runtime events of interest.
\descheader{Tasking}
\begin{description}
\sloppy
\item \verb|ompt_event_task_dependences|
If a task has any dependences with respect to data objects that constrain its ordering with respect to other tasks,
the OpenMP runtime invokes this callback immediately after the callback announcing the task's creation to announce its dependences with respect to data objects.
This callback has type signature \verb|ompt_task_dependences_callback_t|.
\item \verb|ompt_event_task_dependence_pair|
The OpenMP runtime invokes this callback to report a dependence between a producer (\verb|src_task_data|)
and a consumer (\verb|sink_task_data|) that blocked execution of the consumer.
This callback will occur before the consumer knows that the dependence is satisfied. This may happen early or late.
Note: this callback is used only to report blocking dependences between sibling tasks whose lifetimes overlap.
No callback will occur if a producer task finishes before a consumer task is created.
This callback has type signature \verb|ompt_task_dependence_callback_t|.
\end{description}
\descheader{Worksharing}
\begin{description}
\item \verb|ompt_event_worksharing|
\sloppy
The OpenMP runtime invokes this callback with \verb|endpoint=ompt_scope_begin| after a task encounters a worksharing
construct but before the task executes its first unit of work for the worksharing region.
The OpenMP runtime invokes this callback with \verb|endpoint=ompt_scope_end| after a task executes
its last unit of work for a worksharing construct and before the task executes the barrier for the construct (wait) or the statement following the construct (nowait).
This callback has type signature \verb|ompt_scoped_worksharing_callback_t|.
The \verb|wstype| callback argument indicates whether the worksharing construct is a loop, sections, single executor or other participant,
or workshare.
The \verb|codeptr_ra| callback argument contains the return address of the call to the OpenMP runtime routine, which relates the worksharing region to the user program; it
may be NULL when \verb|endpoint=|\verb|ompt_scope_end|.
This callback executes in the context of the task that encountered the construct.
\end{description}
\descheader{Master Blocks}
\begin{description}
\item \verb|ompt_event_master|
The OpenMP runtime invokes this callback with \verb|endpoint=ompt_scope_begin| after the implicit task of a master thread encounters a master construct but
before the task
executes the master region.
The OpenMP runtime invokes this callback with \verb|endpoint=ompt_scope_end| after the implicit task of a master thread executes a master region
but before the task executes the statement
following the master construct.
This callback has type signature \verb|ompt_scoped_master_callback_t|.
This callback executes in the context of
the implicit task of a team's master thread.
\end{description}
\descheader{Target Data Mapping}
\begin{description}
\begin{comment}
\item \verb|ompt_event_target_data_begin|
The OpenMP runtime invokes this callback after a task encounters a target data construct but before the new data environment is created.
The callback executes in the context of the task that encountered the target data construct.
This callback has type signature \verb|ompt_target_data_callback_t|. Arguments to the callback include the encountering task, the
target device, and the return address of the call to the runtime routine performing the target data operation, which relates the operation to
the user program.
\item \verb|ompt_event_target_data_end|
The OpenMP runtime invokes this callback when the task that encountered the target data region is
done with the target data region.
This callback has type signature \verb|ompt_task_callback_t|.
The callback executes in the context of the task that encountered the target data construct.
\end{comment}
\item \verb|ompt_event_target_data_map|
The OpenMP runtime invokes this callback when a set of \verb|nitems| variables is mapped to or unmapped from the device data environment by a target, target data,
target enter data or target exit data construct.
This callback has type signature \verb|ompt_target_data_map_callback_t|.
The callback argument \verb|target_id| indicates the instance of the target construct associated with this operation.
The callback arguments \verb|host_addr|, \verb|device_addr|, \verb|bytes|, and \verb|mapping_flags| are arrays that describe data items mapped or unmapped.
Elements of the \verb|mapping_flags| array are bit vectors whose bits correspond to items in the enum \verb|ompt_target_map_flag_t|.
The callback executes in the context of the encountering task.
\end{description}
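The parallel-array layout of the \verb|ompt_event_target_data_map| arguments can be illustrated with a short C sketch that sums the sizes of all items whose flags contain a given set of bits. The enumerators below are hypothetical stand-ins with made-up values; a real tool would use the enumerators of \verb|ompt_target_map_flag_t| from the OMPT header. The element types chosen for the \verb|bytes| and \verb|mapping_flags| arrays are likewise assumptions.

```c
#include <stddef.h>

/* Hypothetical stand-in for ompt_target_map_flag_t; values are illustrative. */
typedef enum {
  TOOL_MAP_FLAG_TO   = 1u << 0,
  TOOL_MAP_FLAG_FROM = 1u << 1
} tool_map_flag_t;

/* Sum the sizes of all items whose flags include every bit in `wanted`. */
static size_t tool_bytes_with_flags(unsigned int nitems,
                                    const size_t *bytes,
                                    const unsigned int *mapping_flags,
                                    unsigned int wanted)
{
  size_t total = 0;
  for (unsigned int i = 0; i < nitems; i++) {
    /* Each element is a bit vector, so test with a mask, not equality. */
    if ((mapping_flags[i] & wanted) == wanted)
      total += bytes[i];
  }
  return total;
}
```

Because each element of \verb|mapping_flags| may have several bits set, the sketch tests flags with a mask instead of comparing the element against a single enumerator.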
\descheader{Barrier, Taskwait, and Taskgroup}
\begin{description}
\item \verb|ompt_event_sync_region|
\sloppy
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_begin| before a task
begins execution of a barrier region, taskwait region, or taskgroup region.
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_end| before the task exits the synchronization region.
This callback has type signature \verb|ompt_scoped_sync_region_callback_t|.
The argument \verb|kind| indicates the kind of synchronization region: barrier, taskwait, or taskgroup.
The \verb|codeptr_ra| callback argument, which represents the return address of a call to an OpenMP runtime routine implementing the synchronization region,
may be NULL when \verb|endpoint=|\verb|ompt_scope_end|.
This callback executes in the context of the task that encountered the synchronization construct.
\end{description}
\descheader{Lock Creation and Destruction}
\begin{description}
\item \verb|ompt_event_init_lock|
The OpenMP runtime invokes this callback just after a
task initializes a lock or nest lock. This callback executes in the
context of the task that called a lock initialization routine.
This callback has type signature \verb|ompt_lock_init_callback_t|.
The callback argument \verb|is_nest_lock| indicates the type of lock being initialized.
The callback argument \verb|wait_id| identifies the lock.
The \verb|hint| parameter is the lock hint value passed to a hinted lock initialization routine.
The \verb|kind| parameter is a small integer indicating the lock implementation chosen by the OpenMP runtime.
The mapping between values of \verb|kind| and the lock implementations they represent can be determined using
\verb|ompt_enumerate_mutex_kinds|.
\item \verb|ompt_event_destroy_lock|
The OpenMP runtime invokes this callback just before a
task destroys a lock or nest lock.
This callback has type signature \verb|ompt_lock_destroy_callback_t|.
The callback argument \verb|wait_id| identifies the lock.
This callback executes in the context of the task that called a lock destruction routine.
\end{description}
\descheader{Lock, Nest Lock, Critical Section, Atomic, and Ordered}
\begin{description}
\item \verb|ompt_event_mutex_acquire|
\sloppy
The OpenMP runtime invokes this callback when a task invokes
\verb|omp_set_lock| to acquire a lock, invokes \verb|omp_set_nest_lock| to acquire a nest lock not already owned,
or encounters a critical, atomic, or ordered construct.
This callback has type signature \verb|ompt_mutex_acquire_callback_t|.
The callback argument \verb|kind| indicates the kind of mutex being acquired.
The callback argument \verb|hint| is the implementation hint value specified for a (nest) lock, critical, or atomic construct.
If no hint is available, e.g., for ordered constructs, \verb|hint=omp_hint_unknown|.
The callback argument \verb|impl| indicates the implementation choice associated with a lock, nest lock, or critical, atomic, or ordered construct.
The callback argument \verb|wait_id| identifies the (nest) lock, a critical construct's associated $name$ or synchronization variable, the program variable or synchronization variable associated with the atomic construct, or the synchronization variable associated with the ordered construct.
This callback executes in the context of the task that called \verb|omp_set_lock| or \verb|omp_set_nest_lock| or encountered the
critical, atomic, or ordered construct.
\item \verb|ompt_event_mutex_acquired|
The OpenMP runtime invokes this callback just after the task acquires a (nest) lock or enters a critical, atomic, or ordered region.
This callback has type signature \verb|ompt_mutex_callback_t|.
The callback argument \verb|kind| indicates the kind of mutex being acquired.
The callback argument \verb|wait_id| identifies the (nest) lock, a critical construct's associated $name$ or synchronization variable, the program variable or synchronization variable associated with the atomic construct, or the synchronization variable associated with the ordered construct.
This callback executes in the context of the task that called \verb|omp_set_lock| or \verb|omp_set_nest_lock| or encountered the
critical, atomic, or ordered construct.
\item \verb|ompt_event_nested_lock|
\sloppy
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_begin| if a task begins to acquire a nest lock that is already owned by a task.
The OpenMP runtime invokes this callback with \verb|endpoint=|\verb|ompt_scope_end| just after a task completes the nested acquisition.
This callback has type signature \verb|ompt_scoped_nested_lock_callback_t|.
This callback executes in the context of the task that called an OMP API routine to set or unset a nest lock; its
callback argument \verb|wait_id| identifies the nest lock.
\end{description}
\descheader{Miscellaneous}
\begin{description}
\item \verb|ompt_event_flush|
\sloppy
The OpenMP runtime invokes this callback just after
performing a flush operation.
This callback has type signature \verb|ompt_flush_callback_t|.
This callback executes in the context of the task that encountered the flush construct.
\end{description}
\section{Tool Data Structures}
\label{sec:data}
Threads, parallel regions, task regions, target regions, and target operations are represented by unique identifiers of type \verb|ompt_data_t|.
This type allows tools either to attach tool-specific data to the aforementioned constructs or to maintain a tool-specific integer ID.
Both options require the runtime to pass these identifiers by reference to the corresponding callbacks.
The initial value of an identifier is \verb|ompt_data_none|. This allows tools to detect whether the identifier has already been initialized.
Tools may assign the identifier a different value.
It is the tool's responsibility to maintain the resources it assigns to the identifiers.
If the runtime needs to report an invalid identifier, it passes a NULL pointer to the callback or returns a NULL pointer from an inquiry API function.
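As a minimal sketch of how a tool might maintain a tool-specific integer ID in such an identifier, the following C fragment assumes that \verb|ompt_data_t| is a union of a 64-bit integer and a pointer and that \verb|ompt_data_none| is the all-zero value. Both definitions are assumptions made for illustration, since they are not shown in this section; the names with the \verb|tool_| prefix are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout; the actual definition comes from the OMPT header. */
typedef union ompt_data_u {
  uint64_t value;   /* tool-specific integer ID */
  void *ptr;        /* or a pointer to tool-specific data */
} ompt_data_t;

/* Assumed: the "none" value is all zeros. */
static const ompt_data_t ompt_data_none = { 0 };

/* Hypothetical tool-side counter; a real tool would need an atomic counter. */
static uint64_t tool_next_id = 1;

/* Return the tool's ID for this identifier, assigning one on first use.
   Returns 0 if the runtime reported an invalid identifier (NULL). */
static uint64_t tool_get_or_assign_id(ompt_data_t *data)
{
  if (data == NULL)
    return 0;
  if (data->value == ompt_data_none.value)
    data->value = tool_next_id++;   /* first time the tool sees this object */
  return data->value;
}
```

Because the runtime passes the identifier by reference, the value stored by the tool on the first callback is visible to later callbacks for the same thread, parallel region, or task.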
\subsection{Thread Identifier}
Each OpenMP thread has an associated identifier of type \verb|ompt_data_t|.
On thread creation, the runtime library initializes the thread identifier to \verb|ompt_data_none|.
Thread related event callbacks provide the identifier by reference in order to let a tool change its value.
A thread identifier can be retrieved on demand by invoking the \verb|ompt_get_thread_data| function (described in Section~\ref{sec:thread-inquiry}).
To indicate an invalid identifier, this function returns a NULL pointer.
\subsection{Parallel Region Identifiers}
Each OpenMP parallel region has an associated identifier of type \verb|ompt_data_t|.
At the beginning of a parallel region, the runtime library initializes the parallel region identifier to \verb|ompt_data_none|.
Parallel region related event callbacks provide the identifier by reference in order to let a tool change its value.
A parallel region identifier can be retrieved on demand by invoking the \verb|ompt_get_parallel_info| function (described in Section~\ref{sec:parallel-inquiry}).
To indicate an invalid identifier, this function returns a NULL pointer.
\subsection{Task Region Identifiers}
Each OpenMP task has an associated identifier of type \verb|ompt_data_t|.
Task identifiers are assigned to initial, implicit, explicit, and target tasks.
On task region creation, the runtime library initializes the task region identifier to \verb|ompt_data_none|.
Task region related event callbacks provide the identifier by reference in order to let a tool change its value.
A task region identifier can be retrieved on demand by invoking the \verb|ompt_get_task_info| function (described in Section~\ref{sec:task-region}).
To indicate an invalid identifier, this function returns a NULL pointer.
\subsection{Target Region and Operation Identifiers}
Each OpenMP target region and target operation has an associated identifier of type \verb|ompt_id_t|.
A unique target identifier is assigned on the host each time an instance of a target construct is encountered.
Each operation within a target region, e.g., transferring data to/from a device or launching a kernel
on a device, is also assigned a unique target identifier.
Identifiers assigned to target regions or operations
are unique from the time an OpenMP runtime is initialized until it is shut down.
The current target region and operation identifiers can be retrieved by invoking the \verb|ompt_get_target_info| function (described in Section~\ref{sec:target-region}).
Tools should not assume that \verb|ompt_id_t| values are small or densely allocated.
The value \verb|ompt_id_none| is reserved to indicate an invalid target identifier.
The value \verb|ompt_id_none| will be returned for (a) the target region identifier if \verb|ompt_get_target_info| is invoked outside a target region and (b) the target operation identifier if \verb|ompt_get_target_info| is invoked while no target operation is in progress.
\subsection{Wait Identifiers}
Each thread instance maintains a {\em wait identifier} of type \verb|ompt_wait_id_t|.
When a task executing on a thread is waiting for something, the thread's wait identifier indicates what the thread is awaiting.
A wait identifier may represent a critical section {\em name}, a lock, a program variable accessed in an atomic region, or a synchronization object internal to an OpenMP runtime implementation.
\begin{comment}
\begin{quote}
\begin{verbatim}
typedef uint64_t ompt_wait_id_t;
\end{verbatim}
\end{quote}
\end{comment}
A thread's wait identifier can be retrieved on demand by invoking the \verb|ompt_get_state| function (described in Section~\ref{sec:thread-inquiry}).
Tools should not assume that \verb|ompt_wait_id_t| values are small or densely allocated.
When a thread is not in a wait state, a thread's wait identifier has an undefined value.
%%? Does that mean that the value is undefined and cannot sensibley be read, or that it has a specific value which we have
%%? defined somewhere, whose name is (something like) ompt_wait_id_undefined ?
%%johnmc says: a wait_id typically is set to the address of a lock on which you are spinning. If you aren't spinning on a lock, this value is undefined.
%% we could zero it out, but that would cost more.
\subsection{Structure to Support Classification of Stack Frames}
When executing an OpenMP program, at times procedure frames from the OpenMP runtime appear on the call stack between user code procedure frames.
To enable a tool to classify procedure frames on the call stack as belonging to the user program or the OpenMP runtime,
the runtime system maintains an instance of an \verb|ompt_frame_t| data structure
for each (possibly degenerate) task. A task is considered degenerate if a call to the OpenMP runtime to create a parallel
region or task does not create a new task. A degenerate task may arise when a parallel construct is encountered
in a parallel region and nested parallelism is not enabled or when an orphaned directive that would create a task is encountered outside a parallel region.
A degenerate task region may add runtime frames to the call stack before
invoking user code for the degenerate task and thus require an \verb|ompt_frame_t| data structure.
To simplify the discussion below, we omit the qualifier ``possibly degenerate'' each time we use the term {\em task}.
Each initial, implicit, explicit, or target task maintains an \verb|ompt_frame_t| data structure
that contains a pair of pointers.
\vbox{
\begin{quote}
\begin{verbatim}
typedef struct ompt_frame_s {
void *exit_frame; /* runtime frame that calls user code */
void *enter_frame; /* user frame that calls the runtime */
} ompt_frame_t;
\end{verbatim}
\end{quote}
}
\noindent
An \verb|ompt_frame_t|'s lifetime begins when a task is
created and ends when the task is destroyed. Tools should not assume that a frame structure remains at a constant location in memory
throughout a task's lifetime.
Frame data is passed to some callbacks; it can also be retrieved
asynchronously
by invoking the \verb|ompt_get_task_info| function (described in Section~\ref{sec:task-region}) in a signal handler.
Frame data contains two components:
\begin{description}
\item \verb|exit_frame|
This value is set before the OpenMP runtime invokes a procedure containing user code.
%%? This description is misleading IMO. Surely this field contains the *value* of the user frame pointer.
%%? It doesn't point to the user frame pointer, but to the user frame itself.
%%? This language suggests another indirection which I don't believe is present.
%%johnmc says: what you described above is not how we implemented it. we set the exit_frame to the frame pointer of the runtime frame that is calling a
%% user procedure, not the user procedure's frame pointer.
This field points to the frame pointer of the runtime procedure frame that invoked the user code.
For compilers that generate code where the master thread for a parallel region invokes user code directly (e.g., older versions of GNU compilers),
this may point to a frame of user code for the enclosing task.
This value is NULL until just before the runtime invokes a procedure containing user code.
\item \verb|enter_frame|
This value is set each time the current task re-enters the
runtime to create a new implicit, explicit, or target task region. This field
%%? And here too
%%johnmc says: we set the enter_frame to the frame pointer of the user procedure frame that is calling a runtime system function, not the frame pointer of
%% of the runtime
points to the frame pointer for a user function that invokes the runtime to create a task region.
This value is set when a task enters the runtime and cleared before the runtime returns control to the task.
\end{description}
\begin{table}
\begin{center}
\begin{tabular}{|l|p{2in}|p{2in}|}
\hline
exit / enter & enter = null & enter = defined \\\hline\hline
exit = null & case 1) initial task in user code; case 2) task that is created but not yet scheduled & task entered the runtime to schedule an implicit, explicit, or target task \\\hline
exit = defined & non-initial task in (or soon to be in) user code & non-initial task entered the runtime and scheduled an implicit, explicit, or target task\\\hline
\end{tabular}
\end{center}
\caption{Meaning of various values for {\tt exit\_frame} and {\tt enter\_frame}.}
\label{tab:frame}
\end{table}
\noindent
Table~\ref{tab:frame} describes the meaning of this structure with various values.
In the presence of nested parallelism, a tool may observe a sequence of \verb|ompt_frame_t| records for a thread. Appendix~\ref{app:frame} discusses an example that illustrates the use of \verb|ompt_frame_t| records with nested parallelism.
\paragraph{Advice to tool implementers:} A monitoring tool using
asynchronous sampling can observe values of
\verb|exit_frame| and \verb|enter_frame| at inconvenient times.
Tools must be prepared to observe and handle frame exit and reenter values that have not yet been set or reset as the program enters into, or returns from, the runtime.
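The four cases in Table~\ref{tab:frame} can be sketched as a small C classifier. The structure definition is repeated from above so that the fragment is self-contained; the enumeration and the function name are hypothetical tool-side names, not part of OMPT. The sketch only classifies one observed pair of values and does not by itself handle the transient states mentioned in the advice above.

```c
#include <stddef.h>

/* Repeated from the definition of ompt_frame_t above. */
typedef struct ompt_frame_s {
  void *exit_frame;    /* runtime frame that calls user code */
  void *enter_frame;   /* user frame that calls the runtime */
} ompt_frame_t;

/* Hypothetical labels for the four cells of the table. */
typedef enum {
  TOOL_FRAME_INITIAL_OR_UNSCHEDULED,  /* exit = null,    enter = null    */
  TOOL_FRAME_ENTERED_RUNTIME,         /* exit = null,    enter = defined */
  TOOL_FRAME_TASK_IN_USER_CODE,       /* exit = defined, enter = null    */
  TOOL_FRAME_TASK_IN_RUNTIME          /* exit = defined, enter = defined */
} tool_frame_case_t;

static tool_frame_case_t tool_classify_frame(const ompt_frame_t *frame)
{
  /* Read each field once so both tests use the same observed values. */
  void *exit_fp = frame->exit_frame;
  void *enter_fp = frame->enter_frame;

  if (exit_fp == NULL)
    return enter_fp == NULL ? TOOL_FRAME_INITIAL_OR_UNSCHEDULED
                            : TOOL_FRAME_ENTERED_RUNTIME;
  return enter_fp == NULL ? TOOL_FRAME_TASK_IN_USER_CODE
                          : TOOL_FRAME_TASK_IN_RUNTIME;
}
```

A call stack unwinder could use the result to decide whether the frames between \verb|enter_frame| and the \verb|exit_frame| of a descendant task belong to the runtime and should be elided from a user-level view.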
\section{Inquiry Functions for Tools}
\label{sec:inquiry}
Inquiry functions retrieve data from the execution environment for
the tools.
All functions in the inquiry API are marked with \verb|OMPT_API|. These functions should not be global symbols in an OpenMP runtime implementation to avoid tempting tool developers to call them directly. Section~\ref{sec:init} describes how a tool should obtain pointers to these inquiry functions.
{\em All inquiry functions are async signal safe.}
Note that it is unsafe to call OpenMP Execution Environment Routines within an OMPT callback because doing so may cause deadlock.
Specifically, since OpenMP Execution Environment Routines are not guaranteed to be async signal safe, they might acquire a lock that may already be held when an OMPT callback is invoked.
\subsection{Enumerate States}
\label{ompt_enumerate_states}
The OpenMP runtime is allowed to support other states in addition to those described in this document.
For instance, a particular runtime system may want to
provide more detail about the nature of runtime overhead,
e.g., to differentiate between overhead associated with setting up a parallel region
and overhead associated with setting up a task. Further, a runtime need not report all states defined herein, e.g., if state tracking for a particular state would be too expensive.
To enable a tool to identify all states that an OpenMP runtime implements, OMPT provides
the following interface for enumerating all states that may be reported by the runtime that is being used.
\begin{quote}
\begin{verbatim}
OMPT_API _Bool ompt_enumerate_states(
omp_state_t current_state,
omp_state_t *next_state,
const char **next_state_name
);
\end{verbatim}
\end{quote}
\noindent
To begin enumerating the states that a runtime system supports,