NVIDIA NVTX Wrappers for MPI

License: Copyright 2017 NVIDIA CORPORATION, released under 3-clause BSD license. This software also uses software that is released under a 3-clause BSD license by Lawrence Livermore National Laboratory.

Summary

The included sources can be used to generate wrappers for common Message Passing Interface (MPI) routines using the PMPI interface. The included sources will explicitly add a range using the NVIDIA Tools Extensions (NVTX) API. When an MPI program is instrumented with the NVIDIA profilers, a range will appear in the timeline for each traced MPI call.
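The wrapping pattern described above can be sketched with a small mock. This is not the code generated from nvtx.w. In a real wrapper the function would be named `MPI_Send`, would open a range with `nvtxRangePushA("MPI_Send")`, forward to `PMPI_Send`, and close the range with `nvtxRangePop()`. The names `range_push`, `range_pop`, `real_send`, and `traced_send` below are hypothetical stand-ins, chosen so the sketch compiles without MPI or the CUDA Toolkit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the NVTX range calls (nvtxRangePushA and
   nvtxRangePop). They append to a text log so the order of events can
   be inspected afterwards. */
#define LOG_SIZE 256
static char range_log[LOG_SIZE];
static int range_depth = 0;

static void log_append(const char *s) {
    strncat(range_log, s, LOG_SIZE - strlen(range_log) - 1);
}

static int range_push(const char *name) {
    log_append("push:");
    log_append(name);
    log_append(";");
    return range_depth++;          /* depth of the range being opened */
}

static int range_pop(void) {
    assert(range_depth > 0);       /* a pop must match an earlier push */
    log_append("pop;");
    return --range_depth;
}

/* Hypothetical stand-in for the PMPI entry point (PMPI_Send in real MPI). */
static int real_send(const void *buf, int count) {
    (void)buf;
    log_append("send;");
    return count >= 0 ? 0 : 1;     /* 0 plays the role of MPI_SUCCESS */
}

/* The wrapper: open a range, forward to the real routine, close the range,
   and return the real routine's result unchanged. */
static int traced_send(const void *buf, int count) {
    range_push("MPI_Send");
    int rc = real_send(buf, count);
    range_pop();
    return rc;
}
```

Calling `traced_send(payload, 4)` leaves the log showing the send bracketed by one push and one pop, which is the shape a profiler would draw as a single range around the call.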

You can read more about this technique here.

Prerequisites

A working install of MPI
The NVIDIA CUDA Toolkit
Python
make

Building

Because each MPI implementation is subtly different, the wrappers must be generated for your installed MPI library. They are generated from the file nvtx.w into a file called nvtx_pmpi.c, which is then built into a shared object to be used with your program. To build, run make in the top-level directory.

$ make

Extending

If you would like to extend the library to include additional MPI calls of interest or change the way the data is represented, make your changes to nvtx.w and then rebuild. The makefile will automatically regenerate the wrapper source based on your changes. For more information about how to modify this file, please see wrap/README.md.

Usage

The shared object file built above must be preloaded, along with the NVIDIA Tools Extensions library, when gathering a performance profile. For example:

$ LD_PRELOAD="<path-to-library>/libnvtx_pmpi.so" nvprof -o timeline.prof ./a.out

If the program a.out uses any of the wrapped MPI calls, these function calls will appear as ranges in the NVPROF timeline when it is later loaded into the NVIDIA Visual Profiler. Any data movement or kernels used by the MPI function call will appear within the range.

Known Limitations

Asynchronous MPI routines are not implemented because any data movement incurred as a result of these calls will not occur during the range.
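The reasoning behind this limitation can be illustrated with a small mock that uses a logical clock instead of real MPI. All names here (`start_send`, `wait_send`, `traced_start_send`, `request_t`) are hypothetical stand-ins for a nonblocking send, its matching wait, and a wrapper around the send. The sketch assumes the transfer completes during the wait, which is only one possible behaviour of a real MPI library.

```c
#include <assert.h>

/* Logical clock: each mock operation advances it by a fixed amount. */
static int clock_ticks = 0;
static int transfer_done_at = -1;          /* tick at which data actually moved */
static int range_start = -1, range_end = -1;

typedef struct { int pending; } request_t; /* stand-in for a request handle */

/* Mock nonblocking send: posts the operation and returns almost at once. */
static void start_send(request_t *r) {
    clock_ticks += 1;
    r->pending = 1;
}

/* Mock wait: the data movement is assumed to finish here. */
static void wait_send(request_t *r) {
    clock_ticks += 10;
    transfer_done_at = clock_ticks;
    r->pending = 0;
}

/* A range wrapped around the nonblocking call only. */
static void traced_start_send(request_t *r) {
    range_start = clock_ticks;
    start_send(r);
    range_end = clock_ticks;
}

/* Returns 1 if the data movement fell inside the recorded range, else 0. */
static int transfer_inside_range(void) {
    return transfer_done_at >= range_start && transfer_done_at <= range_end;
}
```

After `traced_start_send(&r)` followed by `wait_send(&r)`, `transfer_inside_range()` returns 0: the range closed before the mock transfer completed, so a range around the nonblocking call would not contain the data movement it is meant to mark.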
