Sync with upstream
jpwhite4 committed Jul 7, 2023
1 parent aaf1562 commit 9044a8a
Showing 2 changed files with 36 additions and 17 deletions.
49 changes: 34 additions & 15 deletions 10.0/supremm-compute-pcp.md
@@ -1,7 +1,7 @@

This section gives example configuration settings for PCP running on the compute nodes
This section gives example configuration settings for Performance Co-Pilot (PCP) running on the compute nodes
of an HPC cluster. These configuration guidelines are based on the PCP data collection setup
at CCR buffalo, which uses PCP version 4.2 that is supplied with Centos 7.
at CCR buffalo, which uses PCP version 4.3.2 that is supplied with Centos 7.

## Prerequisites

@@ -17,8 +17,9 @@ data are collected:
* Periodically during the job (recommended 30 seconds)
* At the end of every job

The archive data for each node should be stored on a shared filesystem for
subsequent processing by the job summarization software.
The archive data for each node needs to be readable by the job summarization software
process. We recommend that the archive data is stored on a shared filesystem such as a parallel filesystem
or network attached storage.

In order to constrain the number of files in a given directory,
we recommend storing the archives in a directory structure
@@ -45,7 +46,7 @@ this directory structure is still supported by the indexing script and may
still be used. The reason for changing the recommendation is that the new directory
structure limits the total number of files under a given directory. This
helps reduce the runtime of backup software. If the filesystem I/O performance with
the existing directory stucture is not an issue then do not change to the new one.
the existing directory structure is not an issue then there is no need to change to the new one.

Configuration Templates
-----------------------
@@ -54,17 +55,23 @@ The [Job Summarization software][] includes template files that can be used to
configure PCP collection on the compute nodes. The package itself should not
be installed on the compute nodes, however you may wish to install it
on a test node in order to obtain the PCP configuration file templates.
Alternatively, the scripts may be extracted directly from the source tarball.
The template files should be edited before use.
Alternatively, the scripts may be extracted directly from the source tar file or
from the [github repository][].

The PCP configuration templates are applicable to PCP version 4.3.2 that is
supplied with Centos 7 and may need to be modified to work with more recent versions
of the PCP software.

The template files must be edited before use.

These templates are available:
------------------------------
#### /usr/share/supremm/templates/pmlogger/control
#### /usr/share/supremm/templates/pcp-4.3.x/pmlogger/control

This template file **must** be edited to specify the path to the directory
where the PCP archives are to be saved.

The path `PCP_LOG_DIR/pmlogger` should be changed to the path where the PCP archives are to be saved.
The path `PCP_LOG_DIR/pmlogger` should be changed to the path where the PCP archives are to be saved.

The edited template should be saved in the `/etc/pcp/pmlogger` directory and any existing files under
`/etc/pcp/pmlogger/control.d` must be removed to ensure that there is only one primary logger
@@ -73,15 +80,18 @@ configured.
Note that the string `LOCALHOSTNAME` in the file is expanded by the pcp logger software to the hostname
of the compute node running the logger.

It is also recommended to disable the compression behavior so that the
The template includes the following directive that disables the compression behavior so that the
archive files do not get compressed while the summarization software is
processing them:

```
$PCP_COMPRESSAFTER=never
```

#### /usr/share/supremm/templates/pmlogger/pmlogger-supremm.config
The template also specifies the `-T24h10m` option to the pmlogger process to ensure new log files
are created every day.
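
The checks described above for the edited control file can be sketched as a small shell helper. This is only an illustration: `check_pmlogger_control` is a hypothetical function name, not part of PCP or the summarization software, and it assumes the edited file still contains the `$PCP_COMPRESSAFTER=never` directive and the `-T24h10m` option exactly as written in the template.

```shell
# Hypothetical helper: sanity-check an edited pmlogger control file.
# Returns 0 if all checks pass, 1 otherwise.
check_pmlogger_control() {
    local control_file="$1"
    local status=0

    # The template placeholder path must have been replaced with a real path.
    if grep -q 'PCP_LOG_DIR/pmlogger' "$control_file"; then
        echo "placeholder PCP_LOG_DIR/pmlogger is still present" >&2
        status=1
    fi

    # Archive compression should be disabled.
    if ! grep -q '^\$PCP_COMPRESSAFTER=never' "$control_file"; then
        echo "missing \$PCP_COMPRESSAFTER=never directive" >&2
        status=1
    fi

    # The logger should be told to start a new archive every day.
    if ! grep -q -e '-T24h10m' "$control_file"; then
        echo "missing -T24h10m option" >&2
        status=1
    fi

    return "$status"
}

# Example (path taken from the text above):
# check_pmlogger_control /etc/pcp/pmlogger/control
```

A check like this could be run from a configuration management tool before the file is pushed to the compute nodes.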

#### /usr/share/supremm/templates/pcp-4.3.x/pmlogger/pmlogger-supremm.config
* Moved to /etc/pcp/pmlogger
* Can be updated to change metrics logged or frequency
* You may wish to reduce logging frequency from the default 30 seconds until confirming impact on your system and storage utilization
@@ -101,7 +111,7 @@ changed to match the directory where the PCP archives are to be saved.
This script will run data collection at job end. This script should be merged into your existing epilogue script.
This script is designed for and tested with the [Slurm][] resource manager.

#### /usr/share/supremm/templates/hotproc/hotproc.conf
#### /usr/share/supremm/templates/pcp-4.3.x/hotproc/hotproc.conf
* Moved to /var/lib/pcp/pmdas/proc

This configuration file sets the parameters for process logging into the pcp
@@ -121,14 +131,13 @@ filesystems over NFS then enable the nfsclient PMDA.

<!-- Empty Comment to fix broken markdown parsing -->

$ touch /var/lib/pcp/pmdas/slurm/.NeedInstall
$ touch /var/lib/pcp/pmdas/nvidia/.NeedInstall
$ touch /var/lib/pcp/pmdas/gpfs/.NeedInstall
$ touch /var/lib/pcp/pmdas/nfsclient/.NeedInstall
$ touch /var/lib/pcp/pmdas/perfevent/.NeedInstall
$ touch /var/lib/pcp/pmdas/mic/.NeedInstall
$ touch /var/lib/pcp/pmdas/libvirt/.NeedInstall
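
The same `.NeedInstall` marking can be written as a loop so that the list of PMDAs is kept in one place. The function name `mark_pmdas_for_install` is hypothetical, and the PMDA names in the usage comment are just examples from the list above; pick the ones that apply to your site.

```shell
# Hypothetical helper: create a .NeedInstall marker for each named PMDA.
# Usage: mark_pmdas_for_install PMDA_ROOT PMDA [PMDA...]
mark_pmdas_for_install() {
    local pmda_root="$1"
    shift
    local pmda
    for pmda in "$@"; do
        if [ -d "$pmda_root/$pmda" ]; then
            touch "$pmda_root/$pmda/.NeedInstall"
        else
            # The PMDA package is probably not installed on this node.
            echo "skipping $pmda: $pmda_root/$pmda not found" >&2
        fi
    done
}

# Example (run as root on a compute node):
# mark_pmdas_for_install /var/lib/pcp/pmdas slurm nfsclient perfevent
```

Skipping missing directories rather than failing keeps the same script usable on nodes that have different sets of PMDA packages installed.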

Configure Global Process Capture
--------------------------------

@@ -148,7 +157,7 @@ to save disk space. The summarization software can read both compressed
and uncompressed archives. However there is a potential race
condition with the archive compression running at the same time as the
job summarization software runs. At CCR we disable the compression with
the following directive set in the pmlogger control file `/etc/pmlogger/control.d/[FILENAME]`
the following directive set in the pmlogger control file `/etc/pmlogger/control`

```
$PCP_COMPRESSAFTER=never
@@ -163,5 +172,15 @@ service. On a systemd enabled system, this can be done with:

$ systemctl restart pmcd pmlogger

Archive management notes
------------------------

Once the job summarization software has generated a record for a job then the PCP archives
are no longer required and could be deleted to save disk space. Jobs are summarized after
the job accounting information has been ingested into XDMoD and this happens after
they complete. So the minimum amount of time to keep archives is the maximum permissible
HPC job wall time plus the latency between a job ending and it appearing in XDMoD.
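
As a rough sketch of the retention rule above, the minimum retention period can be computed from the two quantities and then used to list candidate files. Both function names are hypothetical, the wall time and latency values are site-specific inputs, and file modification time is used only as a proxy: the sketch does not check whether a job has actually been summarized.

```shell
# Hypothetical helper: minimum retention in whole days, rounded up.
# Arguments: maximum job wall time (hours), ingest latency (hours).
min_retention_days() {
    local max_walltime_hours="$1"
    local ingest_latency_hours="$2"
    echo $(( (max_walltime_hours + ingest_latency_hours + 23) / 24 ))
}

# Hypothetical helper: list files not modified within the last DAYS days.
# This only prints candidates; it does not delete anything.
list_archives_older_than() {
    local archive_dir="$1"
    local days="$2"
    find "$archive_dir" -type f -mtime +"$days" -print
}

# Example with assumed values (72 hour wall time limit, 24 hour latency),
# where /path/to/pcp-archives stands for your archive directory:
# days=$(min_retention_days 72 24)
# list_archives_older_than /path/to/pcp-archives "$days"
```

Review the listed files, and add a safety margin to the retention period, before deleting anything.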

[Job Summarization software]: supremm-processing-install.md
[Slurm]: https://www.schedmd.com/
[github repository]: https://github.com/ubccr/supremm/tree/master/config/templates
4 changes: 2 additions & 2 deletions 10.0/supremm-install-pcp.md
@@ -18,7 +18,7 @@ for other distributions (and earlier CentOS versions) are available from the
[official PCP download page](https://pcp.io/download.html),

The CentOS RPM packages of the summarization software are tested against the version of PCP
that is provided with CentOS (This is PCP version 4.3.2 as of CentOS 7.8).
that is provided with CentOS 7 (This is PCP version 4.3.2 as of CentOS 7.8).

For an RPM based install on CentOS 7 the following command will install PCP with
all of the associated PMDAS (monitoring plugins) that have been tested with the
@@ -49,6 +49,6 @@ Compatibility notes
-------------------

The summarization software is tested on CentOS 7 with the package versions of PCP that
are supplied with CentOS 7. In general any version of PCP will work as long as the
are supplied with CentOS 7 (PCP version 4.3.2). In general any version of PCP will work as long as the
summarization software is built against the same or newer version of PCP as the version
installed on the compute nodes.
