Prepare 3.0.0 release, including documentation updates
paulgear authored Dec 27, 2023
2 parents 8bbd60c + c8e0902 commit d91e94e
Showing 5 changed files with 77 additions and 16 deletions.
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,30 @@ Notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.0.0] - 2023-12-27

### Changed

- Added individual peer stats (derived from chronyd's measurements.log or ntpd's
peerstats) to metrics output. These are automatically detected in the default
locations of /var/log/chrony/measurements.log and /var/log/ntpstats/peerstats
and emitted automatically to telegraf or the prometheus exporter.
- Added --logfile command line option to allow changing the above defaults.
- Reformatted all code with 'black' (see Makefile 'format' target). Please use
this Makefile target before submitting any pull requests.
- Updated license to AGPLv3.
- Removed trace-related metrics.
- Removed juju layer.
- Clean up internal structure:
  - outputs.py now provides a class structure for encapsulating knowledge about
    collectd, prometheus, and telegraf
  - alert.py and ntpmon.py no longer have special casing for the mode
  - all argument processing and defaults selection is done in get_args()
- Some collectd data types have been updated to reflect the defaults available
in version 5.12. I'm considering deprecating collectd support in a near
future version. If you are using ntpmon with collectd and would prefer this
support not to go away, please let me know.

## [2.1.0] - 2023-12-22

### Changed
4 changes: 2 additions & 2 deletions Makefile
@@ -10,7 +10,7 @@ PREFIX=/usr/local
SHAREDIR=share/$(NAME)
SYSTEMD_SERVICE_DIR=/lib/systemd/system
USER=$(NAME)
VERSION=2.1.0
VERSION=3.0.0

TESTS=\
unit_tests/test_peer_stats.py \
@@ -24,7 +24,7 @@ pytest:
PYTHONPATH=$(PWD)/src python3 -m pytest $(TESTS)

datatest:
PYTHONPATH=./src ./testdata/testdata.sh
PYTHONPATH=$(PWD)/src ./testdata/testdata.sh

format:
black --line-length=128 --target-version=py39 src/ unit_tests/
57 changes: 47 additions & 10 deletions README.md
@@ -8,6 +8,11 @@ including support for Nagios performance data. NTPmon can also run as a daemon
for sending metrics to collectd, prometheus, or telegraf. It supports both
`ntpd` and `chronyd`.

NTPmon is designed to encourage the use of robust NTP configurations. The
defaults for what is considered healthy and non-healthy are roughly based on
[RFC8633: NTP Best Current
Practices](https://datatracker.ietf.org/doc/html/rfc8633).

## Copyright

Copyright (c) 2015-2023 Paul D. Gear <https://libertysys.com.au/>
@@ -71,6 +76,8 @@ To run NTPmon directly from source after manually installing the prerequisites:

NTPmon alerts on the following metrics of the local NTP server:

### Summary metrics

#### sync

Does NTP have a sync peer? If not, return CRITICAL, otherwise return OK.
@@ -79,8 +86,9 @@ Does NTP have a sync peer? If not, return CRITICAL, otherwise return OK.

Are there more than the minimum number of peers active? The NTP algorithms
require a minimum of 3 peers for accurate clock management; to allow for failure
or maintenance of one peer at all times, NTPmon returns OK for 4 or more
configured peers, CRITICAL for 1 or 0, and WARNING for 2-3.
or maintenance of one peer at all times, NTPmon returns OK for [4 or more
configured peers](https://datatracker.ietf.org/doc/html/rfc8633#section-3.2),
CRITICAL for 1 or 0, and WARNING for 2-3.
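
The peer-count thresholds described above can be sketched as a small function. This is only an illustration of the stated rule, not NTPmon's actual code; the function name `classify_peer_count` is hypothetical.

```python
def classify_peer_count(peers: int) -> str:
    """Map a count of configured peers to a Nagios-style status name.

    Hypothetical helper: it mirrors the thresholds stated in the README
    (OK for 4 or more, WARNING for 2-3, CRITICAL for 1 or 0).
    """
    if peers >= 4:
        return "OK"
    if peers >= 2:
        return "WARNING"
    return "CRITICAL"
```

With four peers, losing one to failure or maintenance still leaves the three that the NTP algorithms need, which is why the OK threshold sits one above the algorithmic minimum.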

#### reach

@@ -98,7 +106,7 @@ less.
### System metrics

In addition, NTPmon retrieves the following metrics directly from the local NTP
server (using `ntpq -nc readvar`):
server (using `ntpq -nc readvar` or `chronyc -c tracking`):

- offset (as `sysoffset`, to distinguish it from `offset`)
- sys_jitter (as `sysjitter`, for grouping with `sysoffset`)
@@ -110,7 +118,34 @@ server (using `ntpq -nc readvar`):
See the [NTP documentation](http://doc.ntp.org/current-stable/ntpq.html#system)
for the meaning of these metrics.

### Prometheus exporter
### Peer metrics

Counts of each peer type are emitted under the `ntpmon_peers` metric. The
recognised peer types are `pps`, `sync`, `invalid`, `false`, `excess`, `backup`,
`outlier`, `survivor`, and `unknown`. (Under normal circumstances, `unknown`
will never appear - its presence indicates a bug in NTPmon.) Note that `sync`
also includes the `pps` peer (if any), and `survivor` also includes the `sync`
peer (if any), so they are not strictly mutually exclusive. There should be no
overlap in any of the other types.

If your `chronyd` or `ntpd` is configured to store peer (source) statistics,
these will be collected as they appear in the relevant log files
(`/var/log/chrony/measurements.log` and `/var/log/ntpstats/peerstats`,
respectively, by default) and emitted under the `ntpmon_peer` (singular) metric,
in addition to all the above-mentioned metrics. Use the `--logfile` command
line option to monitor a different file if your distribution uses different
locations. NTPmon will silently ignore any issues relating to these files in
order to continue running, so if you don't notice metrics coming out when you
expect them, check permissions on the files and compare their contents to the
documented formats. Please submit a bug report if you encounter persistent
issues with this.
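
As a rough illustration of what "silently ignoring" log file problems can look like, the sketch below reads any complete lines appended to a statistics file since the last read and treats every I/O error as "nothing new". It is an assumption-laden sketch, not NTPmon's implementation: the function name `read_new_lines` and the byte-offset bookkeeping are hypothetical, and log rotation is not handled.

```python
def read_new_lines(path: str, offset: int = 0) -> tuple[list[str], int]:
    """Return (complete new lines, new offset) for the file at `path`.

    Hypothetical helper. A missing or unreadable file yields no lines and
    leaves the offset unchanged, so the caller can keep running and retry.
    """
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
    except OSError:
        # Permission problems, missing files, etc. are swallowed on purpose.
        return [], offset
    # Only consume up to the last newline; a partially written line is
    # left in place to be picked up on the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Because failures are indistinguishable from "no new data" in a scheme like this, missing metrics are best diagnosed by checking the file's permissions and contents directly, as suggested above.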

`Collectd` doesn't have a really great way to support these individual peer
metrics, so each peer is considered to be a `collectd` "host". This feature
should be considered experimental for `collectd`, and subject to change (input
on this is welcome).

## Prometheus exporter

When run in prometheus mode, NTPmon uses the [prometheus python
client](https://pypi.python.org/pypi/prometheus_client) to expose metrics via
@@ -119,23 +154,25 @@ been performed on this library by the NTPmon author; users are suggested not to
expose it on untrusted networks, and are reminded that - as stated in the
license terms - this software comes with no warranty.

### Telegraf integration
## Telegraf integration

When run in telegraf mode, NTPmon requires the telegraf [socket
listener](https://docs.influxdata.com/telegraf/v1/plugins/#input-socket_listener)
input plugin to be enabled. Use the `--connect` command-line option if you
configure this to listen on a host and/or port other than the default
(127.0.0.1:8094).
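
To give a feel for what a socket listener receives, here is a minimal sketch that formats one measurement in InfluxDB line protocol and sends it to the default address. Several details are assumptions rather than facts about NTPmon: the helper names `format_line` and `send_line`, the `peertype` tag and `count` field, and the use of TCP (the listener can be configured for other socket types). Escaping of special characters is omitted for brevity.

```python
import socket
import time


def format_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line-protocol record with float fields (hypothetical helper)."""
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    # Timestamp in nanoseconds, taken where the measurement is produced.
    ts = time.time_ns() if timestamp_ns is None else timestamp_ns
    return f"{measurement}{tag_str} {field_str} {ts}\n"


def send_line(line, host="127.0.0.1", port=8094):
    """Send one record over TCP (assumes a TCP socket listener is running)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("utf-8"))
```

For example, `send_line(format_line("ntpmon_peers", {"peertype": "sync"}, {"count": 1}))` would emit a single record, provided something is listening on that address.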

Telegraf is the preferred output integration for NTPmon (over collectd and
prometheus) because it offers higher-resolution timestamps and records each
timestamp at the source that generated the measurement rather than at the
scraping host. The other integrations (first collectd, then Nagios, then
prometheus) may eventually go away if they are not widely used. Please let me
know if you have strong feelings about this.

## Startup delay

By default, until the NTP server has been running for 512 seconds (the minimum
time for 8 polls at 64-second intervals), `check_ntpmon` will return OK (zero
return code). This is to prevent false positives on startup or for short-lived
VMs. To ignore this safety precaution, use `--run-time` with a low number
(e.g. 1 sec).
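
The startup delay can be pictured as a gate in front of the real check result. The sketch below is illustrative only; the names `DEFAULT_RUN_TIME` and `effective_status` are hypothetical, and the numeric codes follow the usual Nagios convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
MIN_POLLS = 8        # polls needed before results are considered meaningful
POLL_INTERVAL = 64   # seconds per poll
DEFAULT_RUN_TIME = MIN_POLLS * POLL_INTERVAL  # 8 * 64 = 512 seconds


def effective_status(status: int, uptime_seconds: float,
                     run_time: float = DEFAULT_RUN_TIME) -> int:
    """Return OK (0) until the server has run for `run_time` seconds.

    Hypothetical helper: after the grace period the real status is
    passed through unchanged.
    """
    if uptime_seconds < run_time:
        return 0
    return status
```

Passing a small `run_time` (the equivalent of `--run-time 1`) effectively disables the gate, so a CRITICAL result is reported almost immediately after startup.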

## To do

- Better/more documentation.
- Better/more unit tests.
4 changes: 2 additions & 2 deletions debian/check_ntpmon-man.rst
@@ -1,5 +1,5 @@
:Version: 2.1.0
:Date: 2023-09-28
:Version: 3.0.0
:Date: 2023-12-27
:Copyright: 2015-2023 Paul Gear
:Title: check_ntpmon
:Subtitle: NTPmon Nagios check
4 changes: 2 additions & 2 deletions debian/ntpmon-man.rst
@@ -1,5 +1,5 @@
:Version: 2.1.0
:Date: 2023-09-28
:Version: 3.0.0
:Date: 2023-12-27
:Copyright: 2015-2023 Paul Gear
:Title: ntpmon
:Subtitle: NTP metrics monitor
