Dashboard for monitoring a node
There are 2 main files:
- telegraf-data/telegraf.toml is a configuration file for Telegraf. It uses the Tail plugin to:
  - read log files generated by sn_node
  - extract essential information
  - send it to an InfluxDB database
- Node Monitor.json is a JSON export of the dashboard from Grafana. It can be imported in both Grafana OSS and Grafana Cloud. Note that an InfluxDB data source named "InfluxDB Cloud" must be defined in Grafana before importing the JSON file.
A few environment variables must be defined before launching the Telegraf agent:
- node_name: Name of the node displayed in the dashboard. You can monitor several nodes and in this case each node must have its own telegraf agent, each with a different node name in the configuration file. At runtime the choice list in the dashboard selects the node you want to monitor.
- logs_path: Path of the log files to be watched by the telegraf agent. Two types of files are observed in this directory: sn_node.log and sn_node.log.*.
- bucket: Mandatory value is "SN" (hardcoded in the json file).
- url: URL of InfluxDB server.
- token: InfluxDB token. It must have write access to the 'SN' bucket.
- organization: The InfluxDB organization. It is freely defined in InfluxDB OSS but is the user email address for InfluxDB Cloud.
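As an illustration of the variables listed above, here is a minimal shell sketch that exports them and checks that none is empty before the agent is started. All values are placeholders and assumptions (the node label, the log directory, a local InfluxDB on its default port 8086, the organization name); only the "SN" bucket value comes from this README.

```shell
# Placeholder values: replace each one with your own settings.
export node_name="node-1"                    # hypothetical label shown in the dashboard
export logs_path="/path/to/node/logs"        # hypothetical directory containing sn_node.log*
export bucket="SN"                           # mandatory value (hardcoded in the dashboard JSON)
export url="http://localhost:8086"           # assumption: local InfluxDB on its default port
export token="REPLACE_WITH_INFLUXDB_TOKEN"   # token with write access to the SN bucket
export organization="my-org"                 # hypothetical; the user email address on InfluxDB Cloud

# Fail early if any required variable is empty.
missing=0
for name in node_name logs_path bucket url token organization; do
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "missing environment variable: $name" >&2
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all variables set"

# Then start the agent with the provided configuration, for example:
# telegraf --config telegraf-data/telegraf.toml
```

The check is optional, but a forgotten variable is easier to spot here than in the telegraf agent's logs.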
Remarks for my personal tests in a local docker network:
There are 3 stacks each with one service:
- docker-compose-influxdb.yml: InfluxDB database
- docker-compose-telegraf.yml: Telegraf agent
- docker-compose-grafana.yml: Grafana server
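A possible way to bring up the three stacks in one go is sketched below. The stack names "influxdb" and "grafana" are assumptions chosen to match the compose file names; only "telegraf" appears in the commands of this README. The script defaults to a dry run that prints the commands; set DRY_RUN=0 to execute them (docker stack deploy requires Docker in swarm mode).

```shell
# Dry run by default: print the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

deploy_stack() {
  # $1: stack name; the compose file name is derived from it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "docker stack deploy -c docker-compose-$1.yml $1"
  else
    docker stack deploy -c "docker-compose-$1.yml" "$1"
  fi
}

# Database first, so that the agent has somewhere to write.
for stack in influxdb telegraf grafana; do
  deploy_stack "$stack"
done
```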
Useful commands:
- To just relaunch the telegraf agent:
  docker stack rm telegraf && docker stack deploy -c docker-compose-telegraf.yml telegraf && sleep 2 && docker ps -a
  and then
  docker logs <container_id>
  to observe the telegraf agent's own logs
- To empty the SN bucket:
  ./empty_bucket.sh
- Useful date range for my static test cases in ../docker_tmp/logs/*: from 2022-03-20 13:36:00 to 2022-03-20 13:45:00