
Metrics endpoint starts timing out at intermittent intervals #1035

Open

ayush-rathore-quartic opened this issue May 23, 2024 · 1 comment
ayush-rathore-quartic commented May 23, 2024

What did you do?

Running postgres_exporter as a container in a Kubernetes pod that also hosts the PostgreSQL server at localhost:5432.

What did you expect to see?

The /metrics endpoint should have returned the Prometheus metrics at all times.

What did you see instead? Under which circumstances?

  1. The /metrics endpoint worked well for about an hour, but after some time the metrics server starts timing out, i.e. there is no response at :9187/metrics (postgres_exporter is running on port 9187 of the pod). The postgres_exporter logs contain nothing about the failure to serve these requests.
  2. The issue often goes away when the postgres server is restarted, but only for some time.
  3. I can connect to the postgres server through psql at the same time.
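To correlate the first timeout with the exporter and postgres logs, a single scrape attempt can be checked in a loop. A minimal stdlib-only sketch (the URL is a placeholder to fill in with the actual pod IP; this is my own diagnostic helper, not part of postgres_exporter):

```python
import socket
import time
import urllib.error
import urllib.request

def scrape_once(url, timeout=5.0):
    """Try one scrape of the metrics endpoint.

    Returns (ok, elapsed_seconds); ok is False on timeout,
    connection failure, or a non-200 response.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            ok = resp.status == 200
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        ok = False
    return ok, time.monotonic() - start
```

Running this every few seconds with a timestamp makes it easy to pin down exactly when the endpoint stops responding.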

More Information
The requests at :9187/ and :9187/probe are being served fine. When I try probing my PostgreSQL server with the following commands:

curl "<POD IP>:9187/probe?target=127.0.0.1:5432&sslmode=disable"
curl "<POD IP>:9187/probe?target=:5432&sslmode=disable"
curl "<POD IP>:9187/probe?target=/var/run/postgresql:5432&sslmode=disable"

Output

# HELP pg_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from PostgreSQL.
# TYPE pg_exporter_last_scrape_duration_seconds gauge
pg_exporter_last_scrape_duration_seconds{cluster_name="mydb",namespace="default"} 1.002118094
# HELP pg_exporter_last_scrape_error Whether the last scrape of metrics from PostgreSQL resulted in an error (1 for error, 0 for success).
# TYPE pg_exporter_last_scrape_error gauge
pg_exporter_last_scrape_error{cluster_name="mydb",namespace="default"} 1

.... 
....
....
pg_up{cluster_name="mydb",namespace="default"} 0

Logs emitted by postgres_exporter every time the above /probe requests are fired to check reachability of the postgres server:

ts=2024-05-23T20:30:23.947Z caller=probe.go:41 level=info msg="no auth_module specified, using default"
ts=2024-05-23T20:30:23.947Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=localhost:5432
ts=2024-05-23T20:30:23.949Z caller=collector.go:194 level=error target=:5432 msg="collector failed" name=bgwriter duration_seconds=0.001488188 err="pq: SSL is not enabled on the server"
ts=2024-05-23T20:30:23.950Z caller=collector.go:194 level=error target=:5432 msg="collector failed" name=replication_slot duration_seconds=0.002488279 err="pq: SSL is not enabled on the server"
ts=2024-05-23T20:30:23.950Z caller=collector.go:194 level=error target=:5432 msg="collector failed" name=database duration_seconds=0.003197173 err="pq: SSL is not enabled on the server"
ts=2024-05-23T20:30:24.949Z caller=postgres_exporter.go:716 level=error err="Error opening connection to database (postgresql://:5432): pq: SSL is not enabled on the server"
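One note on building these probe requests: a `target` containing `/` or a leading `:` goes into the DSN verbatim, so URL-encoding the query string is safer when scripting them. A small helper sketch (the host/port and the assumption that `/probe` accepts extra DSN parameters like `sslmode` are taken from the curl commands above):

```python
from urllib.parse import urlencode

def probe_url(exporter, target, **dsn_params):
    """Build a /probe request URL with a properly URL-encoded query."""
    query = urlencode({"target": target, **dsn_params})
    return f"http://{exporter}/probe?{query}"

# Probe via the Unix-socket directory rather than TCP localhost:
print(probe_url("127.0.0.1:9187", "/var/run/postgresql:5432",
                sslmode="disable"))
```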

Environment

Linux/Kubernetes

  • System information:

    Linux 5.15.0-1054-azure x86_64

  • postgres_exporter version:

	postgres_exporter, version 0.12.1 (branch: HEAD, revision: 1c063b1b1913db029d449818e9cd1750c2282198)
        build user:       root@a5fc99238ef0
        build date:       20230613-16:18:22
        go version:       go1.20.5
        platform:         linux/amd64
        tags:             netgo static_build
  • postgres_exporter flags:
/usr/local/bin/postgres_exporter --log.level=info

Environment variables (from the pod spec):
    {
      "name": "DATA_SOURCE_NAME",
      "value": "postgresql://postgres@:5432/postgres?host=/var/run/postgresql&sslmode=disable"
    },
    {
      "name": "PG_EXPORTER_EXTEND_QUERY_PATH",
      "value": "/var/opt/postgres-exporter/queries.yaml"
    },
    {
      "name": "PG_EXPORTER_CONSTANT_LABELS",
      "value": "cluster_name=mydb, namespace=default"
    }
  • PostgreSQL version:
sh-4.4$ psql --version
psql (PostgreSQL) 16.2 (OnGres 16.2-build-6.31)
  • Logs:
+ . /templates/shell-utils
++ LOCK_DURATION=60
++ LOCK_SLEEP=5
++ QUEUE_NAME=create_event.pipe
+++ readlink /proc/412/exe
++ SHELL=/usr/bin/bash
++ command -v /usr/bin/bash
+++ basename /usr/bin/bash
++ '[' xbash = x ']'
+++ basename /usr/bin/bash
++ '[' xbash = xbusybox ']'
+++ echo ehxB
+++ grep -q x
+++ echo ' -x'
++ SHELL_XTRACE=' -x'
+ set +x
+ exec /usr/local/bin/postgres_exporter --log.level=info
ts=2024-05-23T08:42:45.986Z caller=main.go:86 level=error msg="Error loading config" err="Error opening config file \"postgres_exporter.yml\": open postgres_exporter.yml: no such file or directory"
ts=2024-05-23T08:42:46.046Z caller=proc.go:250 msg="Excluded databases" databases=[]
ts=2024-05-23T08:42:46.046Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9187
ts=2024-05-23T08:42:46.047Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9187
ts=2024-05-23T08:43:07.472Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=/var/run/postgresql:5432
ts=2024-05-23T08:43:07.478Z caller=postgres_exporter.go:647 level=info msg="Semantic version changed" server=/var/run/postgresql:5432 from=0.0.0 to=16.2.0
ts=2024-05-23T20:07:25.624Z caller=probe.go:41 level=info msg="no auth_module specified, using default"
ts=2024-05-23T20:07:25.624Z caller=server.go:74 level=info msg="Established new database connection" fingerprint=localhost:5432


<No logs are emitted even when the requests at /metrics are timing out>
@ayush-rathore-quartic

cc @sysadmind
