"Unknown error" when loading Query Analytics on PMM 2.43.2 due to dirty clickhouse DB migration #3288

Open
1 task done
jessebye opened this issue Nov 7, 2024 · 0 comments
Description

We recently updated PMM to v2.43.2, and QAN now fails to load with "Unknown error". The network logs show 502 Bad Gateway responses, and qan-api2.log contains the following error:

stdlog: qan-api2 v2.43.2.
time="2024-11-07T18:08:43.720+00:00" level=info msg="Log level: info."
time="2024-11-07T18:08:43.720+00:00" level=info msg="DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2" component=main
stdlog: dsn:  clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2
stdlog: Migrations: Dirty database version 17. Fix and force version.
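
For anyone triaging the same symptom, the dirty version can be pulled out of qan-api2.log mechanically. This is a minimal sketch, not part of PMM: the name `find_dirty_versions` is hypothetical, and the pattern assumes the message text stays exactly as it appears in the log above.

```python
import re

# Pattern assumed from the log line above; it is not taken from PMM source code.
DIRTY_RE = re.compile(
    r"Migrations: Dirty database version (\d+)\. Fix and force version\."
)


def find_dirty_versions(log_text: str) -> list[int]:
    """Return every dirty migration version reported in the log, in order."""
    return [int(m.group(1)) for m in DIRTY_RE.finditer(log_text)]


# Sample copied from the log excerpt in this report.
sample = """stdlog: qan-api2 v2.43.2.
stdlog: dsn:  clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2
stdlog: Migrations: Dirty database version 17. Fix and force version.
"""

print(find_dirty_versions(sample))
```

An empty list means the log contains no dirty-version message in this exact form.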

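Looking further back in the log, it appears the migration first ran right after the upgrade, while ClickHouse was not yet healthy: qan-api2 got connection refused errors and one bad connection error while writing to schema_migrations. Once ClickHouse became healthy again, the migration began failing with the dirty version error shown above.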

time="2024-11-04T17:32:00.538+00:00" level=info msg="Saved 2480 buckets in 533.156749ms." component=data_ingestion
stdlog: Drop 20241004 partitions of metrics. Result: {0xc000138000 0x1408a40}, Error: <nil>
stdlog: Got SIGTERM, shutting down...
time="2024-11-04T17:32:54.288+00:00" level=info msg="Server stopped." component=JSON
time="2024-11-04T17:32:54.288+00:00" level=info msg="Server stopped." component=debug
time="2024-11-04T17:32:54.288+00:00" level=warning msg="Closing requests channel." component=data_ingestion
time="2024-11-04T17:32:54.288+00:00" level=warning msg="Requests channel closed, nothing to store." component=data_ingestion
time="2024-11-04T17:32:54.288+00:00" level=warning msg="Requests channel closed, nothing to store." component=data_ingestion
time="2024-11-04T17:32:54.288+00:00" level=info msg=Done. component=main
stdlog: qan-api2 v2.43.2.
time="2024-11-04T17:33:53.109+00:00" level=info msg="Log level: info."
time="2024-11-04T17:33:53.109+00:00" level=info msg="DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2" component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.43.2.
time="2024-11-04T17:33:54.188+00:00" level=info msg="Log level: info."
time="2024-11-04T17:33:54.188+00:00" level=info msg="DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2" component=main
stdlog: dsn:  clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2
stdlog: Migrations: driver: bad connection in line 0: INSERT INTO schema_migrations (version, dirty, sequence) VALUES (?, ?, ?)
stdlog: qan-api2 v2.43.2.
time="2024-11-04T17:33:57.730+00:00" level=info msg="Log level: info."
time="2024-11-04T17:33:57.730+00:00" level=info msg="DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2" component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.43.2.
time="2024-11-04T17:33:58.738+00:00" level=info msg="Log level: info."
time="2024-11-04T17:33:58.738+00:00" level=info msg="DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2" component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.43.2.
time="2024-11-04T17:34:00.746+00:00" level=info msg="Log level: info."
time="2024-11-04T17:34:00.746+00:00" level=info msg="DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2" component=main
stdlog: Connection: dial tcp 127.0.0.1:9000: connect: connection refused
stdlog: qan-api2 v2.43.2.
time="2024-11-04T17:34:03.911+00:00" level=info msg="Log level: info."
time="2024-11-04T17:34:03.911+00:00" level=info msg="DSN: clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2" component=main
stdlog: dsn:  clickhouse://127.0.0.1:9000?database=pmm&block_size=10000&pool_size=2
stdlog: Migrations: Dirty database version 17. Fix and force version.
stdlog: qan-api2 v2.43.2.
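
One plausible reading of this sequence is sketched below as a toy model, not as qan-api2's actual code. The assumption is that the migration runner records a version as dirty before applying it and clears the flag only afterwards, so a connection drop in between leaves the flag set and later starts refuse to continue. All names (`ToyMigrator`, `DirtyError`, `fail_during`) and the starting version 16 are hypothetical.

```python
class DirtyError(Exception):
    """Raised when a previous migration left the dirty flag set."""


class ToyMigrator:
    """In-memory stand-in for a schema_migrations table (illustration only)."""

    def __init__(self, version: int = 0):
        self.version = version
        self.dirty = False

    def migrate_to(self, target: int, fail_during: bool = False) -> None:
        if self.dirty:
            # Wording copied from the log message in this report.
            raise DirtyError(
                f"Dirty database version {self.version}. Fix and force version."
            )
        # Step 1: record the target version as dirty before doing any work.
        self.version, self.dirty = target, True
        # Step 2: apply the migration; a lost connection aborts here.
        if fail_during:
            raise ConnectionError("driver: bad connection")
        # Step 3: clear the flag only after the migration succeeded.
        self.dirty = False

    def force(self, version: int) -> None:
        """Manually set the version and clear the dirty flag."""
        self.version, self.dirty = version, False


db = ToyMigrator(version=16)  # hypothetical pre-upgrade version
try:
    db.migrate_to(17, fail_during=True)  # database went away mid-migration
except ConnectionError:
    pass

try:
    db.migrate_to(17)  # database is healthy again, but the flag is still set
except DirtyError as err:
    print(err)
```

In the model, calling `force` clears the flag so the next run can proceed. Whether that is safe on a real PMM instance depends on how far migration 17 actually got, which these logs do not show.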

Expected Results

After updating the Helm chart version, we expected a smooth update and QAN to continue to work correctly.

Actual Results

QAN stopped working after we updated the Helm chart to the latest version.

Version

v2.43.2

Steps to reproduce

  1. Install the PMM Helm chart v1.3.6 and let it collect metrics for a while
  2. Update to chart v1.3.20
  3. Navigate to QAN and observe the error

Relevant logs

See the snippets above (additional logs can be provided on request).

Code of Conduct

  • I agree to follow Percona Community Code of Conduct
@jessebye jessebye added the bug Bug report label Nov 7, 2024