When scaling down a cluster, we're noticing that some data becomes unavailable for query. It appears in the form of partial results when querying for older data (from before the scale down).
We're also noticing that database_tick_index_num_docs remains flat on each surviving node through the scale-down, and then jumps up once the node is restarted. The effect of summing across all nodes in the cluster is that the metric drops (when the old node is removed) and only recovers to its prior level once the remaining nodes restart.
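For reference, this is roughly how we're watching the metric: a minimal sketch using the Prometheus Go client, assuming the per-node metrics are scraped into a Prometheus-compatible endpoint (the address here is a placeholder).

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Assumed Prometheus address; replace with your own.
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatalf("error creating Prometheus client: %v", err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Per-node view: stays flat on the surviving nodes through the scale-down.
	perNode, _, err := promAPI.Query(ctx, "database_tick_index_num_docs", time.Now())
	if err != nil {
		log.Fatalf("per-node query failed: %v", err)
	}
	fmt.Println("per node:", perNode)

	// Cluster view: drops when the old node is removed and only recovers
	// after the remaining nodes are restarted.
	total, _, err := promAPI.Query(ctx, "sum(database_tick_index_num_docs)", time.Now())
	if err != nil {
		log.Fatalf("summed query failed: %v", err)
	}
	fmt.Println("cluster total:", total)
}
```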
General Issues
What service is experiencing the issue? (M3Coordinator, M3DB, M3Aggregator, etc)
m3db
What is the configuration of the service? Please include any YAML files, as well as namespace / placement configuration (with any sensitive information anonymized if necessary).
can provide if required, but I think this might be general
RF=3
How are you using the service? For example, are you performing read/writes to the service via Prometheus, or are you using a custom script?
issue relates to reads happening via remote read
Is there a reliable way to reproduce the behavior? If so, please provide detailed instructions.
It appears to be consistent when removing nodes from placements (a rough sketch of the removal call is below). It's more obvious when the clusters are small, since a larger fraction of the index is affected.
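For concreteness, the scale-down step is just removing a node from the placement via the coordinator API. A rough sketch of what we're doing, assuming the coordinator's placement endpoint on port 7201 (the host and instance ID are placeholders, and the exact path may differ by version):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Assumed coordinator address and instance ID; both are placeholders.
	coordinator := "http://localhost:7201"
	instanceID := "m3db-node-2"

	// Remove the node from the m3db placement; its shards are then
	// redistributed across the remaining nodes.
	url := fmt.Sprintf("%s/api/v1/services/m3db/placement/%s", coordinator, instanceID)
	req, err := http.NewRequest(http.MethodDelete, url, nil)
	if err != nil {
		log.Fatalf("building request: %v", err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("placement delete failed: %v", err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("status: %s\nbody: %s\n", resp.Status, body)
}
```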
Hey @BertHartm - there have been some fixes and tests added to cover this category of bugs. As far as we know there are no outstanding bugs in this space, so perhaps I can take our tests and run them against the version you're running.
Is this 1.3 or 1.5? The exact SHA would be helpful as we investigate this. Thanks for reporting!