RFC: Maintain incremental analyze result #83

Open
43 changes: 43 additions & 0 deletions text/0083-incremental-analyze.md
@@ -0,0 +1,43 @@
# RFC: Incremental analyze table


## Summary

This proposal introduces a method to store part of the result of an analyze request from TiDB in TiKV storage, so that we do not have to recompute statistics for the whole table the next time we run `analyze table`.

## Motivation

The initial motivation was to reduce the impact of `analyze table` on the cluster. I then found that most of the calculation is unnecessary:

- In a production environment, most regions are not updated for a long time.
- An analyze request costs a lot of CPU, but its result is very small (in most cases, no more than 0.1% of the total data).
- We can easily tell whether the data of a given region has changed.

So we can cache the analyze result in TiKV.

## Detailed design

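```protobuf
message StatisticCache {
    // Cached analyze result for the region.
    bytes data = 1;
    // Applied index of the region when `data` was computed.
    uint64 applied_index = 2;
}

message RaftSnapshotData {
    metapb.Region region = 1;
    uint64 file_size = 2;
    repeated KeyValue data = 3;
    uint64 version = 4;
    SnapshotMeta meta = 5;
    // New field: carries the cache along with the snapshot.
    StatisticCache statistic_cache = 6;
}
```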

* When a TiKV node receives an analyze request, it checks the cache and the applied index of the region. If the cache exists and its `applied_index` equals the current `applied_index` of the region, then no user data has been written to the region and the region has not split since the cache was built, so the cached result is still valid.

> If the cache data exists and applied_index of the cache equals to the current applied_index of this region, it means that the region has never write any user data or split.

So if any member changes, will this cache be invalidated?

Author

Member changes? Do you mean that TiKV adds a peer for this region?

yes

s/member changes/member changed

Author

A conf change would not invalidate the cache.

Author

I will try to find a trick that keeps conf changes (including member changes and leader transfers) from invalidating the cache.

* This data will be stored through the Raft protocol, so it will also be sent to a new peer when the Raft membership configuration changes.
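
The cache check described above can be sketched roughly as follows. This is only an illustration, not TiKV code: `RegionState`, its fields, and the `analyze` method with its `compute` callback are hypothetical names, and the struct simply mirrors the `StatisticCache` message. The sketch treats any change of the applied index as a cache miss, so it does not cover the conf-change handling discussed in the review thread.

```rust
/// Hypothetical in-memory mirror of the `StatisticCache` message.
#[derive(Clone, Debug, PartialEq)]
pub struct StatisticCache {
    /// Serialized analyze result for the region.
    pub data: Vec<u8>,
    /// Applied index of the region when `data` was computed.
    pub applied_index: u64,
}

/// Hypothetical per-region state; only the fields needed for the check.
#[derive(Debug, Default)]
pub struct RegionState {
    pub applied_index: u64,
    pub statistic_cache: Option<StatisticCache>,
}

impl RegionState {
    /// Returns the analyze result for this region.
    ///
    /// `compute` stands in for the expensive full scan (an assumption of
    /// this sketch); it is called only when the cache is missing or stale.
    pub fn analyze<F>(&mut self, compute: F) -> Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        // Cache hit: the region's applied index has not moved since the
        // cached result was computed.
        if let Some(cache) = &self.statistic_cache {
            if cache.applied_index == self.applied_index {
                return cache.data.clone();
            }
        }

        // Cache miss: recompute and remember the index the result is valid for.
        let data = compute();
        self.statistic_cache = Some(StatisticCache {
            data: data.clone(),
            applied_index: self.applied_index,
        });
        data
    }
}
```

With this shape, a second `analyze` call on an unchanged region returns the stored bytes without invoking `compute`, while any advance of `applied_index` forces a recomputation.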

## Future Work

The region size in TiKV is currently usually set to 100MB. If this size grows in the future, we may need to maintain analyze results for each sub-range within a region.