Attempting to delete tablets with massive channel history leads to oom #13695

Open
vporyadke opened this issue Jan 22, 2025 · 0 comments

vporyadke commented Jan 22, 2025

An attempt to delete a table on a development cluster brought the cluster down: Hive triggers the OOM killer on whichever node it runs on.

The memory is in BS_QUEUE:

Image

The source of the problem appears to be this:
https://github.com/ydb-platform/ydb/blob/main/ydb/core/tablet/tablet_req_delete.cpp#L44

    for (const TTabletChannelInfo::THistoryEntry& historyInfo : channelInfo.History) {

The TabletReqDelete actor creates a request for every entry in tablet history, and the tablets have >1000 entries each.

  1. Limiting the number of in-flight requests inside TabletReqDelete is probably not needed, since it already sends one message per group+channel pair. We should limit the number of in-flight tablet deletions instead.
  2. This might not be enough; we might also need other changes in Hive, e.g. could this go over the commit size limit?
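
To illustrate point 1, here is a minimal standalone sketch of capping in-flight tablet deletions. It is not Hive code: `TDeletionLimiter`, its methods, and the `start` callback are hypothetical names invented for this example, and the real fix would have to live inside Hive's own actor and transaction machinery. The idea is only that deletions beyond the cap wait in a queue, and a new one starts each time a running one finishes.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper (not part of YDB): caps how many tablet deletions
// are running at once and queues the rest.
class TDeletionLimiter {
public:
    using TStartFn = std::function<void(uint64_t tabletId)>;

    TDeletionLimiter(size_t maxInFlight, TStartFn start)
        : MaxInFlight(maxInFlight)
        , Start(std::move(start))
    {}

    // Request a deletion: starts it right away if a slot is free,
    // otherwise leaves it in the pending queue.
    void Enqueue(uint64_t tabletId) {
        Pending.push_back(tabletId);
        Pump();
    }

    // Must be called once per finished deletion (success or failure)
    // to release its slot and start the next pending one.
    void OnFinished() {
        if (InFlight > 0) {
            --InFlight;
        }
        Pump();
    }

    size_t GetInFlight() const { return InFlight; }
    size_t GetPending() const { return Pending.size(); }

private:
    // Start pending deletions while there are free slots.
    void Pump() {
        while (InFlight < MaxInFlight && !Pending.empty()) {
            const uint64_t tabletId = Pending.front();
            Pending.pop_front();
            ++InFlight;  // counted before Start() so re-entrant calls stay consistent
            Start(tabletId);
        }
    }

    size_t MaxInFlight;
    TStartFn Start;
    std::deque<uint64_t> Pending;
    size_t InFlight = 0;
};
```

With a cap of, say, 2, enqueueing five tablets starts only the first two; the other three start one by one as `OnFinished()` is called. The pending queue holds only tablet IDs, so its memory cost is small compared to the per-history-entry requests.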
@vporyadke vporyadke self-assigned this Jan 22, 2025