Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why there's a concurrent access problem #87

Open
fatcat22 opened this issue Oct 22, 2024 · 2 comments
Open

Why there's a concurrent access problem #87

fatcat22 opened this issue Oct 22, 2024 · 2 comments

Comments

@fatcat22
Copy link

fatcat22 commented Oct 22, 2024

I'm investigating a blocking problem and gdb shows that the problem occurs in current_shard_blk_ids_.count of this commit, but what I'm not understand is why there's a concurrent access problem for current_shard_blk_ids_.count.

In my opinion, only one thread could execute into the code where the commit fixed, and when count is called, no more current_shard_blk_ids_.insert will be called. So why there's still a concurrent access problem?

Looking forward for reply. Thanks.

@dungeon-master-666
Copy link
Collaborator

dungeon-master-666 commented Oct 23, 2024

You're right about this:

only one thread could execute into the code where the commit fixed, and when count is called, no more current_shard_blk_ids_.insert will be called

The problem lies in a bit different scenario. If get_block_by_seqno fails for some block, td::MultiPromise calls the finishing lambda immediately without waiting for other tasks to finish. This cause IndexQuery to stop and destruct deallocating its variables, while some of shard blocks might be still waiting for read. And when they finish they access current_shard_blk_ids_ causing the crash.
The commit title is not very accurate, it should be "Fix crash caused by accessing deallocated variable".

@fatcat22
Copy link
Author

fatcat22 commented Oct 23, 2024

Thanks for your reply. Totally understand now. Both IndexQuery::finish and IndexQuery::error will call Actor::stop, which will destroy the IndexQuery object( unique_ptr.reset).

But here's another question: Is it rational to reset the unique_ptr directlly? In my opinion, Actor::stop should just decrease the reference count instead of deallocate the object. In our case, I believe the reference count of IndexQuery::actor_info_ptr_ won't be 0, but IndexQuery still be deallocate violently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants