Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guard notifying outdated scheduler #1559

Merged
merged 25 commits into from
Nov 9, 2023
Merged

Guard notifying outdated scheduler #1559

merged 25 commits into from
Nov 9, 2023

Conversation

rorbech
Copy link
Contributor

@rorbech rorbech commented Nov 2, 2023

TODOs:

Closes #1543
Closes #1558
Closes #1561

@rorbech rorbech changed the title Fix crashes when notifying outdated scheduler Guard notifying outdated scheduler Nov 2, 2023
Copy link
Contributor

@clementetb clementetb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

scope.launch {
try {
printlntid("on dispatcher")
realm_wrapper.realm_scheduler_perform_work(scheduler)
if (!cancelled.value) {
Copy link
Contributor

@clementetb clementetb Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't core wait until the jobs have been executed to release the scheduler?

Also, this guard does not cover 100% of the cases, the pointer might have been released just after accessing cancelled.

Because closing and job execution happens on the same thread there is no way that the Realm gets closed while executing the Job.

@clementetb
Copy link
Contributor

clementetb commented Nov 2, 2023

Instead of guarding the SingleThreadDispatcherScheduler, we could cancel the coroutine scope just after closing the live Realm here.

Closing the coroutine there would cancel any pending core jobs that were posted.

@rorbech
Copy link
Contributor Author

rorbech commented Nov 3, 2023

Instead of guarding the SingleThreadDispatcherScheduler, we could cancel the coroutine scope just after closing the live Realm here.

Closing the coroutine there would cancel any pending core jobs that were posted.

If we have a user supplied scheduler then we cannot close it. There could also potentially be multiple scheduler in the same scope so feels better to use the exact signal from the user-data-free-callback to guard this.

@clementetb
Copy link
Contributor

Yes, we shouldn't be closing the main CoroutineContext after releasing the dispatcher.

The if-guard works, but depends on AtomicBoolean and requires to implement it on each platform. Coroutines have some structures to handle job cancelation gracefully.

For example instead of working with the main CoroutineContext we could have a specific CoroutineScope associated to the scheduler to track all the jobs that have been posted via the scheduler. Once the scheduler is deleted, we can cancel the scope, and thus all pending jobs.

@rorbech
Copy link
Contributor Author

rorbech commented Nov 3, 2023

But coroutine scopes are not preemptive so you are not guaranteed that any running coroutine will be immediately aborted. But you are right the current guard is actually not good enough ... though it looks like some of the schedulers in core uses a similar flag so might be that there are some other guarantees in effect here 🤔

Copy link
Collaborator

@nhachicha nhachicha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix 👍 it looks like io.realm.kotlin.test.darwin.CoroutineTests.dispatchBetweenThreads is failing for macOS on CI

@rorbech rorbech changed the base branch from main to releases November 3, 2023 11:10
@rorbech
Copy link
Contributor Author

rorbech commented Nov 3, 2023

Nice fix 👍 it looks like io.realm.kotlin.test.darwin.CoroutineTests.dispatchBetweenThreads is failing for macOS on CI

The failing test is caused by trying to clean up the liveRealmContext in f3e4ce3#diff-291d3886fa0a21b98dc0ebd5ff0ce3846d78dab19a8d3ccffe6386899f0879a4. I have verified that it also happens on main if I try it out there so it has nothing to do with this PR, hence I therefore removed it from this PR to avoid stalling it.

Copy link
Contributor

@cmelchior cmelchior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome 💯

@@ -753,9 +753,10 @@ class SyncedRealmTests {
.mutableRealmIntField
.increment(1)
}
realm.syncSession.uploadAllLocalChanges(10.seconds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch 🙈, but any reason you are adding a timeout? The first upload doesn't have one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it because it would highlight what action is actually not executing as expected instead of just timeout out on the recipient side. I just added if for the uploads that I inserted, but just didn't walk over the rest of the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok 👍

@rorbech rorbech merged commit 0dcdbfb into releases Nov 9, 2023
2 checks passed
@rorbech rorbech deleted the cr/fix-scheduler-crash branch November 9, 2023 09:20
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash when notifying on released scheduler
4 participants