Fixes a bug that made concurrent access of a large nested IonStruct unsafe when only its parent had been made read-only. #722
@RepeatedTest(100)
public void readOnlyIonStructMultithreadedAccessSucceeds() {
    testReadOnlyIonStructMultithreadedAccess(false);
}
This test is repeated 100 times. Each of those 100 times, we repeat 100 times:
- call makeReadOnly on the struct
- create 4 concurrent tasks which
  - assert field a is not null when accessed, 100 times
Is this the simplest reproduction? Is @RepeatedTest necessary to provoke failure here? I understand if we don't want to go to great lengths to orchestrate the precise threading state that provokes failure, but I also wonder whether the layers of repetition could be flattened. Intuitively, I suspect they're not all necessary.
This version of the test is obviously less interesting than the version that clones in 1. above, which also makes me wonder whether we need two tests in such cases. Do we need a control (non-cloned) subject in CloneTest? If we do, why not handle both in the same test and do the assertions side by side?
This test was added for consistency's sake and, as expected, does not fail when the fix is not applied, so I'll look to readOnlyClonedIonStructMultithreadedNestedAccessSucceeds for attempted simplifications.
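For reference, the flattened shape being asked about (one round of 4 concurrent tasks, each reading 100 times) might look roughly like the sketch below. It is only an illustration: `ConcurrentReadSketch` and `allReadsNonNull` are hypothetical names, and an unmodifiable `Map` stands in for the read-only struct so the snippet runs without ion-java on the classpath. In the real test the supplier would presumably wrap the struct field lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical harness, not part of the PR: runs `tasks` concurrent readers,
// each calling `read` `readsPerTask` times, and reports whether every read was non-null.
class ConcurrentReadSketch {
    static boolean allReadsNonNull(Supplier<?> read, int tasks, int readsPerTask) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Callable<Boolean>> readers = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                readers.add(() -> {
                    for (int i = 0; i < readsPerTask; i++) {
                        if (read.get() == null) {
                            return false; // a reader observed a missing field
                        }
                    }
                    return true;
                });
            }
            boolean ok = true;
            // invokeAll blocks until every reader has finished.
            for (Future<Boolean> result : pool.invokeAll(readers)) {
                ok &= result.get();
            }
            return ok;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a read-only struct with a field named "a".
        Map<String, String> fields = new HashMap<>();
        fields.put("a", "1");
        Map<String, String> readOnly = Collections.unmodifiableMap(fields);
        System.out.println(allReadsNonNull(() -> readOnly.get("a"), 4, 100));
    }
}
```

Whether a single round like this fails reliably before the fix depends on the hardware, so some outer repetition may still be needed in practice.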
The case where clone occurs is more interesting for this particular bug, but not necessarily for all possible bugs. That's why I wanted to test them separately. It would be possible to introduce a bug that would affect only the non-cloned case; if that happens, I think it would be nice to have a test that fails separately.
I can get behind having a test that fails separately. In that case I'd rather see the behavior (clone(), making the parent immutable) injected in some way than selected by boolean flags.
I updated the tests to inject the behavior and to remove layers / quantity of repetition while still producing reliable pre-fix failures on my system.
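As a rough illustration of injecting the behavior instead of passing boolean flags, a shared test body can accept a function that prepares the subject. Everything in this sketch is hypothetical: `FakeStruct` is a minimal stand-in rather than the ion-java `IonStruct` API, and `prepareAndCheck` is not the helper used in this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class InjectedBehaviorSketch {
    // Hypothetical stand-in for a struct; not the ion-java IonStruct API.
    static final class FakeStruct {
        private final Map<String, Integer> fields = new HashMap<>();
        private boolean readOnly;

        FakeStruct put(String name, int value) {
            if (readOnly) {
                throw new IllegalStateException("read-only");
            }
            fields.put(name, value);
            return this;
        }

        Integer get(String name) {
            return fields.get(name);
        }

        // A copy starts out mutable, regardless of the original's state.
        FakeStruct copy() {
            FakeStruct c = new FakeStruct();
            c.fields.putAll(fields);
            return c;
        }

        FakeStruct makeReadOnly() {
            readOnly = true;
            return this;
        }

        boolean isReadOnly() {
            return readOnly;
        }
    }

    // Shared test body: the caller injects how the subject is prepared.
    static FakeStruct prepareAndCheck(UnaryOperator<FakeStruct> prepare) {
        FakeStruct original = new FakeStruct().put("a", 1);
        FakeStruct subject = prepare.apply(original);
        if (!subject.isReadOnly()) {
            throw new AssertionError("subject must be read-only");
        }
        if (subject.get("a") == null) {
            throw new AssertionError("field a must be present");
        }
        return subject;
    }

    public static void main(String[] args) {
        prepareAndCheck(FakeStruct::makeReadOnly);      // non-cloned case
        prepareAndCheck(s -> s.copy().makeReadOnly());  // cloned case
        System.out.println("both scenarios passed");
    }
}
```

Each scenario then reads as one line at the call site, and a new variant can be added without another flag.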
    testReadOnlyIonStructMultithreadedNestedAccess(false, true);
}

@RepeatedTest(100)
Replacing all instances of @RepeatedTest(100) with @Test on my laptop still causes reliable failure (25/25 test runs) of this test with the fix rolled back, and reduces runtime of this suite from >2s to <150ms.
Further testing shows a ~95% failure rate here, 6 passes in 100 trials with the first failure in trial 44.
FWIW, this is not what I observed. I've been seeing 20-40 passes out of 100 pre-fix on my hardware. Adding the repeats is what guaranteed failure in my case.
I will see if I can provoke consistent failure on my hardware while removing one of the layers of repetition.
…nsafe when only its parent had been made read-only. 38c10f7 to f5b4c9e
Description of changes:
Releases v1.10.3 - v1.10.5 changed IonValue.clone() and IonValue.makeReadOnly() from recursive to iterative (see #557 and #549). One side effect of this change is that on clone, IonStructs no longer eagerly copied the cloned struct's (and all of its child structs') field maps, which are lazily created for structs with more than 5 fields as an optimization to enable faster field access.

In #630 we identified and fixed a bug that affected concurrent access of cloned, read-only structs with more than 5 members. The struct's field map was being populated in one thread while being accessed in another, resulting in non-deterministic behavior. The fix in #630 was to force a struct's field map to be populated upon being marked read-only, making it impossible for it to be created subsequently during a period of thread contention.

However, this fix did not go far enough because it only populated the field map of the struct on which makeReadOnly() had been called directly, not any child structs. This meant that there was still the possibility of a race condition when accessing child structs of a parent that had been made read-only.

The added readOnlyClonedIonStructMultithreadedNestedAccessSucceeds demonstrated this problem, consistently failing 60-80% of its trials before the fix. All other added tests succeed before and after the fix because they exercise cases where the struct is not cloned (meaning that its field map will be created as the struct is populated), and/or the nested value that is accessed is marked read-only directly (forcing its field map to be populated due to the fix in #630).

The fix included in this PR forces field maps to be created (if applicable) for any struct marked read-only, and for all child structs regardless of depth. This is achieved by piggybacking on the iterative walk of the tree performed in IonValueLite.clearSymbolIDsIterative, which is already employed by IonValue.makeReadOnly. As an added protective measure, we also add a check to IonStructLite.fieldMapIsActive to skip creation of the field map if the struct has already been marked read-only. This change alone is enough to make the failing test pass, but is not a viable solution on its own for performance reasons: large nested child structs of cloned read-only structs would never have field maps created, so every field access would have to be performed sequentially.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
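The two parts of the fix described above (populating field maps for every nested struct during the iterative read-only walk, plus the guard that skips lazy creation once a struct is read-only) can be illustrated with a simplified model. This is a sketch under stated assumptions, not the ion-java implementation: `Node`, `buildFieldMapIfLarge`, and `FIELD_MAP_THRESHOLD` are hypothetical names, field names are assumed unique, and the real code works on `IonStructLite` internals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EagerFieldMapSketch {
    // "More than 5 fields", per the description; the constant name is hypothetical.
    static final int FIELD_MAP_THRESHOLD = 5;

    // Hypothetical simplified struct node; a leaf is a node with no fields.
    static final class Node {
        final List<String> names = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // children.get(i) belongs to names.get(i)
        Map<String, Integer> fieldMap;                 // lazily built name -> index lookup
        boolean readOnly;

        void add(String name, Node child) {
            if (readOnly) {
                throw new IllegalStateException("read-only");
            }
            names.add(name);
            children.add(child);
            fieldMap = null; // simplification: drop the index on mutation
        }

        void buildFieldMapIfLarge() {
            if (fieldMap == null && names.size() > FIELD_MAP_THRESHOLD) {
                Map<String, Integer> m = new HashMap<>();
                for (int i = 0; i < names.size(); i++) {
                    m.put(names.get(i), i);
                }
                fieldMap = m;
            }
        }

        Node get(String name) {
            // Guard: never create the map lazily once read-only, since readers
            // may be running concurrently; fall back to a linear scan instead.
            if (!readOnly) {
                buildFieldMapIfLarge();
            }
            if (fieldMap != null) {
                Integer i = fieldMap.get(name);
                return i == null ? null : children.get(i);
            }
            int i = names.indexOf(name);
            return i < 0 ? null : children.get(i);
        }
    }

    // Iterative (non-recursive) walk: every struct in the tree gets its field
    // map populated before it is marked read-only, regardless of depth.
    static void makeReadOnly(Node root) {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            n.buildFieldMapIfLarge();
            n.readOnly = true;
            for (Node child : n.children) {
                stack.push(child);
            }
        }
    }

    public static void main(String[] args) {
        Node nested = new Node();
        for (int i = 0; i < 6; i++) {
            nested.add("f" + i, new Node());
        }
        Node root = new Node();
        root.add("nested", nested);
        makeReadOnly(root);
        System.out.println(nested.fieldMap != null); // large nested struct is indexed
        System.out.println(root.fieldMap == null);   // small struct keeps the linear scan
    }
}
```

In this model, the guard in `get` is the safety net and the walk in `makeReadOnly` is what keeps lookups on large nested structs fast, mirroring the performance argument in the description.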