-
Notifications
You must be signed in to change notification settings - Fork 397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support opaque and acquire/release memory semantics #7517
base: master
Are you sure you want to change the base?
Conversation
OpenJ9 note: This requires a coordinated merge with eclipse-openj9/openj9#20475. |
e353977
to
fead69c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a review based on an initial pass through the changes. I'll come back for a more detailed review.
One high-level question I wanted to ask is whether these changes ought to be vetted at an OMR architecture meeting.
663a876
to
3a4d76a
Compare
Re @hzongaro's question
@0xdaryl @vijaysun-omr what do you think? |
Jenkins build all |
I will defer the question on whether we need architecture meeting review to Daryl, but I'll start running tests based on a review I just did. |
|
||
/** | ||
* Memory access ordering semantics flags | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like you are mixing up two types of ordering: 1) the order instructions are laid down in memory (more closely related to instruction-dispatching order. that might be what you meant by program order. relevant to compiler optimization.); 2) the order in which memory accesses are executed/observed (more closely related to instruction-issuing or cache RC-machine actioning. need of memory-barriers to enforce a certain order.).
by a quick glimpse of the code, i have a high-level comment for you to consider further: you seemed changing a lot of places querying 2) above into using querying 1) above. is that optimal or even possibly having correctness-implication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opaque
semantics means that the execution order of an access to the symbol will match the program order as observed by the executing thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on weakly-consistent machine (e.g. POWER), you don't have guarantee of the execution order at all without memory barriers. i.e. program order is meaning-less unless we are talking about accesses to the same location (e.g. something can lead to paradoxical situations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the Java VarHandle
description Opaque
semantics description:
Opaque operations are bitwise atomic and coherently ordered with respect to accesses to the same variable.
If I'm understanding those semantics correctly, for Opaque
we only need to ensure that accesses to the same variable/address are executed in the same order (as seen by the executing thread) with respect to each other as they are laid down in the program order. If I'm understanding the weakly-consistent machine memory model correctly, the address dependency between the accesses should be enough to ensure this order without memory barriers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes and thinking through the effects on each architecture.
My main concern is potential confusion around the terms used for the various memory orderings. "Opaque", for example, isn't a well-known term and I think is only used in Java circles. So seeing it appear throughout the code does take some mental adjustment if someone isn't already familiar with the Java definitions. Nevertheless, I am supportive of providing refinement to the kinds of memory ordering the compiler has to deal with, and we have to call them something.
In many places where isOpaque()
appears, I think the semantics you're really trying to capture is !isTransparent()
, because you're relying on the opaqueness property to be also true for acquire/release and volatile memory orderings. If that's the case, then (to me at least), it might be more readable to use that instead in those places.
@0xdaryl perhaps to be more clear we could steal the naming scheme from LLVM's atomic ordering and rename |
Also, please see the CI failures. There are real build issues in some of them. |
It looks like LLVM aligns more with the C++ memory model and can accommodate Java semantics too (as well as other frontends). I don't think OMR has to pivot there just yet. Changing the memory model is something worthy of a longer, architectural discussion for sure. I'm happy for us to have that discussion, but I think what you have here is a fine bridge between what we have now (simple vs volatile) and those more granular orderings. I do like the |
Sorry for the really basic questions, but I'm still trying to understand the semantics in the various cases.
The description of memory_order_acq_rel seems to be specific to operations that perform both a read and write. I assume that a symbol that is marked with Also, the descriptions of memory ordering semantics for Java's
But if my understanding of the current implementation is correct, |
@hzongaro if those were basic questions this change would have been a lot easier to implement! My initial description (and likely my initial understanding) of the memory semantics may not have been 100% accurate. The important parts from the
I'll update the description of this PR to have a more thorough description of the different semantics. |
3a4d76a
to
edf4efe
Compare
This change expands the possible memory ordering semantics for a symbol from volatile and non-volatile to volatile, acquire/release, optimization opaque, and transparent. An enum and helper methods are added to facilitate working with memory ordering semantics. Signed-off-by: Spencer Comin <[email protected]>
edf4efe
to
0b6b3f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the all updates that help to clarify the semantics!
I have some questions about whether some cases that are effectively testing for OpaqueSemantics
or stronger ought to be testing for AcquireReleaseSemantics
or stronger instead.
inline bool isTransparent(); | ||
|
||
inline void setOpaque(); | ||
inline bool isOpaque(); | ||
inline bool isAtLeastOrStrongerThanOpaque(); | ||
|
||
inline void setAcquireRelease(); | ||
inline bool isAcquireRelease(); | ||
inline bool isAtLeastOrStrongerThanAcquireRelease(); | ||
|
||
inline void setVolatile(); | ||
inline bool isVolatile(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it conceivable that there could be an ordering semantics weaker than TransparentSemantics
or one stronger than VolatileSemantics
? I'm wondering whether isAtLeastOrStrongerThanTransparent()
and isAtLeastOrStrongerThanVolatile()
methods would ever make sense. Alternatively, would it make sense to have a single isAtLeastOrStrongerThan(MemoryOrdering)
method instead?
The answer can be maybe, but not likely enough to worry about it right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only guarantee that TransparentSemantics
gives is address dependency, i.e., that for a single threaded application a read from an address a
will read the value written by the most recent (according to program order) write to a
. It is conceivable that there could be an ordering semantics that doesn't guarantee address dependency, but no CPU architecture supported by OMR has semantics that weak. If by some odd coincidence we ever support the DEC Alpha or some new architecture comes up that refuses to learn from history, we may have to revisit this.
Looking at what instructions we generate for volatile stores[1] and loads[2], it appears that we generate fully sequentially consistent stores (memory barrier before and after), but only acquire loads (memory barrier only after). Conceivably we could have a stronger ordering semantic that has fully sequentially consistent loads (see godbolt example [3]). Whether supporting full sequential consistency is something we want to do might be worth discussion. I'll look into this some more then update the documentation here for VolatileSemantics
if it really is observably different from full sequential consistency.
[1]
omr/compiler/aarch64/codegen/OMRTreeEvaluator.cpp
Lines 6191 to 6291 in 0e72488
TR::Register *commonStoreEvaluator(TR::Node *node, TR::InstOpCode::Mnemonic op, int32_t size, TR::CodeGenerator *cg) | |
{ | |
TR::MemoryReference *tempMR = TR::MemoryReference::createWithRootLoadOrStore(cg, node); | |
tempMR->validateImmediateOffsetAlignment(node, size, cg); | |
bool needSync = (node->getSymbolReference()->getSymbol()->isSyncVolatile() && cg->comp()->target().isSMP()); | |
bool lazyVolatile = false; | |
if (node->getSymbolReference()->getSymbol()->isShadow() && | |
node->getSymbolReference()->getSymbol()->isOrdered() && cg->comp()->target().isSMP()) | |
{ | |
needSync = true; | |
lazyVolatile = true; | |
} | |
TR::Node *valueChild; | |
if (node->getOpCode().isIndirect()) | |
{ | |
valueChild = node->getSecondChild(); | |
} | |
else | |
{ | |
valueChild = node->getFirstChild(); | |
} | |
if (needSync) | |
{ | |
generateSynchronizationInstruction(cg, TR::InstOpCode::dmb, node, TR::InstOpCode::ishst); | |
} | |
TR::Node *valueChildRoot = NULL; | |
/* | |
* Pattern matching compressed refs sequence of address constant NULL | |
+ | |
* treetop | |
* istorei | |
* aload | |
* l2i (X==0 ) | |
* lushr (compressionSequence ) | |
* a2l | |
* aconst NULL (X==0 sharedMemory ) | |
* iconst 3 | |
*/ | |
if (cg->comp()->useCompressedPointers() && | |
(node->getSymbolReference()->getSymbol()->getDataType() == TR::Address) && | |
(valueChild->getDataType() != TR::Address) && | |
(valueChild->getOpCodeValue() == TR::l2i) && | |
(valueChild->isZero())) | |
{ | |
TR::Node *tmpNode = valueChild; | |
while (tmpNode->getNumChildren() && tmpNode->getOpCodeValue() != TR::a2l) | |
tmpNode = tmpNode->getFirstChild(); | |
if (tmpNode->getNumChildren()) | |
tmpNode = tmpNode->getFirstChild(); | |
if (tmpNode->getDataType().isAddress() && tmpNode->isConstZeroValue() && (tmpNode->getRegister() == NULL)) | |
{ | |
valueChildRoot = valueChild; | |
} | |
} | |
/* | |
* Use xzr as source register of str instruction | |
* if valueChild is a compressed refs sequence of address constant NULL, | |
* or valueChild is a zero constant integer. | |
*/ | |
if ((valueChildRoot != NULL) || (valueChild->getDataType().isIntegral() && valueChild->isConstZeroValue() && (valueChild->getRegister() == NULL))) | |
{ | |
TR::Register *zeroReg = cg->allocateRegister(); | |
generateMemSrc1Instruction(cg, op, node, tempMR, zeroReg); | |
TR::RegisterDependencyConditions *deps = new (cg->trHeapMemory()) TR::RegisterDependencyConditions(0, 1, cg->trMemory()); | |
deps->addPostCondition(zeroReg, TR::RealRegister::xzr); | |
generateLabelInstruction(cg, TR::InstOpCode::label, node, generateLabelSymbol(cg), deps); | |
cg->stopUsingRegister(zeroReg); | |
} | |
else | |
{ | |
generateMemSrc1Instruction(cg, op, node, tempMR, cg->evaluate(valueChild)); | |
} | |
if (needSync) | |
{ | |
// ordered and lazySet operations will not generate a post-write sync | |
if (!lazyVolatile) | |
{ | |
generateSynchronizationInstruction(cg, TR::InstOpCode::dmb, node, TR::InstOpCode::ish); | |
} | |
} | |
if (valueChildRoot != NULL) | |
{ | |
cg->recursivelyDecReferenceCount(valueChildRoot); | |
} | |
else | |
{ | |
cg->decReferenceCount(valueChild); | |
} | |
tempMR->decNodeReferenceCounts(cg); | |
return NULL; | |
} |
[2]
omr/compiler/aarch64/codegen/OMRTreeEvaluator.cpp
Lines 6070 to 6088 in 0e72488
TR::Register *commonLoadEvaluator(TR::Node *node, TR::InstOpCode::Mnemonic op, int32_t size, TR::Register *targetReg, TR::CodeGenerator *cg) | |
{ | |
bool needSync = (node->getSymbolReference()->getSymbol()->isSyncVolatile() && cg->comp()->target().isSMP()); | |
node->setRegister(targetReg); | |
TR::MemoryReference *tempMR = TR::MemoryReference::createWithRootLoadOrStore(cg, node); | |
tempMR->validateImmediateOffsetAlignment(node, size, cg); | |
generateTrg1MemInstruction(cg, op, node, targetReg, tempMR); | |
if (needSync) | |
{ | |
generateSynchronizationInstruction(cg, TR::InstOpCode::dmb, node, TR::InstOpCode::ishld); | |
} | |
tempMR->decNodeReferenceCounts(cg); | |
return targetReg; | |
} |
[3] https://godbolt.org/z/3PYTc9W3G
// Since non-volatiles are implemented as two separate loads, we must use a special sequence to perform the load in | ||
// a single instruction even when SMP is disabled. | ||
else if (node->getSymbol()->isSyncVolatile()) | ||
else if (node->getSymbol()->isAtLeastOrStrongerThanAcquireRelease()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this comment need to be adjusted?
@@ -315,7 +315,7 @@ OMR::SymbolReference::getUseDefAliasesBV(bool isDirectCall, bool includeGCSafePo | |||
// (this is the same as before), or if we are unresolved and condy | |||
// (this is the extra condition added), we would return conservative aliases. | |||
if ((self()->isUnresolved() && (_symbol->isConstantDynamic() || !_symbol->isConstObjectRef())) || | |||
_symbol->isVolatile() || self()->isLiteralPoolAddress() || | |||
!_symbol->isTransparent() || self()->isLiteralPoolAddress() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering whether testing !_symbol->isTransparent()
is more conservative than necessary, though it's definitely safe. I'm just trying to understand whether OpaqueSemantics
really needs to be treated conservatively here, or if testing for _symbol->isAtLeastOrStrongerThanAcquireRelease()
would be sufficient here and elsewhere in this method.
@vijaysun-omr, @zl-wang, thoughts?
@@ -2662,7 +2662,7 @@ OMR::Node::mayModifyValue(TR::SymbolReference * symRef) | |||
TR::Symbol * symbol = symRef->getSymbol(); | |||
if (node->getOpCode().isCall() || | |||
node->getOpCodeValue() == TR::monexit || | |||
(node->getOpCode().hasSymbolReference() && node->getSymbol()->isVolatile()) || | |||
(node->getOpCode().hasSymbolReference() && !node->getSymbol()->isTransparent()) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, is testing !isTransparent()
more conservative than necessary? I'm wondering whether isAtLeastOrStrongerThanAcquireRelease()
would be sufficient here.
@@ -176,7 +176,7 @@ OMR::Node::mayUse() | |||
TR_UseDefAliasSetInterface | |||
OMR::Node::mayKill(bool gcSafe) | |||
{ | |||
if (self()->getOpCode().hasSymbolReference() && (self()->getOpCode().isLikeDef() || self()->mightHaveVolatileSymbolReference())) //we want the old behavior in these cases | |||
if (self()->getOpCode().hasSymbolReference() && (self()->getOpCode().isLikeDef() || self()->mightHaveNonTransparentSymbolReference())) //we want the old behavior in these cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'm wondering whether OpaqueSemantics
really needs to be treated conservatively here, and if this could safely check for "at least or stronger than acquire/release". Does this need to worry only about what other threads might do? If so, I don't think the fact that opaque operations need to be performed atomically would matter.
@@ -306,7 +306,7 @@ static bool isSafeToReplaceNode(TR::Node *currentNode, TR::TreeTop *curTreeTop, | |||
* => xload/xloadi a.volatileField | |||
* ... | |||
*/ | |||
//if (mayBeVolatileReference && !canMoveIfVolatile) | |||
//if (mayBeNonTransparentReference && !canMoveIfVolatile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments that appear just above here talk about restrictions on "swinging down" volatile. Those comments will need to be updated.
bool mayBeNonTransparentReference = currentNode->mightHaveNonTransparentSymbolReference(); | ||
// Do not swing down non-transparent nodes | ||
if (mayBeNonTransparentReference) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be conservative in the treatment of OpaqueSemantics
. I think it really only needs to worry about situations where there could be interactions with other threads, so AcquireReleaseSemantics
or stronger.
I suspect the same might hold true for many of the changes for other optimizations, so I think they need to be considered on a case-by-case basis. I won't go through and comment on each.
@hzongaro re: overly conservative treatment of Since being overly conservative is still correct (and equivalent to what we already have with volatile/plain), do you think it would be better to get these changes in as-is and then address relaxing the restrictions on individual optimizations in future PRs? |
Yes, that sounds reasonable. Getting initial support in for the different memory semantics will allow OMR and downstream projects to begin to take advantage of them, even if the treatment is relatively conservative today. |
With the expansion of possible memory ordering semantics from binary volatile or non-volatile to volatile, acquire/release, opaque, and transparent, all test whether a symbol is volatile need to be refined depending on the intention of the test, i.e. is it testing if the symbol is strictly volatile, simply opaque, or somewhere in between? Signed-off-by: Spencer Comin <[email protected]>
This change adds arrays for opaque and acquire/release unsafe symrefs to the symbol reference table. Instead of having four separate fields, the fields are combined into an array that can be indexed by the OMR::Symbol::AccessMode enum. Signed-off-by: Spencer Comin <[email protected]>
This flag is removed in OpenJ9. Signed-off-by: Spencer Comin <[email protected]>
0b6b3f1
to
e6464b4
Compare
Expand the possible memory ordering semantics of symbols from volatile or non-volatile to volatile, acquire/release, opaque, or transparent. The memory ordering semantics are defined as follows:
Transparent
Only guaranteed to be bitwise atomic for data 32 bits or smaller and addresses.
This is the same as non-volatile semantics prior to this change.
Opaque
Accesses to opaque symbols are bitwise atomic.
The execution order of all opaque accesses to any given address in a single thread is the same as the program order of accesses to that address.
Acquire/Release
Loads of acquire/release symbols are acquire loads; i.e., loads and stores after a given acquire load will not be reordered to before that load. This matches the semantics of C's
memory_order_acquire
.Stores to acquire/release symbols are release stores; i.e., loads and stores before a given release store wil not be reordered to after that store. This matches the semantics of C's
memory_order_release
.Acquire/release accesses have a release-acquire ordering.
Acquire/release symbols also have all the same guarantees that opaque symbols have.
Volatile
Volatile accesses have a sequentially-consistent ordering. This matches the semantics of C's
memory_order_seq_cst
Volatile symbols also have all the same guarantees that acquire/release symbols have.
This is the same as volatile semantics prior to this change
Additionally, see the notes on memory ordering semantics in the documentation for Java's VarHandle