Support opaque and acquire/release memory semantics #7517

Open
wants to merge 4 commits into
base: master

Spencer-Comin
Contributor

@Spencer-Comin Spencer-Comin commented Oct 31, 2024

Expand the possible memory ordering semantics of symbols from volatile or non-volatile to volatile, acquire/release, opaque, or transparent. The memory ordering semantics are defined as follows:

Transparent

Accesses are only guaranteed to be bitwise atomic for addresses and for data of 32 bits or smaller.
This is the same as non-volatile semantics prior to this change.

Opaque

Accesses to opaque symbols are bitwise atomic.
The execution order of all opaque accesses to any given address in a single thread is the same as the program order of accesses to that address.
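
As a rough illustration of the per-address guarantee, here is a minimal C++ sketch. It assumes that `std::memory_order_relaxed` is a reasonable stand-in for an opaque access; that mapping is an assumption made for illustration, not something this PR defines, and `opaqueSameAddressDemo` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Assumption: std::memory_order_relaxed stands in for an opaque access here.
// Relaxed accesses are atomic and coherent per address, but they impose no
// ordering on accesses to other addresses.
inline int opaqueSameAddressDemo()
{
    std::atomic<int> x{0};

    x.store(1, std::memory_order_relaxed);
    x.store(2, std::memory_order_relaxed);

    // Same thread, same address: the load cannot observe a value older than
    // the most recent store in program order.
    int seen = x.load(std::memory_order_relaxed);
    assert(seen == 2);
    return seen;
}
```

Nothing in this sketch constrains how accesses to two different addresses are ordered relative to each other; that is what the stronger semantics add.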

Acquire/Release

Loads of acquire/release symbols are acquire loads; i.e., loads and stores after a given acquire load will not be reordered to before that load. This matches the semantics of C's memory_order_acquire.
Stores to acquire/release symbols are release stores; i.e., loads and stores before a given release store will not be reordered to after that store. This matches the semantics of C's memory_order_release.
Acquire/release accesses have a release-acquire ordering.
Acquire/release symbols also have all the same guarantees that opaque symbols have.
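
The release-acquire pairing can be sketched with the classic message-passing pattern in C++. This is only an illustration using `std::atomic`; the names `payload`, `ready`, and `messagePassingDemo` are hypothetical, with `ready` playing the role of an acquire/release symbol and `payload` a transparent one.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: 'payload' is a plain (transparent) variable and
// 'ready' plays the role of an acquire/release symbol.
inline int messagePassingDemo()
{
    int payload = 0;
    std::atomic<bool> ready{false};
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // release store
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) // acquire load
            std::this_thread::yield();
        observed = payload; // ordered after the producer's plain store
    });

    producer.join();
    consumer.join();
    assert(observed == 42);
    return observed;
}
```

If both accesses to `ready` were merely opaque (relaxed in this sketch), the consumer would have no guarantee of seeing the store to `payload`.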

Volatile

Volatile accesses have a sequentially-consistent ordering. This matches the semantics of C's memory_order_seq_cst.
Volatile symbols also have all the same guarantees that acquire/release symbols have.
This is the same as volatile semantics prior to this change.
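
What sequential consistency adds over acquire/release can be sketched with the store-buffering litmus test. This is a C++ illustration under the assumption that `std::memory_order_seq_cst` models a volatile access; `storeBufferingDemo` and `StoreBufferingResult` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct StoreBufferingResult { int r1; int r2; };

// Each thread stores to one location and then loads the other. With
// sequentially consistent accesses the two threads cannot both read 0;
// with only acquire/release ordering that outcome would be permitted.
inline StoreBufferingResult storeBufferingDemo()
{
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    assert(!(r1 == 0 && r2 == 0));
    return {r1, r2};
}
```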

Additionally, see the notes on memory ordering semantics in the documentation for Java's VarHandle.

@Spencer-Comin
Contributor Author

OpenJ9 note: This requires a coordinated merge with eclipse-openj9/openj9#20475.

@Spencer-Comin Spencer-Comin force-pushed the ordered-opaque branch 4 times, most recently from e353977 to fead69c on November 7, 2024 15:24
@Spencer-Comin Spencer-Comin marked this pull request as ready for review November 7, 2024 15:42
@Spencer-Comin Spencer-Comin requested review from hzongaro and 0xdaryl and removed request for 0xdaryl November 7, 2024 15:44
Contributor

@hzongaro hzongaro left a comment

This is a review based on an initial pass through the changes. I'll come back for a more detailed review.

One high-level question I wanted to ask is whether these changes ought to be vetted at an OMR architecture meeting.

compiler/il/Aliases.cpp (outdated, resolved)
compiler/il/OMRSymbol.hpp (outdated, resolved)
compiler/il/OMRSymbol.hpp (outdated, resolved)
compiler/il/OMRSymbol.hpp (outdated, resolved)
compiler/aarch64/codegen/OMRTreeEvaluator.cpp (outdated, resolved)
compiler/arm/codegen/OMRTreeEvaluator.cpp (outdated, resolved)
compiler/arm/codegen/OMRTreeEvaluator.cpp (outdated, resolved)
compiler/il/Aliases.cpp (outdated, resolved)
@Spencer-Comin Spencer-Comin force-pushed the ordered-opaque branch 4 times, most recently from 663a876 to 3a4d76a on November 26, 2024 19:10
@Spencer-Comin
Contributor Author

Re @hzongaro's question

One high-level question I wanted to ask is whether these changes ought to be vetted at an OMR architecture meeting.

@0xdaryl @vijaysun-omr what do you think?

@vijaysun-omr
Contributor

Jenkins build all

@vijaysun-omr
Contributor

I will defer the question on whether we need architecture meeting review to Daryl, but I'll start running tests based on a review I just did.

@0xdaryl 0xdaryl self-assigned this Dec 6, 2024

/**
* Memory access ordering semantics flags
*/
Contributor

It looks like you are mixing up two types of ordering: 1) the order in which instructions are laid down in memory (more closely related to instruction-dispatch order; that might be what you meant by program order, and it is what matters for compiler optimization); 2) the order in which memory accesses are executed/observed (more closely related to instruction issue or cache RC-machine actions; memory barriers are needed to enforce a certain order).

From a quick glance at the code, I have a high-level comment for you to consider further: you seem to have changed a lot of places that queried 2) above into queries of 1) above. Is that optimal, or could it even have correctness implications?

Contributor Author

Opaque semantics means that the execution order of an access to the symbol will match the program order as observed by the executing thread.

Contributor

@zl-wang zl-wang Dec 6, 2024

On a weakly-consistent machine (e.g. POWER), you have no guarantee of execution order at all without memory barriers; i.e. program order is meaningless unless we are talking about accesses to the same location (e.g. something that can lead to paradoxical situations).

Contributor Author

From the description of Opaque semantics in the Java VarHandle documentation:

Opaque operations are bitwise atomic and coherently ordered with respect to accesses to the same variable.

If I'm understanding those semantics correctly, for Opaque we only need to ensure that accesses to the same variable/address are executed in the same order (as seen by the executing thread) with respect to each other as they are laid down in the program order. If I'm understanding the weakly-consistent machine memory model correctly, the address dependency between the accesses should be enough to ensure this order without memory barriers.

Contributor

@0xdaryl 0xdaryl left a comment

Thanks for making these changes and thinking through the effects on each architecture.

My main concern is potential confusion around the terms used for the various memory orderings. "Opaque", for example, isn't a well-known term and I think is only used in Java circles. So seeing it appear throughout the code does take some mental adjustment if someone isn't already familiar with the Java definitions. Nevertheless, I am supportive of providing refinement to the kinds of memory ordering the compiler has to deal with, and we have to call them something.

In many places where isOpaque() appears, I think the semantics you're really trying to capture is !isTransparent(), because you're relying on the opaqueness property to be also true for acquire/release and volatile memory orderings. If that's the case, then (to me at least), it might be more readable to use that instead in those places.
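
To make the readability point concrete, here is a small C++ sketch. It assumes a reading in which `isOpaque()` is an exact-match test; that is an assumption for illustration only, since the PR's actual implementation may differ, and `OrderingLevel` and `LevelSymbol` are hypothetical stand-ins rather than the real TR::Symbol API.

```cpp
#include <cassert>

// Hypothetical stand-in for the ordering levels discussed in this PR,
// listed from weakest to strongest. Not the actual TR::Symbol definition.
enum class OrderingLevel { Transparent, Opaque, AcquireRelease, Volatile };

struct LevelSymbol
{
    OrderingLevel ordering;

    // Exact match (assumed for this sketch): true only for opaque symbols.
    bool isOpaque() const { return ordering == OrderingLevel::Opaque; }

    // True only for the weakest level.
    bool isTransparent() const { return ordering == OrderingLevel::Transparent; }
};

inline void exactVersusNotTransparent()
{
    LevelSymbol vol{OrderingLevel::Volatile};
    assert(!vol.isOpaque());       // an exact-match test skips volatile symbols
    assert(!vol.isTransparent());  // "not transparent" still covers them
}
```

Under that reading, `!isTransparent()` states the intent ("anything with ordering guarantees") without the reader needing to know how `isOpaque()` treats stronger levels.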

compiler/il/OMRSymbol.hpp (outdated, resolved)
compiler/il/OMRSymbol.hpp (outdated, resolved)
compiler/arm/codegen/FPTreeEvaluator.cpp (outdated, resolved)
@Spencer-Comin
Contributor Author

@0xdaryl perhaps, to be clearer, we could steal the naming scheme from LLVM's atomic ordering and rename Volatile to SequentiallyConsistent, Opaque to Monotonic, and Transparent to NonAtomic, and use isAtLeastOrStrongerThan* helpers.

@0xdaryl
Contributor

0xdaryl commented Dec 6, 2024

Also, please see the CI failures. There are real build issues in some of them.

@0xdaryl
Contributor

0xdaryl commented Dec 9, 2024

It looks like LLVM aligns more with the C++ memory model and can accommodate Java semantics too (as well as other frontends). I don't think OMR has to pivot there just yet. Changing the memory model is something worthy of a longer, architectural discussion for sure. I'm happy for us to have that discussion, but I think what you have here is a fine bridge between what we have now (simple vs volatile) and those more granular orderings.

I do like the isAtLeastOrStrongerThan* helpers idea though as it can make the code clearer as well as the author's intents.

@hzongaro
Contributor

hzongaro commented Dec 9, 2024

Sorry for the really basic questions, but I'm still trying to understand the semantics in the various cases.

Expand the possible memory ordering semantics of symbols from volatile or non-volatile to volatile, acquire/release, opaque, or transparent. The memory ordering semantics are defined as follows:

  • Volatile: same as volatile before this change.
  • Acquire/Release: as defined for C's memory_order_acq_rel
  • Opaque: accessed in program order, but without any assurance of memory ordering effects on other threads. This is similar to volatile in C or C++.
  • Transparent: same as non-volatile before this change.

Additionally, see the notes on memory ordering semantics in the documentation for Java's VarHandle

The description of memory_order_acq_rel seems to be specific to operations that perform both a read and write. I assume that a symbol that is marked with TR::Symbol::AcquireReleaseSemantics will be treated like memory_order_acquire in a read operation and like memory_order_release in a write operation - is that correct?
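
The mapping this question assumes can be sketched in C++ as follows. It is a hypothetical helper, not code from the PR: `SketchSemantics`, `SketchAccessKind`, and `orderFor` are invented names, transparent symbols are left out because a plain access has no `std::memory_order`, and treating Opaque as relaxed is an assumption.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names; transparent symbols are omitted because a plain,
// non-atomic access has no std::memory_order equivalent.
enum class SketchSemantics { Opaque, AcquireRelease, Volatile };
enum class SketchAccessKind { Load, Store };

inline std::memory_order orderFor(SketchSemantics s, SketchAccessKind k)
{
    switch (s)
    {
        case SketchSemantics::Volatile:
            return std::memory_order_seq_cst;
        case SketchSemantics::AcquireRelease:
            // One symbol-level setting, two access-level orderings.
            return k == SketchAccessKind::Load ? std::memory_order_acquire
                                               : std::memory_order_release;
        case SketchSemantics::Opaque:
            // Assumption: relaxed is the closest analogue of Opaque.
            return std::memory_order_relaxed;
    }
    return std::memory_order_seq_cst; // not reached; conservative fallback
}

inline void orderForExamples()
{
    assert(orderFor(SketchSemantics::AcquireRelease, SketchAccessKind::Load) == std::memory_order_acquire);
    assert(orderFor(SketchSemantics::AcquireRelease, SketchAccessKind::Store) == std::memory_order_release);
}
```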

Also, the descriptions of memory ordering semantics for Java's VarHandle states, in part:

In addition to obeying Opaque properties, Acquire mode reads and their subsequent accesses are ordered after matching Release mode writes and their previous accesses.

But if my understanding of the current implementation is correct, TR::Symbol::AcquireReleaseSemantics doesn't seem to imply TR::Symbol::OpaqueSemantics. If so, what parts of the VarHandle documentation should I read and how do they map to the semantics in this implementation?

@Spencer-Comin
Contributor Author

Sorry for the really basic questions

@hzongaro if those were basic questions this change would have been a lot easier to implement!

My initial description (and likely my initial understanding) of the memory semantics may not have been 100% accurate. The important parts from the VarHandle description are the descriptions of getVolatile, setVolatile, getOpaque, setOpaque, getAcquire, and setRelease, and the following paragraph:

Access modes control atomicity and consistency properties. Plain read (get) and write (set) accesses are guaranteed to be bitwise atomic only for references and for primitive values of at most 32 bits, and impose no observable ordering constraints with respect to threads other than the executing thread. Opaque operations are bitwise atomic and coherently ordered with respect to accesses to the same variable. In addition to obeying Opaque properties, Acquire mode reads and their subsequent accesses are ordered after matching Release mode writes and their previous accesses. In addition to obeying Acquire and Release properties, all Volatile operations are totally ordered with respect to each other.

I'll update the description of this PR to have a more thorough description of the different semantics.

This change expands the possible memory ordering semantics for a symbol from
volatile and non-volatile to volatile, acquire/release, optimization opaque,
and transparent. An enum and helper methods are added to facilitate working
with memory ordering semantics.

Signed-off-by: Spencer Comin <[email protected]>
@Spencer-Comin
Contributor Author

@0xdaryl @hzongaro @zl-wang could I get another round of review on this?

Contributor

@hzongaro hzongaro left a comment

Thanks for all the updates that help to clarify the semantics!

I have some questions about whether some cases that are effectively testing for OpaqueSemantics or stronger ought to be testing for AcquireReleaseSemantics or stronger instead.

Comment on lines +426 to +437
inline bool isTransparent();

inline void setOpaque();
inline bool isOpaque();
inline bool isAtLeastOrStrongerThanOpaque();

inline void setAcquireRelease();
inline bool isAcquireRelease();
inline bool isAtLeastOrStrongerThanAcquireRelease();

inline void setVolatile();
inline bool isVolatile();
Contributor

Is it conceivable that there could be an ordering semantics weaker than TransparentSemantics or one stronger than VolatileSemantics? I'm wondering whether isAtLeastOrStrongerThanTransparent() and isAtLeastOrStrongerThanVolatile() methods would ever make sense. Alternatively, would it make sense to have a single isAtLeastOrStrongerThan(MemoryOrdering) method instead?

The answer can be maybe, but not likely enough to worry about it right now.
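
A single parameterized query along the lines suggested here might look like the following C++ sketch. It is a hypothetical design, not the PR's code: `SketchMemoryOrdering` and `OrderedSymbol` are invented names, and the numeric values only encode the assumed weakest-to-strongest order.

```cpp
#include <cassert>

// Hypothetical enum; the values encode the assumed strength order.
enum class SketchMemoryOrdering
{
    Transparent = 0,
    Opaque = 1,
    AcquireRelease = 2,
    Volatile = 3
};

class OrderedSymbol
{
public:
    explicit OrderedSymbol(SketchMemoryOrdering ordering) : _ordering(ordering) {}

    // One query replaces a family of isAtLeastOrStrongerThanX() helpers.
    bool isAtLeastOrStrongerThan(SketchMemoryOrdering other) const
    {
        return static_cast<int>(_ordering) >= static_cast<int>(other);
    }

private:
    SketchMemoryOrdering _ordering;
};

inline void parameterizedQueryExamples()
{
    OrderedSymbol acqRel(SketchMemoryOrdering::AcquireRelease);
    assert(acqRel.isAtLeastOrStrongerThan(SketchMemoryOrdering::Opaque));
    assert(!acqRel.isAtLeastOrStrongerThan(SketchMemoryOrdering::Volatile));
}
```

With this shape, the two end cases fall out naturally: comparing against the weakest level is always true, and comparing against the strongest level is the same as an exact volatile test.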

Contributor Author

@Spencer-Comin Spencer-Comin Jan 15, 2025

The only guarantee that TransparentSemantics gives is address dependency, i.e., that for a single threaded application a read from an address a will read the value written by the most recent (according to program order) write to a. It is conceivable that there could be an ordering semantics that doesn't guarantee address dependency, but no CPU architecture supported by OMR has semantics that weak. If by some odd coincidence we ever support the DEC Alpha or some new architecture comes up that refuses to learn from history, we may have to revisit this.

Looking at what instructions we generate for volatile stores[1] and loads[2], it appears that we generate fully sequentially consistent stores (memory barrier before and after), but only acquire loads (memory barrier only after). Conceivably we could have a stronger ordering semantic that has fully sequentially consistent loads (see godbolt example [3]). Whether supporting full sequential consistency is something we want to do might be worth discussion. I'll look into this some more then update the documentation here for VolatileSemantics if it really is observably different from full sequential consistency.

[1]

TR::Register *commonStoreEvaluator(TR::Node *node, TR::InstOpCode::Mnemonic op, int32_t size, TR::CodeGenerator *cg)
   {
   TR::MemoryReference *tempMR = TR::MemoryReference::createWithRootLoadOrStore(cg, node);
   tempMR->validateImmediateOffsetAlignment(node, size, cg);
   bool needSync = (node->getSymbolReference()->getSymbol()->isSyncVolatile() && cg->comp()->target().isSMP());
   bool lazyVolatile = false;
   if (node->getSymbolReference()->getSymbol()->isShadow() &&
       node->getSymbolReference()->getSymbol()->isOrdered() && cg->comp()->target().isSMP())
      {
      needSync = true;
      lazyVolatile = true;
      }
   TR::Node *valueChild;
   if (node->getOpCode().isIndirect())
      {
      valueChild = node->getSecondChild();
      }
   else
      {
      valueChild = node->getFirstChild();
      }
   if (needSync)
      {
      generateSynchronizationInstruction(cg, TR::InstOpCode::dmb, node, TR::InstOpCode::ishst);
      }
   TR::Node *valueChildRoot = NULL;
   /*
    * Pattern matching compressed refs sequence of address constant NULL
    +
    * treetop
    *   istorei
    *     aload
    *     l2i (X==0 )
    *       lushr (compressionSequence )
    *         a2l
    *           aconst NULL (X==0 sharedMemory )
    *         iconst 3
    */
   if (cg->comp()->useCompressedPointers() &&
       (node->getSymbolReference()->getSymbol()->getDataType() == TR::Address) &&
       (valueChild->getDataType() != TR::Address) &&
       (valueChild->getOpCodeValue() == TR::l2i) &&
       (valueChild->isZero()))
      {
      TR::Node *tmpNode = valueChild;
      while (tmpNode->getNumChildren() && tmpNode->getOpCodeValue() != TR::a2l)
         tmpNode = tmpNode->getFirstChild();
      if (tmpNode->getNumChildren())
         tmpNode = tmpNode->getFirstChild();
      if (tmpNode->getDataType().isAddress() && tmpNode->isConstZeroValue() && (tmpNode->getRegister() == NULL))
         {
         valueChildRoot = valueChild;
         }
      }
   /*
    * Use xzr as source register of str instruction
    * if valueChild is a compressed refs sequence of address constant NULL,
    * or valueChild is a zero constant integer.
    */
   if ((valueChildRoot != NULL) || (valueChild->getDataType().isIntegral() && valueChild->isConstZeroValue() && (valueChild->getRegister() == NULL)))
      {
      TR::Register *zeroReg = cg->allocateRegister();
      generateMemSrc1Instruction(cg, op, node, tempMR, zeroReg);
      TR::RegisterDependencyConditions *deps = new (cg->trHeapMemory()) TR::RegisterDependencyConditions(0, 1, cg->trMemory());
      deps->addPostCondition(zeroReg, TR::RealRegister::xzr);
      generateLabelInstruction(cg, TR::InstOpCode::label, node, generateLabelSymbol(cg), deps);
      cg->stopUsingRegister(zeroReg);
      }
   else
      {
      generateMemSrc1Instruction(cg, op, node, tempMR, cg->evaluate(valueChild));
      }
   if (needSync)
      {
      // ordered and lazySet operations will not generate a post-write sync
      if (!lazyVolatile)
         {
         generateSynchronizationInstruction(cg, TR::InstOpCode::dmb, node, TR::InstOpCode::ish);
         }
      }
   if (valueChildRoot != NULL)
      {
      cg->recursivelyDecReferenceCount(valueChildRoot);
      }
   else
      {
      cg->decReferenceCount(valueChild);
      }
   tempMR->decNodeReferenceCounts(cg);
   return NULL;
   }

[2]
TR::Register *commonLoadEvaluator(TR::Node *node, TR::InstOpCode::Mnemonic op, int32_t size, TR::Register *targetReg, TR::CodeGenerator *cg)
   {
   bool needSync = (node->getSymbolReference()->getSymbol()->isSyncVolatile() && cg->comp()->target().isSMP());
   node->setRegister(targetReg);
   TR::MemoryReference *tempMR = TR::MemoryReference::createWithRootLoadOrStore(cg, node);
   tempMR->validateImmediateOffsetAlignment(node, size, cg);
   generateTrg1MemInstruction(cg, op, node, targetReg, tempMR);
   if (needSync)
      {
      generateSynchronizationInstruction(cg, TR::InstOpCode::dmb, node, TR::InstOpCode::ishld);
      }
   tempMR->decNodeReferenceCounts(cg);
   return targetReg;
   }

[3] https://godbolt.org/z/3PYTc9W3G
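
The barrier placement in the two evaluators quoted in [1] and [2] can be restated as a portable C++ sketch, with `std::atomic_thread_fence` standing in for the `dmb` instructions. The correspondence is approximate and is an assumption made for illustration (for example, `dmb ishst` orders only stores, while a release fence is stronger); the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Volatile store as in [1]: a barrier before and a full barrier after.
inline void sketchVolatileStore(std::atomic<int> &location, int value)
{
    std::atomic_thread_fence(std::memory_order_release); // stands in for dmb ishst
    location.store(value, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst); // stands in for dmb ish
}

// Ordered / lazySet store as in [1]: leading barrier only.
inline void sketchOrderedStore(std::atomic<int> &location, int value)
{
    std::atomic_thread_fence(std::memory_order_release);
    location.store(value, std::memory_order_relaxed);
}

// Volatile load as in [2]: a barrier after the load only.
inline int sketchVolatileLoad(const std::atomic<int> &location)
{
    int value = location.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire); // stands in for dmb ishld
    return value;
}

inline void barrierPlacementDemo()
{
    std::atomic<int> cell{0};
    sketchVolatileStore(cell, 7);
    assert(sketchVolatileLoad(cell) == 7);
}
```

The asymmetry is visible in the shape of the code: the volatile store is bracketed by fences, while the volatile load has nothing in front of it.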

Comment on lines 201 to 203
// Since non-volatiles are implemented as two separate loads, we must use a special sequence to perform the load in
// a single instruction even when SMP is disabled.
- else if (node->getSymbol()->isSyncVolatile())
+ else if (node->getSymbol()->isAtLeastOrStrongerThanAcquireRelease())
Contributor

Will this comment need to be adjusted?

@@ -315,7 +315,7 @@ OMR::SymbolReference::getUseDefAliasesBV(bool isDirectCall, bool includeGCSafePo
// (this is the same as before), or if we are unresolved and condy
// (this is the extra condition added), we would return conservative aliases.
if ((self()->isUnresolved() && (_symbol->isConstantDynamic() || !_symbol->isConstObjectRef())) ||
- _symbol->isVolatile() || self()->isLiteralPoolAddress() ||
+ !_symbol->isTransparent() || self()->isLiteralPoolAddress() ||
Contributor

I'm wondering whether testing !_symbol->isTransparent() is more conservative than necessary, though it's definitely safe. I'm just trying to understand whether OpaqueSemantics really needs to be treated conservatively here, or if testing for _symbol->isAtLeastOrStrongerThanAcquireRelease() would be sufficient here and elsewhere in this method.

@vijaysun-omr, @zl-wang, thoughts?

@@ -2662,7 +2662,7 @@ OMR::Node::mayModifyValue(TR::SymbolReference * symRef)
TR::Symbol * symbol = symRef->getSymbol();
if (node->getOpCode().isCall() ||
node->getOpCodeValue() == TR::monexit ||
- (node->getOpCode().hasSymbolReference() && node->getSymbol()->isVolatile()) ||
+ (node->getOpCode().hasSymbolReference() && !node->getSymbol()->isTransparent()) ||
Contributor

Again, is testing !isTransparent() more conservative than necessary? I'm wondering whether isAtLeastOrStrongerThanAcquireRelease() would be sufficient here.

@@ -176,7 +176,7 @@ OMR::Node::mayUse()
TR_UseDefAliasSetInterface
OMR::Node::mayKill(bool gcSafe)
{
- if (self()->getOpCode().hasSymbolReference() && (self()->getOpCode().isLikeDef() || self()->mightHaveVolatileSymbolReference())) //we want the old behavior in these cases
+ if (self()->getOpCode().hasSymbolReference() && (self()->getOpCode().isLikeDef() || self()->mightHaveNonTransparentSymbolReference())) //we want the old behavior in these cases
Contributor

Again, I'm wondering whether OpaqueSemantics really needs to be treated conservatively here, and if this could safely check for "at least or stronger than acquire/release". Does this need to worry only about what other threads might do? If so, I don't think the fact that opaque operations need to be performed atomically would matter.

@@ -306,7 +306,7 @@ static bool isSafeToReplaceNode(TR::Node *currentNode, TR::TreeTop *curTreeTop,
* => xload/xloadi a.volatileField
* ...
*/
- //if (mayBeVolatileReference && !canMoveIfVolatile)
+ //if (mayBeNonTransparentReference && !canMoveIfVolatile)
Contributor

Comments that appear just above here talk about restrictions on "swinging down" volatile. Those comments will need to be updated.

Comment on lines +267 to +269
bool mayBeNonTransparentReference = currentNode->mightHaveNonTransparentSymbolReference();
// Do not swing down non-transparent nodes
if (mayBeNonTransparentReference)
Contributor

I don't think this needs to be conservative in the treatment of OpaqueSemantics. I think it really only needs to worry about situations where there could be interactions with other threads, so AcquireReleaseSemantics or stronger.

I suspect the same might hold true for many of the changes for other optimizations, so I think they need to be considered on a case-by-case basis. I won't go through and comment on each.

@Spencer-Comin
Contributor Author

@hzongaro re: overly conservative treatment of OpaqueSemantics

Since being overly conservative is still correct (and equivalent to what we already have with volatile/plain), do you think it would be better to get these changes in as-is and then address relaxing the restrictions on individual optimizations in future PRs?

@hzongaro
Contributor

re: overly conservative treatment of OpaqueSemantics

Since being overly conservative is still correct (and equivalent to what we already have with volatile/plain), do you think it would be better to get these changes in as-is and then address relaxing the restrictions on individual optimizations in future PRs?

Yes, that sounds reasonable. Getting initial support in for the different memory semantics will allow OMR and downstream projects to begin to take advantage of them, even if the treatment is relatively conservative today.
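
The sense in which the current tests are conservative but still correct can be sketched in C++. This is a hypothetical model of the two candidate predicates, not the PR's code: `SketchLevel`, `conservativeTest`, and `narrowerTest` are invented names for `!isTransparent()` and the "acquire/release or stronger" check discussed in the review.

```cpp
#include <cassert>

// Hypothetical ordering levels, weakest to strongest.
enum class SketchLevel { Transparent, Opaque, AcquireRelease, Volatile };

// Models the test used in the diffs above: !isTransparent().
inline bool conservativeTest(SketchLevel level)
{
    return level != SketchLevel::Transparent;
}

// Models the narrower test suggested in review.
inline bool narrowerTest(SketchLevel level)
{
    return level >= SketchLevel::AcquireRelease;
}

// Every symbol caught by the narrower test is also caught by the
// conservative one, so the conservative test never misses a case.
inline bool narrowerImpliesConservative()
{
    const SketchLevel all[] = {SketchLevel::Transparent, SketchLevel::Opaque,
                               SketchLevel::AcquireRelease, SketchLevel::Volatile};
    for (SketchLevel level : all)
        if (narrowerTest(level) && !conservativeTest(level))
            return false;
    return true;
}
```

In this model the two predicates disagree only for opaque symbols, which is exactly the set a follow-up change would need to examine optimization by optimization.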

With the expansion of possible memory ordering semantics from binary volatile
or non-volatile to volatile, acquire/release, opaque, and transparent, all
tests of whether a symbol is volatile need to be refined depending on the
intention of the test, i.e. is it testing whether the symbol is strictly
volatile, simply opaque, or somewhere in between?

Signed-off-by: Spencer Comin <[email protected]>
This change adds arrays for opaque and acquire/release unsafe symrefs to the
symbol reference table. Instead of having four separate fields, the fields are
combined into an array that can be indexed by the OMR::Symbol::AccessMode enum.

Signed-off-by: Spencer Comin <[email protected]>
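
The "array indexed by an enum" layout described in this commit message can be sketched in C++ as follows. Everything here is a hypothetical stand-in: `SketchAccessMode` is not the real OMR::Symbol::AccessMode (its enumerators are assumed), and `SketchSymRef` and `SketchSymRefTable` are invented names rather than the real symbol reference table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the access mode enum; enumerators are assumed.
enum class SketchAccessMode : std::size_t
{
    Transparent,
    Opaque,
    AcquireRelease,
    Volatile,
    Count // number of modes, used to size the array
};

struct SketchSymRef { int id; }; // stand-in for a symbol reference

class SketchSymRefTable
{
public:
    SketchSymRef *unsafeSymRef(SketchAccessMode mode) const
    {
        return _unsafeSymRefs[static_cast<std::size_t>(mode)];
    }

    void setUnsafeSymRef(SketchAccessMode mode, SketchSymRef *symRef)
    {
        _unsafeSymRefs[static_cast<std::size_t>(mode)] = symRef;
    }

private:
    // One slot per access mode instead of four separately named fields;
    // value-initialized, so every slot starts as nullptr.
    std::array<SketchSymRef *, static_cast<std::size_t>(SketchAccessMode::Count)> _unsafeSymRefs{};
};

inline void symRefTableDemo()
{
    SketchSymRefTable table;
    SketchSymRef opaqueRef{7};
    table.setUnsafeSymRef(SketchAccessMode::Opaque, &opaqueRef);
    assert(table.unsafeSymRef(SketchAccessMode::Opaque)->id == 7);
}
```

Callers that already hold an access mode can index directly, which avoids a switch over four field names at each use.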
This flag is removed in OpenJ9.

Signed-off-by: Spencer Comin <[email protected]>