-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CPU] Enable memory reuse for nested graphs #27521
base: master
Are you sure you want to change the base?
[CPU] Enable memory reuse for nested graphs #27521
Conversation
da1d1a0
to
18563de
Compare
4734e61
to
e140018
Compare
@maxnick Ready for review. Could you please take a look? |
e140018
to
d5bda07
Compare
Added a fix for Convolution + Sum fallback graph. |
virtual bool canBeSkipped() const { | ||
return getSelectedPrimitiveDescriptor()->hasZeroInputDims(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is rather vague. Where can it be skipped from? Apparently we need to change name to make it more clear.
What about isNop
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this naming discussion to the end of the queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is not the right time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxnick Lets revive the discussion, since all the other discussions are finished.
After taking a fresh look into this, I think the main idea is that here we decide that the node should not be added to the list of the executable ones. We can express it in a verbose way, like:
neverExecute()
shouldNeverBeExecuted
includeIntoExecutables
I am ok with isNop
as well, but it does not really express that the node must never be executed. NoOp in general can be executed, it just does nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what you've proposed I like neverExecute()
.
src/plugins/intel_cpu/tests/functional/cmake/target_per_test.cmake
Outdated
Show resolved
Hide resolved
408e748
to
11a5115
Compare
7f3fadf
to
9e170ae
Compare
This PR will be closed in a week because of 2 weeks of no activity. |
This PR will be closed in a week because of 2 weeks of no activity. |
9e2648a
to
efb84d1
Compare
af3a926
to
705eabf
Compare
if (memoryControl->allocated()) { | ||
return; // memory is already allocated globally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my understanding, Activate()
for the subgraph must be called in creaetPrimitive
or prepareParams
methods. Also the Allocate()
method is called only inside Activate
before CreatePrimitivesAndExecConstants()
. Thus the code below is executed only for the top most graph as it calls memoryControl->allocateMemory()
and this if branch is always taken for subgraphs.
Moreover, do I correctly understand that if a node with a subgraph by some reason calls Activate
for a subgraph before it's called for the main graph (any stage of Configure). The whole memory management subsystem will be broken, as the rest of this method code will not be invoked.
If so, it's just another indicator that we need to consider as special subgraph type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is the way the legacy graph pipeline stages are implemented.
With a proper pipeline such situation will not be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any plans of implementing the proper pipeline? How it should look like?
virtual bool canBeSkipped() const { | ||
return getSelectedPrimitiveDescriptor()->hasZeroInputDims(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now?
ed7ca84
to
467ef3b
Compare
m_auxiliaryNetworkMemoryControl(std::make_shared<NetworkMemoryControl>()), | ||
m_memoryControl(m_auxiliaryNetworkMemoryControl->createMemoryControlUnit()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitry-gorokhov I decided to not create a wrapper around m_auxiliaryNetworkMemoryControl and m_memoryControl.
Instead, it is redesigned, so m_memoryControl is just one memory control instance of m_auxiliaryNetworkMemoryControl
virtual bool canBeSkipped() const { | ||
return getSelectedPrimitiveDescriptor()->hasZeroInputDims(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maxnick Lets revive the discussion, since all the other discussions are finished.
After taking a fresh look into this, I think the main idea is that here we decide that the node should not be added to the list of the executable ones. We can express it in a verbose way, like:
neverExecute()
shouldNeverBeExecuted
includeIntoExecutables
I am ok with isNop
as well, but it does not really express that the node must never be executed. NoOp in general can be executed, it just does nothing.
if (memoryControl->allocated()) { | ||
return; // memory is already allocated globally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is the way the legacy graph pipeline stages are implemented.
With a proper pipeline such situation will not be possible.
31120e1
to
9ac1226
Compare
Sync node indexes must be registered to global allocation context in order.
9ac1226
to
2497eb3
Compare
class NetworkMemoryControl; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this forward declaration is also not needed anymore.
if (memoryControl->allocated()) { | ||
return; // memory is already allocated globally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any plans of implementing the proper pipeline? How it should look like?
#include "edge.h" | ||
#include "graph_context.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already included in graph.h
.
* Partition the \clusters of Edges, by moving to the end and allocating at the same time | ||
* the clusters which cannot be handled as part of generic memory solver algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Partition the \clusters of Edges, by moving to the end and allocating at the same time | |
* the clusters which cannot be handled as part of generic memory solver algorithm. | |
* Partition the \clusters of Edges, by moving to the end and allocating at the same time | |
* the clusters that cannot be handled as part of the generic memory solver algorithm. |
[](const EdgePtr& edge) { | ||
return edge->getOriginalDesc().getPrecision() == element::string; | ||
}), | ||
"All edges in the cluster must be string."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"All edges in the cluster must be string."); | |
"All edges in a string cluster must be strings."); |
status = hasDynNodes ? (parallel_get_max_threads() > 1 ? Status::ReadyDynamic : Status::ReadyDynamicSeq) | ||
: Status::ReadyStatic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This operation is useless as the following if/else block completely redefines the status
once again.
@@ -7,6 +7,7 @@ | |||
#include "async_infer_request.h" | |||
#include "dnnl_extension_utils.h" | |||
#include "itt.h" | |||
#include "memory_control.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this include is not needed anymore.
const int elseOffset = m_elseGraph.RegisterToAllocationContext(thenOffset, context); | ||
return m_elseGraph.RegisterToAllocationContext(elseOffset, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason of registering the m_elseGraph
in the allocation context two times?
const std::shared_ptr<const ov::Model>& elseBody = ifOp->get_else_body(); | ||
subGraphThen.CreateGraph(thenBody, context); | ||
subGraphElse.CreateGraph(elseBody, context); | ||
auto ifOp = ov::as_type_ptr<ov::op::v8::If>(m_op); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a null check here?
@@ -499,10 +499,12 @@ void Input::selectOptimalPrimitiveDescriptor() { | |||
// ignore previous configuration | |||
supportedPrimitiveDescriptors.clear(); | |||
|
|||
int inPlacePort = m_isInPlace ? 0 : -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int inPlacePort = m_isInPlace ? 0 : -1; | |
const int inPlacePort = m_isInPlace ? 0 : -1; |
subgraphMemoryPtrs.push_back(mem); | ||
inputMemory.emplace_back(std::move(mem)); | ||
CPU_NODE_ASSERT(getParentEdges().size() == subGraph->inputsNumber(), | ||
"Number of node inputs must be equal the number of inner graph's inputs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Number of node inputs must be equal the number of inner graph's inputs"); | |
"The number of node inputs must be equal to the number of inner graph's inputs"); |
auto subgraphInputNode = subGraph->getInputNodeByIndex(i); | ||
auto subgraphInputMemory = subgraphInputNode->getDstMemoryAtPort(0); | ||
subgraphMemoryPtrs.push_back(subgraphInputMemory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please elaborate on how we have valid memory objects at this point, while the memory objects are created during the Activate
step?
CPU_NODE_ASSERT(getParentEdges().size() == subGraph->inputsNumber(), | ||
"Number of node inputs must be equal the number of inner graph's inputs"); | ||
|
||
for (size_t i = 0; i < subGraph->inputsNumber(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that when the subgraphMemoryPtrs
are being used, the traversal goes using getOriginalInputsNumber()
. Probably it does make sense to align these two loops.
auto subgraphInputNode = m_graph.getInputNodeByIndex(i); | ||
auto subgraphInputMemory = subgraphInputNode->getDstMemoryAtPort(0); | ||
subgraphMemoryPtrs.emplace_back(subgraphInputMemory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the memory objects valid before we call Activate()
?
@@ -173,32 +183,33 @@ class MergeTransposeReorderCPUTest : public testing::WithParamInterface<MergeTra | |||
|
|||
std::shared_ptr<GraphContext> m_context; | |||
std::unique_ptr<Graph> m_graph; | |||
std::shared_ptr<NetworkMemoryControl> networkMemoryControl = std::make_shared<NetworkMemoryControl>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the variable is not used. Could you please revise all the other similar places?
|
||
Config conf; | ||
conf.rtCacheCapacity = 0; | ||
auto context = std::make_shared<GraphContext>(conf, nullptr, false); | ||
std::shared_ptr<NetworkMemoryControl> networkMemoryControl = std::make_shared<NetworkMemoryControl>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this variable is not used.
Details:
Tickets: