
[CPU] Enable memory reuse for nested graphs #27521

Open
wants to merge 4 commits into master from enable_memory_reuse_for_nested_graphs

Conversation

EgorDuplensky
Contributor

@EgorDuplensky EgorDuplensky commented Nov 12, 2024

Details:

  • All nested graphs must now be part of the global memory reuse logic
  • The core memory reuse logic is untouched (only slightly refactored)
  • Instead of solving memory reuse separately for every graph / subgraph, all the edges and global execution indices are now collected from the "virtually flattened" graph first, and then memory reuse is solved once for the whole model.
  • All the nodes with nested graphs are updated, including:
    1. LoRa
    2. Composite
    3. If
    4. TensorIterator
    5. Convolution + Sum fallback subgraph

Tickets:

  • ticket-id

@EgorDuplensky EgorDuplensky requested review from a team as code owners November 12, 2024 10:53
@github-actions github-actions bot added category: inference OpenVINO Runtime library - Inference category: CPU OpenVINO CPU plugin category: build OpenVINO cmake script / infra labels Nov 12, 2024
@EgorDuplensky EgorDuplensky changed the title Enable memory reuse for nested graps Enable memory reuse for nested graphs Nov 12, 2024
@EgorDuplensky EgorDuplensky changed the title Enable memory reuse for nested graphs [CPU] Enable memory reuse for nested graphs Nov 12, 2024
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch from da1d1a0 to 18563de Compare November 12, 2024 15:27
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 2 times, most recently from 4734e61 to e140018 Compare November 13, 2024 13:32
@github-actions github-actions bot removed the category: inference OpenVINO Runtime library - Inference label Nov 13, 2024
@EgorDuplensky
Contributor Author

@maxnick Ready for review. Could you please take a look?

@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch from e140018 to d5bda07 Compare November 13, 2024 13:51
@EgorDuplensky
Contributor Author

Added a fix for the Convolution + Sum fallback graph.
Now such graphs are also part of memory reuse.

src/plugins/intel_cpu/src/compiled_model.cpp (outdated; resolved)
src/plugins/intel_cpu/src/edge.h (outdated; resolved)
src/plugins/intel_cpu/src/edge.h (outdated; resolved)
src/plugins/intel_cpu/src/graph.h (outdated; resolved)
src/plugins/intel_cpu/src/graph.h (outdated; resolved)
Comment on lines +300 to +332
virtual bool canBeSkipped() const {
    return getSelectedPrimitiveDescriptor()->hasZeroInputDims();
}
Contributor

The name is rather vague. Where can it be skipped from? Apparently we need to change the name to make it clearer.
What about isNop?

Contributor Author

Let's move this naming discussion to the end of the queue.

Contributor

Is not the right time?

Contributor Author

@maxnick Let's revive the discussion, since all the other discussions are finished.
After taking a fresh look at this, I think the main idea is that here we decide that the node should not be added to the list of executable ones. We can express it in a verbose way, like:

  • neverExecute()
  • shouldNeverBeExecuted
  • includeIntoExecutables

I am ok with isNop as well, but it does not really express that the node must never be executed. A NoOp in general can be executed; it just does nothing.

Contributor

Now?

Contributor Author

?

Contributor

From what you've proposed I like neverExecute().

src/plugins/intel_cpu/src/nodes/composite.cpp (outdated; resolved)
src/plugins/intel_cpu/src/nodes/conv.cpp (resolved)
src/plugins/intel_cpu/src/nodes/lora.cpp (outdated; resolved)
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 2 times, most recently from 408e748 to 11a5115 Compare November 25, 2024 15:28
src/plugins/intel_cpu/src/graph.cpp (outdated; resolved)
src/plugins/intel_cpu/src/graph.cpp (outdated; resolved)
src/plugins/intel_cpu/src/graph.cpp (outdated; resolved)
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 2 times, most recently from 7f3fadf to 9e170ae Compare December 10, 2024 17:56
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Dec 26, 2024
@mg-intel mg-intel removed the Stale label Jan 2, 2025
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Jan 17, 2025
@mg-intel mg-intel removed the Stale label Jan 17, 2025
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 9 times, most recently from 9e2648a to efb84d1 Compare January 21, 2025 19:24
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch from af3a926 to 705eabf Compare January 25, 2025 22:43
@maxnick maxnick self-requested a review January 27, 2025 12:30
src/plugins/intel_cpu/src/graph.h (outdated; resolved)
src/plugins/intel_cpu/src/graph.h (outdated; resolved)
src/plugins/intel_cpu/src/graph.h (outdated; resolved)
Comment on lines +1151 to +1170
if (memoryControl->allocated()) {
    return;  // memory is already allocated globally
Contributor

To my understanding, Activate() for the subgraph must be called in the createPrimitive or prepareParams methods. Also, the Allocate() method is called only inside Activate, before CreatePrimitivesAndExecConstants(). Thus the code below is executed only for the topmost graph, as it calls memoryControl->allocateMemory(), and this if branch is always taken for subgraphs.
Moreover, do I understand correctly that if a node with a subgraph for some reason calls Activate for its subgraph before it is called for the main graph (at any stage of Configure), the whole memory management subsystem will be broken, as the rest of this method's code will not be invoked?
If so, it is just another indicator that we need to consider a special subgraph type.

Contributor Author

The problem here is the way the legacy graph pipeline stages are implemented.
With a proper pipeline, such a situation would not be possible.

Contributor

Are there any plans to implement the proper pipeline? What should it look like?

src/plugins/intel_cpu/src/graph.cpp (outdated; resolved)
src/plugins/intel_cpu/src/graph_context.h (resolved)
src/plugins/intel_cpu/src/compiled_model.h (outdated; resolved)
src/plugins/intel_cpu/src/compiled_model.h (outdated; resolved)
src/plugins/intel_cpu/src/nodes/memory.cpp (outdated; resolved)
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 4 times, most recently from ed7ca84 to 467ef3b Compare January 28, 2025 17:32
Comment on lines +28 to +29
m_auxiliaryNetworkMemoryControl(std::make_shared<NetworkMemoryControl>()),
m_memoryControl(m_auxiliaryNetworkMemoryControl->createMemoryControlUnit()) {
Contributor Author

@dmitry-gorokhov I decided not to create a wrapper around m_auxiliaryNetworkMemoryControl and m_memoryControl.
Instead, it is redesigned so that m_memoryControl is just one memory control instance of m_auxiliaryNetworkMemoryControl.

src/plugins/intel_cpu/src/nodes/lora.cpp (outdated; resolved)
src/plugins/intel_cpu/src/edge.h (outdated; resolved)
src/plugins/intel_cpu/src/infer_request.cpp (outdated; resolved)
src/plugins/intel_cpu/src/graph.cpp (outdated; resolved)
src/plugins/intel_cpu/src/compiled_model.h (outdated; resolved)
src/plugins/intel_cpu/src/compiled_model.h (outdated; resolved)
src/plugins/intel_cpu/src/nodes/memory.cpp (outdated; resolved)
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch 2 times, most recently from 31120e1 to 9ac1226 Compare January 29, 2025 18:47
Sync node indexes must be registered to global allocation context
in order.
@EgorDuplensky EgorDuplensky force-pushed the enable_memory_reuse_for_nested_graphs branch from 9ac1226 to 2497eb3 Compare January 30, 2025 12:19
Comment on lines +22 to +23
class NetworkMemoryControl;

Contributor

Looks like this forward declaration is also not needed anymore.

#include "edge.h"
#include "graph_context.h"
Contributor

It's already included in graph.h.

Comment on lines +744 to +745
* Partition the \clusters of Edges, by moving to the end and allocating at the same time
* the clusters which cannot be handled as part of generic memory solver algorithm.
Contributor

Suggested change
* Partition the \clusters of Edges, by moving to the end and allocating at the same time
* the clusters which cannot be handled as part of generic memory solver algorithm.
* Partition the \clusters of Edges, by moving to the end and allocating at the same time
* the clusters that cannot be handled as part of the generic memory solver algorithm.

[](const EdgePtr& edge) {
return edge->getOriginalDesc().getPrecision() == element::string;
}),
"All edges in the cluster must be string.");
Contributor

Suggested change
"All edges in the cluster must be string.");
"All edges in a string cluster must be strings.");

Comment on lines +878 to +879
status = hasDynNodes ? (parallel_get_max_threads() > 1 ? Status::ReadyDynamic : Status::ReadyDynamicSeq)
: Status::ReadyStatic;
Contributor

This assignment is useless, as the following if/else block completely redefines the status once again.

src/plugins/intel_cpu/src/graph_context.h (resolved)
@@ -7,6 +7,7 @@
#include "async_infer_request.h"
#include "dnnl_extension_utils.h"
#include "itt.h"
#include "memory_control.hpp"
Contributor

Probably this include is not needed anymore.

Comment on lines +116 to +117
const int elseOffset = m_elseGraph.RegisterToAllocationContext(thenOffset, context);
return m_elseGraph.RegisterToAllocationContext(elseOffset, context);
Contributor

What is the reason for registering m_elseGraph in the allocation context twice?

const std::shared_ptr<const ov::Model>& elseBody = ifOp->get_else_body();
subGraphThen.CreateGraph(thenBody, context);
subGraphElse.CreateGraph(elseBody, context);
auto ifOp = ov::as_type_ptr<ov::op::v8::If>(m_op);
Contributor

Should we add a null check here?

@@ -499,10 +499,12 @@ void Input::selectOptimalPrimitiveDescriptor() {
// ignore previous configuration
supportedPrimitiveDescriptors.clear();

int inPlacePort = m_isInPlace ? 0 : -1;
Contributor

Suggested change
int inPlacePort = m_isInPlace ? 0 : -1;
const int inPlacePort = m_isInPlace ? 0 : -1;

subgraphMemoryPtrs.push_back(mem);
inputMemory.emplace_back(std::move(mem));
CPU_NODE_ASSERT(getParentEdges().size() == subGraph->inputsNumber(),
"Number of node inputs must be equal the number of inner graph's inputs");
Contributor

Suggested change
"Number of node inputs must be equal the number of inner graph's inputs");
"The number of node inputs must be equal to the number of inner graph's inputs");

Comment on lines +707 to +709
auto subgraphInputNode = subGraph->getInputNodeByIndex(i);
auto subgraphInputMemory = subgraphInputNode->getDstMemoryAtPort(0);
subgraphMemoryPtrs.push_back(subgraphInputMemory);
Contributor

Could you please elaborate on how we have valid memory objects at this point, while the memory objects are created during the Activate step?

CPU_NODE_ASSERT(getParentEdges().size() == subGraph->inputsNumber(),
"Number of node inputs must be equal the number of inner graph's inputs");

for (size_t i = 0; i < subGraph->inputsNumber(); i++) {
Contributor

Please note that where subgraphMemoryPtrs are used, the traversal uses getOriginalInputsNumber(). It probably makes sense to align these two loops.

Comment on lines +124 to +126
auto subgraphInputNode = m_graph.getInputNodeByIndex(i);
auto subgraphInputMemory = subgraphInputNode->getDstMemoryAtPort(0);
subgraphMemoryPtrs.emplace_back(subgraphInputMemory);
Contributor

Are the memory objects valid before we call Activate()?

@@ -173,32 +183,33 @@ class MergeTransposeReorderCPUTest : public testing::WithParamInterface<MergeTra

std::shared_ptr<GraphContext> m_context;
std::unique_ptr<Graph> m_graph;
std::shared_ptr<NetworkMemoryControl> networkMemoryControl = std::make_shared<NetworkMemoryControl>();
Contributor

It seems that the variable is not used. Could you please revise all the other similar places?


Config conf;
conf.rtCacheCapacity = 0;
auto context = std::make_shared<GraphContext>(conf, nullptr, false);
std::shared_ptr<NetworkMemoryControl> networkMemoryControl = std::make_shared<NetworkMemoryControl>();
Contributor

It looks like this variable is not used.

Labels
category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin
3 participants