[CPU] Add dumping of the memory statistics #28441
base: master
Conversation
@@ -66,6 +66,12 @@ void DebugCapsConfig::readProperties() {
    if ((envVarValue = readEnv("OV_CPU_AVERAGE_COUNTERS")))
        averageCountersPath = envVarValue;

    if ((envVarValue = readEnv("OV_CPU_MEMORY_STATISTICS_LEVEL")))
        memoryStatisticsDumpLevel = std::stoi(envVarValue);
Shouldn't we align the handling of these environment variables with the ones we already have?
I mean, OV_CPU_MEMORY_STATISTICS_PATH could accept options like:
- cout
- *.csv
- *.etc
and automatically enable level 1 when specified.
OV_CPU_MEMORY_STATISTICS_LEVEL could then be used to increase the level.
This way one could dump the statistics to a csv file without having to set two environment variables.
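For illustration, a rough sketch of the handling suggested above, written in the style of the diff at the top of this thread (memoryStatisticsDumpPath is an assumed field name, not taken from the PR):

    // Sketch only: OV_CPU_MEMORY_STATISTICS_PATH selects the destination
    // ("cout" or a *.csv file) and implicitly enables level 1;
    // OV_CPU_MEMORY_STATISTICS_LEVEL can then raise the level further.
    if ((envVarValue = readEnv("OV_CPU_MEMORY_STATISTICS_PATH"))) {
        memoryStatisticsDumpPath = envVarValue;  // e.g. "cout" or "memory_stats.csv"
        if (memoryStatisticsDumpLevel < 1)
            memoryStatisticsDumpLevel = 1;       // a path implies at least level 1
    }
    if ((envVarValue = readEnv("OV_CPU_MEMORY_STATISTICS_LEVEL"))) {
        memoryStatisticsDumpLevel = std::stoi(envVarValue);
    }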
Done
@EgorDuplensky, do you have any further comments?
private:
    std::shared_ptr<MemoryBlockWithRelease> m_pBlock;
    size_t m_max_requested_size = 0;
Is having a size the only reason we are introducing a debug caps wrapper here?
If yes, is it possible to implement a size() method for all the non-debug-caps implementations instead? This could be helpful for other troubleshooting use cases, e.g. debug logs.
Yes, that is the reason, but not the only one. In the release build the memory blocks are shared across many tensors, so it's impossible to track the size requested by each tensor; we only maintain the minimal size that can accommodate the biggest tensor. Thus the information about each tensor's size cannot be retrieved at this level.
The fact that we don't have a size() method in the dynamic memory block interface is a deliberate design choice, which allows us to avoid additional semantic constraints. The thing is that a memory block may or may not allocate memory; for example, a partitioned memory block doesn't allocate anything. But when we work through the interface, we don't even know the specific type of the memory block behind the object, so it's difficult to interpret the result correctly. In the partitioned memory block example, returning the partition size may give the impression that a memory block of this size is allocated, while it isn't, it's just a view. Also, receiving the max size of the "memory block with reuse" may be unexpected when we call this method on an abstract object. Therefore:
- Introducing this method to the memory block interface won't allow us to avoid the wrappers, since, being shared, such memory blocks don't store information about each resize request.
- Without knowledge of the dynamic type of the object, it's not clear how to interpret the result of an abstract size() request. For the types that can unambiguously define the meaning of the size() result, such a method is already introduced (e.g. MemoryBlockWithReuse).
- Beyond this low-level memory management subsystem, the memory size may be requested from the memory descriptor of the corresponding memory object (for logging and troubleshooting purposes).
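To make the wrapper idea concrete, a minimal sketch (the class name and the resize() signature are assumptions for illustration; m_pBlock and m_max_requested_size match the members in the excerpt above):

    // Hypothetical debug-caps wrapper: forwards resize() to the shared block
    // while remembering the largest size this particular tensor has requested.
    class MemoryBlockSizeTracker {
    public:
        explicit MemoryBlockSizeTracker(std::shared_ptr<MemoryBlockWithRelease> block)
            : m_pBlock(std::move(block)) {}

        bool resize(size_t size) {
            if (size > m_max_requested_size)
                m_max_requested_size = size;  // per-tensor peak request
            return m_pBlock->resize(size);    // the shared block only grows to the overall max
        }

        size_t maxRequestedSize() const { return m_max_requested_size; }

    private:
        std::shared_ptr<MemoryBlockWithRelease> m_pBlock;
        size_t m_max_requested_size = 0;
    };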
}

private:
    std::vector<std::shared_ptr<MemoryBlockWithRelease>> m_unique_blocks;
It should be possible to count unique_blocks even without this data structure, shouldn't it?
I mean, it will be slower for sure, but do we care how fast we collect those statistics?
This is also used to calculate the actually allocated memory size. The wrappers store the memory size requested by each tensor, but not the memory that is really allocated.
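As a rough illustration of the two quantities being discussed (per-tensor requested size vs. actually allocated size), assuming a collection of the per-tensor wrappers sketched earlier (m_wrappers is a hypothetical name) and that the underlying block type exposes a size() similar to MemoryBlockWithReuse:

    // Hypothetical aggregation inside the statistics dump:
    size_t total_requested = 0;
    for (const auto& wrapper : m_wrappers)      // one wrapper per tensor
        total_requested += wrapper->maxRequestedSize();

    size_t total_allocated = 0;
    for (const auto& block : m_unique_blocks)   // each shared block counted once
        total_allocated += block->size();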
@@ -371,10 +618,20 @@ void MemoryControl::releaseMemory() {
    m_allocated = false;
}

edgeClusters MemoryControl::findEdgeClusters(const std::vector<EdgePtr>& graphEdges) {
#ifdef CPU_DEBUG_CAPS
MemoryStatistics MemoryControl::dumpStatistics() const {
What about using free friend functions instead? They would be able to access the private fields, so we could move all the debug-caps-related logic into a separate file and avoid cluttering the production logic.
Yes, it can make the code cleaner, but my idea was that this statistics calculation is strictly bound to the specific memory manager type (in terms of data members and the underlying algorithm), so once the main memory management implementation is changed, this memory statistics collection subroutine will most likely need to be changed too.
Moreover, I didn't even want to wrap them into the CPU_DEBUG_CAPS macro, but since we want to keep the main version as lightweight and fast as possible, some of these implementations become ill-formed, as they access debug-only versions of the data members.
Thus, if you are still sure it's better to move them into a separate file, even though it will be more difficult to keep them up to date, I'll do it.
What do you think?
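For reference, a minimal sketch of the friend-function alternative being discussed (the file layout and the free-function signature are assumptions):

    // memory_control.h
    class MemoryControl {
        // ... production logic only ...
    #ifdef CPU_DEBUG_CAPS
        // the friend can read the private containers without public getters
        friend MemoryStatistics dumpStatistics(const MemoryControl& control);
    #endif
    };

    // memory_statistics.cpp (built under debug caps only)
    #ifdef CPU_DEBUG_CAPS
    MemoryStatistics dumpStatistics(const MemoryControl& control) {
        MemoryStatistics stats;
        // walk control's private members here and fill 'stats'
        return stats;
    }
    #endif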
Details:
Add yet another debug capability: dumping the following memory statistics:
Standard output and *.csv file dump are supported.
ToDo:
Tickets: