Profile mem #678

Merged: kba merged 8 commits into master from profile-mem on Nov 23, 2022
Conversation

@kba (Member) commented Mar 4, 2021:

When running processors with ocrd process, we already measure the wall and CPU time of a processor run. This adds basic tracking of memory usage with https://github.com/pythonprofilers/memory_profiler.

Currently, this just outputs a sparkline of memory usage and max/min memory, e.g.

16:35:54.593 INFO ocrd.process.profile - memory consumption: ▁▁▂▅▅▃▄▄▄▄▄▄▄▄▄▄▅▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆██▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▄▄
max: 300.79 MiB min: 133.09 MiB
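
For context, this is roughly how memory_profiler's memory_usage samples a callable. A minimal sketch; the workload function is made up for illustration:

    from memory_profiler import memory_usage

    def work():
        # stand-in workload: allocate ~50 MiB in steps
        buffers = [bytearray(1024 * 1024) for _ in range(50)]
        return len(buffers)

    # sample memory every 100 ms while work() runs; with timestamps=True,
    # each sample is a (MiB, timestamp) tuple
    samples = memory_usage(proc=work, interval=.1, timeout=None, timestamps=True)
    mem = [m for m, _ in samples]
    print("max: %.2f MiB min: %.2f MiB" % (max(mem), min(mem)))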

@bertsky (Collaborator) left a comment:

Excellent!

Not sure how to add that, but memory_profiler is based on psutil, which would also give us .cpu_times().iowait or .disk_io_counters().read_time / write_time / busy_time etc...
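
For reference, a minimal sketch of those psutil calls; both return named tuples accessed by attribute, and iowait and busy_time are platform-dependent:

    import psutil

    p = psutil.Process()  # the current process
    cpu = p.cpu_times()
    # iowait (time blocked waiting for I/O) is only present on Linux
    print(cpu.user, cpu.system, getattr(cpu, 'iowait', 'n/a'))

    # system-wide disk I/O counters; times are in milliseconds,
    # busy_time is only available on Linux/FreeBSD
    disk = psutil.disk_io_counters()
    print(disk.read_time, disk.write_time, getattr(disk, 'busy_time', 'n/a'))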

@@ -69,7 +71,13 @@ def run_processor(
     log.debug("Processor instance %s (%s doing %s)", processor, name, otherrole)
     t0_wall = perf_counter()
     t0_cpu = process_time()
-    processor.process()
+    mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True)
Collaborator commented:

Maybe you wanna add multiprocess=True in there for processors that use multiprocessing or similar, or that delegate to shell calls. Perhaps also include_children=True (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).

@kba (Member, Author) replied:

> Maybe you wanna add multiprocess=True in there for processors that use multiprocessing or similar, or that delegate to shell calls.

I will.

> Perhaps also include_children=True (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).

    if include_children:
        mem += sum(_get_child_memory(process, meminfo_attr))

https://github.com/pythonprofilers/memory_profiler/blob/master/memory_profiler.py#L135-L136

@bertsky (Collaborator) commented Mar 4, 2021:
> Perhaps also include_children=True (not sure about the meaning though – does the sum then count shared/CoW pages repeatedly?).
>
>     if include_children:
>         mem += sum(_get_child_memory(process, meminfo_attr))

I saw that, but the question remains: what is in that sum? (We probably don't want to count shared and CoW pages repeatedly. But it may be hard to calculate exactly. IIRC standard ps and top and even time do count repetitions, so they appear too large – don't "add up" – for multiprocessed programs, whereas smem gets it right. This tells me we are actually interested in the PSS, the proportional set size, rather than the RSS.)
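
For illustration, psutil exposes both metrics directly; PSS is Linux-only, since it is read from /proc/<pid>/smaps. A minimal sketch:

    import psutil

    p = psutil.Process()
    rss = p.memory_info().rss        # shared pages counted in full per process
    pss = p.memory_full_info().pss   # shared pages divided among their sharers
    print("RSS: %.1f MiB  PSS: %.1f MiB" % (rss / 2**20, pss / 2**20))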

Collaborator replied:

Ok, but PSS vs RSS is a more general problem/discussion. Any multiprocessing coder will know about that. And RSS is still valuable information. So let's stick to your current implementation.

@bertsky (Collaborator) commented Mar 4, 2021:

This will conflict with #652 (where that part of run_processor is delegated to the new run_api) – not sure what's the least painful way for merging.

Oh, since I saw this also uses multiprocessing, have you measured any CPU overhead for the memory profiling yet?

@kba (Member, Author) commented Mar 4, 2021:

> This will conflict with #652 (where that part of run_processor is delegated to the new run_api) – not sure what's the least painful way for merging.

Is that just a matter of resolving merge conflicts, or do you mean the APIs are now incompatible?

> Oh, since I saw this also uses multiprocessing, have you measured any CPU overhead for the memory profiling yet?

I haven't yet. I did notice that when I profile the current process (proc=-1) instead of just the processor.process() call, it either offers only a single frame of measurement (max_iter=1) or hangs until the timeout is reached (timeout=100).

@bertsky (Collaborator) commented Mar 4, 2021:

> > This will conflict with #652 (where that part of run_processor is delegated to the new run_api) – not sure what's the least painful way for merging.
>
> Is that just a matter of resolving merge conflicts, or do you mean the APIs are now incompatible?

It looks like this is only cosmetic (since the workflow server does not handle the retval either, it just adds exception handling, and does not use multiprocessing itself, at least in the current implementation). But I would have to check/test.

@@ -69,7 +71,12 @@ def run_processor(
     log.debug("Processor instance %s (%s doing %s)", processor, name, otherrole)
     t0_wall = perf_counter()
     t0_cpu = process_time()
-    processor.process()
+    mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True, multiprocess=True, include_children=True)
Collaborator commented:

Suggested change:

    -mem_usage = memory_usage(proc=processor.process, interval=.1, timeout=None, timestamps=True, multiprocess=True, include_children=True)
    +mem_usage = memory_usage(proc=processor.process,
    +                         interval=.1, timeout=None, timestamps=True,
    +                         # include sub-processes
    +                         multiprocess=True, include_children=True,
    +                         # get proportional set size instead of RSS
    +                         backend='psutil_pss')

Collaborator commented:

If it is really that simple – let's make this configurable (PSS vs RSS measuring). Did you try this?

@kba (Member, Author) replied:

Sure, but first we need a mechanism to toggle this behavior. A simple idea would be to add the CLI flags --mem-profile-rss and --mem-profile-pss.

@bertsky prefers a way to have this configurable via the logging configuration - @M3ssman do you have a good idea how that might work? That way, we could keep the CLI arguments small and the --help output less confusing.

Collaborator commented:

Perhaps for the moment it could be as simple as

mem_prof = os.getenv('OCRD_PROFILE', '')
if mem_prof:
    ...
        backend='psutil_pss' if mem_prof == 'PSS' else 'psutil')
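
Fleshed out, that sketch might look as follows. This assumes the surrounding run_processor scope, where processor is defined; it is a sketch, not the final implementation:

    import os
    from memory_profiler import memory_usage

    mem_prof = os.getenv('OCRD_PROFILE', '')
    if mem_prof:
        # profile memory only when OCRD_PROFILE is set
        mem_usage = memory_usage(proc=processor.process,
                                 interval=.1, timeout=None, timestamps=True,
                                 multiprocess=True, include_children=True,
                                 backend='psutil_pss' if mem_prof == 'PSS' else 'psutil')
    else:
        processor.process()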

@kba (Member, Author) replied:

I like environment variables in general, but when the choice is between environment variables and CLI flags, I prefer the latter, because at least it's consistent. Otherwise you could end up with mixed invocations like OCRD_PROFILE=PSS ocrd-dummy --profile-file my.prof ....

Collaborator replied:

@kba I agree; however, I did not argue for env var plus CLI arg, but only for the former. Perhaps we could also move --profile and --profile-file into env var control entirely.

Or we go the other direction and put everything into a single CLI arg with a multi-faceted value, for example --profile mem=rss,cpu,file=foo.log (see the sketch below).
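
To illustrate, a hypothetical parser for such a multi-faceted value; the option names are made up:

    def parse_profile(value):
        # 'mem=rss,cpu,file=foo.log' -> {'mem': 'rss', 'cpu': True, 'file': 'foo.log'}
        opts = {}
        for part in value.split(','):
            key, _, val = part.partition('=')
            opts[key] = val if val else True
        return opts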

For the broader discussion of how to configure processors without adding too many options/parameters, and what for, see this old thread.

@kba (Member, Author) commented Nov 23, 2022:

OK, now that the wonky multiple-process-call behavior is fixed, I'll implement this via the environment variables OCRD_PROFILE and OCRD_PROFILE_FILE (sketched below):

  • OCRD_PROFILE not set or empty: no profiling
  • OCRD_PROFILE_FILE set to a value (== current --profile-file flag):
    • if OCRD_PROFILE is defined but empty: no profiling
    • if OCRD_PROFILE is not defined: set OCRD_PROFILE to CPU
    • if OCRD_PROFILE is defined but does not contain CPU: ~~raise error~~ too strict, OCRD_PROFILE_FILE should imply CPU profiling
  • OCRD_PROFILE contains CPU: do CPU profiling (== current --profile flag)
  • OCRD_PROFILE contains RSS: do RSS memory profiling
  • OCRD_PROFILE contains PSS: do proportional (PSS) memory profiling
  • OCRD_PROFILE contains both PSS and RSS: ~~raise error~~ too strict, PSS takes precedence

Did I catch everything?
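
A minimal sketch of one reading of these semantics; the helper name is hypothetical, not the actual implementation:

    import os

    def resolve_profiling():
        profile = os.environ.get('OCRD_PROFILE')
        profile_file = os.environ.get('OCRD_PROFILE_FILE')
        if profile is None:
            # OCRD_PROFILE_FILE alone implies CPU profiling;
            # an explicitly empty OCRD_PROFILE still means no profiling
            profile = 'CPU' if profile_file else ''
        do_cpu = 'CPU' in profile
        # PSS takes precedence over RSS if both are requested
        mem_backend = ('psutil_pss' if 'PSS' in profile
                       else 'psutil' if 'RSS' in profile
                       else None)
        return do_cpu, mem_backend, profile_file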

@kba (Member, Author) replied:

Implemented in 4fdfbf3

Contributor commented:

Environment variables look good to me.

In the README, I would also suggest referencing an external link where the difference between RSS and PSS is explained, or at least describing it briefly. May not be the best suggestion, but consider it.

Collaborator replied:

yes, sounds reasonable and complete AFAICS – thx!

@bertsky (Collaborator) commented Sep 1, 2021:

I wonder if it would be much effort to incorporate GPUtil for (avg/peak) GPU memory usage statistics. (Or perhaps one could add it to memory_profiler as an alternative backend?)

@bertsky (Collaborator) commented Oct 12, 2022:

> I wonder if it would be much effort to incorporate GPUtil for (avg/peak) GPU memory usage statistics. (Or perhaps one could add it to memory_profiler as an alternative backend?)

I'm afraid it's not that simple at all. GPUtil only gives snapshots, no aggregation/statistics there. And it looks like one would need to use the respective ML framework's facilities for process/session measurements anyway. (See here and there for Pytorch. TF now also has things like get_memory_info / get_memory_usage / reset_memory_stats.)

Let's treat that off-topic for now!
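
For the record, the Pytorch facilities mentioned above boil down to something like this; requires a CUDA device, and the workload is made up for illustration:

    import torch

    torch.cuda.reset_peak_memory_stats()
    x = torch.zeros(1024, 1024, device='cuda')   # stand-in GPU workload
    peak = torch.cuda.max_memory_allocated()     # bytes allocated by this process
    print("peak GPU memory: %.1f MiB" % (peak / 2**20))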

@bertsky (Collaborator) commented Oct 12, 2022:

@kba we could really use that prior to #875. Would you consider fast-tracking this?

@kba self-assigned this Oct 21, 2022
@bertsky mentioned this pull request Nov 22, 2022
@kba (Member, Author) commented Nov 22, 2022:

> @kba we could really use that prior to #875. Would you consider fast-tracking this?

Sure. It will temporarily be inconsistent (CLI flags vs. environment variables for configuring behavior), and we should definitely continue the discussion on configuration options.

ocrd/ocrd/processor/helpers.py (review thread outdated, resolved)
@kba requested a review from bertsky November 23, 2022 11:47
@kba requested a review from MehmedGIT November 23, 2022 11:55
* `RSS`: Enable RSS memory profiling
* `PSS`: Enable proportional (PSS) memory profiling
* `OCRD_PROFILE_FILE`: If set, the CPU profile is written to this file for later perusal with analysis tools like [snakeviz](https://jiffyclub.github.io/snakeviz/)
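
For illustration, a typical invocation with these variables might be `OCRD_PROFILE=CPU OCRD_PROFILE_FILE=ocrd.prof ocrd process ...` followed by `snakeviz ocrd.prof` to browse the recorded CPU profile (the file name is hypothetical).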

Collaborator commented:

Suggested change:

    Logging is configured via configuration files, see [module logging](./ocrd_utils#ocr-d-module-logging).

@kba merged commit 1b5a362 into master Nov 23, 2022
@kba deleted the profile-mem branch November 17, 2023 13:11