
Large lingering mmaps that apparently could be paged out seem to instead cause large memory crashes #318

Open
ell1e opened this issue Jun 6, 2024 · 14 comments

@ell1e

ell1e commented Jun 6, 2024

I'm using a chat client called nheko, which uses a lot of memory during the initial login but usually much less during normal use afterwards. Yet many minutes after one of those logins, I got this out-of-memory "crash":

Jun  6 14:32:30 pinephone daemon.info earlyoom: mem avail:   203 of  2994 MiB ( 6.78%), swap free:  255 of 1023 MiB (24.95%)
Jun  6 14:32:30 pinephone daemon.info earlyoom: low memory! at or below SIGTERM limits: mem 10.00%, swap 25.00%
Jun  6 14:32:30 pinephone daemon.info earlyoom: sending SIGTERM to process 7440 uid 10000 "io.github.Nheko": badness 1036, VmRSS 1839 MiB
Jun  6 14:32:30 pinephone daemon.info earlyoom: process exited after 0.0 seconds

I asked the developer of Nheko what happened and whether this is a Nheko memory leak, especially since after restarting it used less memory again, only around 200 MB, which is about 10% of what it used when it was killed.

The nheko developer suggested the issue was a combination of:

  1. nheko memory-maps a large number of areas but then mostly doesn't use them anymore after the login phase,
  2. those memory pages could then be paged out, but the kernel doesn't seem to do that eagerly unless memory gets tighter than the earlyoom trigger point,
  3. earlyoom, instead of doing something to get those pages paged out, seemingly needlessly terminates nheko.

(I hope I summed all of that up correctly, my apologies if not.)
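
To make point 1 above concrete, here is a minimal, hypothetical C sketch (not nheko or LMDB code): it maps a large file read-only and touches every page. The touched pages are file-backed page cache, so they count toward the process's RSS, yet the kernel can reclaim them under memory pressure because the file itself is the backing store.

/* Hypothetical sketch (not nheko or LMDB code): map a large file read-only
 * and touch every page. The touched pages live in the page cache; they show
 * up in the process's RSS, but the kernel may drop or write them back at any
 * time because the file itself is the backing store. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        return 1;
    }
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Touch one byte per page so the pages are actually faulted in. */
    long pagesize = sysconf(_SC_PAGESIZE);
    volatile char sum = 0;
    for (off_t off = 0; off < st.st_size; off += pagesize)
        sum += map[off];
    (void)sum;
    printf("mapped and touched %lld bytes; now look at VmRSS in /proc/%d/status\n",
           (long long)st.st_size, (int)getpid());
    pause(); /* keep the mapping alive so RSS can be inspected from outside */
    return 0;
}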

Is this fixable in earlyoom? This seems like a quite fundamental issue that will cause unnecessary crashes for apps that use memory mapping extensively.

Affected earlyoom version: earlyoom v1.7, as packaged by postmarketOS 23.12

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

mem avail: 203 of 2994 MiB ( 6.78%), swap free: 255 of 1023 MiB (24.95%)

earlyoom just did what it was configured to do. You can change its settings if you want different behavior.

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

swap free: 255 of 1023 MiB (24.95%)

I'd also recommend increasing the swap space, maybe to 3-6 GB, and perhaps configuring swap on zram.

@ell1e
Author

ell1e commented Jun 6, 2024

Okay, so how should I configure it to avoid this? I can't make it trigger later because then the device will lock up. My apologies if I'm missing something.

From what I got from the previous conversations, this doesn't seem to be related to the chosen trigger point. It seems to be more of a problem with earlyoom apparently 1. being triggered by tons of mapped file pages that aren't actually actively used (instead of legitimately used pages of allocated memory), and 2. not defaulting to somehow trying to get those file pages unmapped or paged out first instead of stopping a program (if that's even something earlyoom could do). So I'm not quite sure what I should change in my configuration to avoid running into this; it sounded to me like something that can't be handled in the config. Then again, I really wouldn't know, since I don't know that much about mapping files to memory.

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

What happens when earlyoom is disabled?

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

mapped file pages that aren't actually actively used

aren't actually actively used != free

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

those memory pages could then be paged out but the kernel doesn't seem to do that

Seems like the kernel DOES do that:

swap free is 24.95%

@ell1e
Author

ell1e commented Jun 6, 2024

aren't actually actively used != free

The Nheko dev said the kernel is apparently meant to swap those pages out on its own, so earlyoom just shouldn't have triggered, if I understood correctly. I couldn't tell you who is right 😂 The problem seems to be triggered by LMDB mapping a ton of things while writing all over the database and then not really unmapping them when they're actually no longer needed, and apparently just ignoring that would lead to correct behavior...? I wouldn't understand it myself. What instead happens is that earlyoom triggers and things start going down.

@rfjakob
Owner

rfjakob commented Jun 6, 2024 via email

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

If earlyoom does not act, then in case of further leakage:

SwapFree and MemAvailable will get close to 0, and then the kernel OOM killer will be triggered.

Try simply increasing the swap space and decreasing earlyoom thresholds (for example to 4-6%)

@ell1e
Author

ell1e commented Jun 6, 2024

I've been trying to recreate the exact situation to use pmap but haven't managed, sadly. If I ever do I'll reopen. My apologies for the ticket spam, and I appreciate all your helpful responses!

@ell1e ell1e closed this as not planned Jun 6, 2024
@rfjakob
Owner

rfjakob commented Jun 6, 2024

Looks like VmRSS is in fact very strange with mmap mappings present. Look at this: htop-dev/htop#924. Bizarre!

However, earlyoom does not use VmRSS for its decisions (except if you use --sort-by-rss). It uses oom_score (called badness in the log above; the newest earlyoom just calls it oom_score).

I wonder if the mmap strangeness also affects oom_score. If it does, then Nheko may be an innocent victim.
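
For reference, a minimal sketch (not earlyoom's actual code) of reading the two values in question straight from /proc for a given PID:

/* Minimal sketch (not earlyoom's actual code): print the two numbers being
 * discussed here for a given PID -- the kernel's oom_score (the "badness"
 * in the log above) and the VmRSS line from /proc/<pid>/status. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s PID\n", argv[0]);
        return 1;
    }
    char path[64], line[256];
    FILE *f;

    /* /proc/<pid>/oom_score is the value the kernel OOM killer ranks by. */
    snprintf(path, sizeof(path), "/proc/%s/oom_score", argv[1]);
    f = fopen(path, "r");
    if (f) {
        int score;
        if (fscanf(f, "%d", &score) == 1)
            printf("oom_score: %d\n", score);
        fclose(f);
    }

    /* VmRSS in /proc/<pid>/status also counts file-backed (mmap'ed) pages. */
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
    f = fopen(path, "r");
    if (f) {
        while (fgets(line, sizeof(line), f))
            if (strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);
        fclose(f);
    }
    return 0;
}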

rfjakob added a commit that referenced this issue Jun 6, 2024
@rfjakob
Owner

rfjakob commented Jun 6, 2024

So I wrote a toy program to find out how this works (c759f1b).

Findings:

  1. Both VmRSS and oom_score are affected by mmap'ed-and-accessed memory
  2. /proc/meminfo's MemAvailable is NOT affected; the mmap'ed memory is counted under Cached.
  3. Once under memory pressure from another process, both VmRSS and oom_score melt away

This seems like a quite fundamental issue that will cause unnecessary crashes for apps that use memory mapping extensively.

I agree. Looks like:

  1. earlyoom will NOT trigger too early (because it's based on MemAvailable), but
  2. It may kill the wrong process, because that's based on oom_score

@rfjakob rfjakob reopened this Jun 6, 2024
@rfjakob rfjakob added the bug label Jun 6, 2024
@rfjakob
Owner

rfjakob commented Jun 6, 2024

Reopening the ticket. It's now for the bug of "killing the wrong process, just because it had mmaps".

@rfjakob
Owner

rfjakob commented Jul 9, 2024

I think --sort-by-rss should be changed to use RssAnon instead of VmRss.
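
A minimal sketch of that direction, relying on the RssAnon field that recent kernels report in /proc/<pid>/status alongside VmRSS (an illustration of the idea, not a proposed patch):

/* Sketch of the idea (an illustration, not a proposed patch): recent kernels
 * split VmRSS into RssAnon + RssFile + RssShmem in /proc/<pid>/status, so
 * sorting by RssAnon would ignore file-backed mmaps. */
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of a "Key:   N kB" line from /proc/<pid>/status,
 * or -1 if the key is not present. */
static long status_kb(const char *pid, const char *key)
{
    char path[64], line[256];
    long val = -1;
    snprintf(path, sizeof(path), "/proc/%s/status", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t klen = strlen(key);
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            sscanf(line + klen + 1, "%ld", &val);
            break;
        }
    }
    fclose(f);
    return val;
}

int main(int argc, char **argv)
{
    const char *pid = argc > 1 ? argv[1] : "self";
    printf("VmRSS:   %ld kB (includes file-backed mappings)\n",
           status_kb(pid, "VmRSS"));
    printf("RssAnon: %ld kB (anonymous memory only)\n",
           status_kb(pid, "RssAnon"));
    return 0;
}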
