
Large lingering mmaps that apparently could be paged out seem to instead cause large memory crashes #318

Open
ell1e opened this issue Jun 6, 2024 · 14 comments

@ell1e

ell1e commented Jun 6, 2024

I'm using a chat client called nheko, which uses a lot of memory during the initial login but usually much less during normal use afterwards. Yet many minutes after one of those logins, I got this out-of-memory "crash":

Jun  6 14:32:30 pinephone daemon.info earlyoom: mem avail:   203 of  2994 MiB ( 6.78%), swap free:  255 of 1023 MiB (24.95%)
Jun  6 14:32:30 pinephone daemon.info earlyoom: low memory! at or below SIGTERM limits: mem 10.00%, swap 25.00%
Jun  6 14:32:30 pinephone daemon.info earlyoom: sending SIGTERM to process 7440 uid 10000 "io.github.Nheko": badness 1036, VmRSS 1839 MiB
Jun  6 14:32:30 pinephone daemon.info earlyoom: process exited after 0.0 seconds

I asked the developer of Nheko what happened and whether this is a Nheko memory leak, especially since after restarting it used less memory again, only around 200 MB, which is about 10% of what it used when it was killed.

The nheko developer suggested the issue was a combination of:

  1. nheko memory-maps a large number of areas but then mostly doesn't use them anymore after the login phase,
  2. those memory pages could then be paged out, but the kernel doesn't seem to do that eagerly unless memory gets tighter than the earlyoom trigger point,
  3. earlyoom, instead of doing something to get those pages paged out, seemingly needlessly terminates nheko.

(I hope I summed all of that up correctly, my apologies if not.)
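
To make point 1 above concrete, here is a minimal, hypothetical C sketch (not nheko or LMDB code): it maps a large file read-only and touches every page. The touched pages are file-backed page cache, so they count toward the process's RSS, yet the kernel can reclaim them under memory pressure because the file itself is the backing store.

/* Hypothetical sketch (not nheko or LMDB code): map a large file read-only
 * and touch every page. The touched pages live in the page cache; they show
 * up in the process's RSS, but the kernel may drop or write them back at any
 * time because the file itself is the backing store. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        return 1;
    }
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Touch one byte per page so the pages are actually faulted in. */
    long pagesize = sysconf(_SC_PAGESIZE);
    volatile char sum = 0;
    for (off_t off = 0; off < st.st_size; off += pagesize)
        sum += map[off];
    (void)sum;
    printf("mapped and touched %lld bytes; now look at VmRSS in /proc/%d/status\n",
           (long long)st.st_size, (int)getpid());
    pause(); /* keep the mapping alive so RSS can be inspected from outside */
    return 0;
}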

Is this fixable in earlyoom? This seems like a quite fundamental issue that will cause unnecessary crashes for apps that use memory mapping extensively.

Affected earlyoom version: earlyoom v1.7, as packaged by postmarketOS 23.12

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

mem avail: 203 of 2994 MiB ( 6.78%), swap free: 255 of 1023 MiB (24.95%)

earlyoom just did what it was configured to do. You can change its settings if you want different behavior.

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

swap free: 255 of 1023 MiB (24.95%)

I'd also recommend increasing the swap space, maybe to 3-6 GB, and perhaps configuring swap on zram.

@ell1e
Author

ell1e commented Jun 6, 2024

Okay, so how should I configure it to avoid this? I can't make it trigger later because then the device will lock up. My apologies if I'm missing something.

From what I got from the previous conversations, this doesn't seem to be related to the chosen trigger point. It seems to be more of a problem with earlyoom apparently 1. being triggered by tons of mapped file pages that aren't actually actively used (instead of legitimately used pages of allocated memory), and 2. not defaulting to somehow trying to get those file pages unmapped or paged out first instead of stopping a program (if that's even something earlyoom could do). So I'm not quite sure what I should change in my configuration to avoid running into this; it sounded to me like something that can't be handled in the config. Then again, I really wouldn't know, since I don't know that much about mapping files to memory.

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

What happens when earlyoom is disabled?

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

mapped file pages that aren't actually actively used

aren't actually actively used != free

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

those memory pages could then be paged out but the kernel doesn't seem to do that

Seems like the kernel DOES do that:

swap free is 24.95%

@ell1e
Author

ell1e commented Jun 6, 2024

aren't actually actively used != free

The Nheko dev said the kernel is apparently meant to swap those pages out on its own, so earlyoom just shouldn't have triggered, if I understood correctly. I couldn't tell you who is right 😂 The problem seems to be triggered by LMDB mapping a ton of things while writing all over the database and then not really unmapping them when they're actually no longer needed, and apparently just ignoring that would lead to correct behavior...? I wouldn't understand it myself. What instead happens is that earlyoom triggers and things start going down.

@rfjakob
Owner

rfjakob commented Jun 6, 2024 via email

@hakavlad
Contributor

hakavlad commented Jun 6, 2024

If earlyoom does not act, then in case of further leakage:

SwapFree and MemAvailable will get close to 0, and then the kernel OOM killer will be triggered.

Try simply increasing the swap space and decreasing earlyoom thresholds (for example to 4-6%)

@ell1e
Author

ell1e commented Jun 6, 2024

I've been trying to recreate the exact situation to use pmap but haven't managed, sadly. If I ever do I'll reopen. My apologies for the ticket spam, and I appreciate all your helpful responses!

@ell1e ell1e closed this as not planned Jun 6, 2024
@rfjakob
Owner

rfjakob commented Jun 6, 2024

Looks like VmRSS is in fact very strange with mmap mappings present. Look at this: htop-dev/htop#924. Bizarre!

However, earlyoom does not use VmRSS for its decisions (except if you use --sort-by-rss). It uses oom_score (called badness in the log above; the newest earlyoom just calls it oom_score).

I wonder if the mmap strangeness also affects oom_score. If it does, then Nheko may be an innocent victim.
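
For reference, a minimal sketch (not earlyoom's actual code) of reading the two values in question straight from /proc for a given PID:

/* Minimal sketch (not earlyoom's actual code): print the two numbers being
 * discussed here for a given PID -- the kernel's oom_score (the "badness"
 * in the log above) and the VmRSS line from /proc/<pid>/status. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s PID\n", argv[0]);
        return 1;
    }
    char path[64], line[256];
    FILE *f;

    /* /proc/<pid>/oom_score is the value the kernel OOM killer ranks by. */
    snprintf(path, sizeof(path), "/proc/%s/oom_score", argv[1]);
    f = fopen(path, "r");
    if (f) {
        int score;
        if (fscanf(f, "%d", &score) == 1)
            printf("oom_score: %d\n", score);
        fclose(f);
    }

    /* VmRSS in /proc/<pid>/status also counts file-backed (mmap'ed) pages. */
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);
    f = fopen(path, "r");
    if (f) {
        while (fgets(line, sizeof(line), f))
            if (strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);
        fclose(f);
    }
    return 0;
}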

rfjakob added a commit that referenced this issue Jun 6, 2024
@rfjakob
Owner

rfjakob commented Jun 6, 2024

So I wrote a toy program to find out how this works (c759f1b).

Findings:

  1. Both VmRSS and oom_score are affected by mmap'ed-and-accessed memory
  2. /proc/meminfo's MemAvailable is NOT affected; the mmap'ed memory is counted under Cached.
  3. Once under memory pressure from another process, both VmRSS and oom_score melt away

This seems like a quite fundamental issue that will cause unnecessary crashes for apps that use memory mapping extensively.

I agree. Looks like:

  1. earlyoom will NOT trigger too early (because it's based on MemAvailable), but
  2. It may kill the wrong process, because that's based on oom_score

@rfjakob rfjakob reopened this Jun 6, 2024
@rfjakob rfjakob added the bug label Jun 6, 2024
@rfjakob
Owner

rfjakob commented Jun 6, 2024

Reopening the ticket. It's now for the bug of "killing the wrong process, just because it had mmaps".

@rfjakob
Owner

rfjakob commented Jul 9, 2024

I think --sort-by-rss should be changed to use RssAnon instead of VmRss.
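
A minimal sketch of that direction, relying on the RssAnon field that recent kernels report in /proc/<pid>/status alongside VmRSS (an illustration of the idea, not a proposed patch):

/* Sketch of the idea (an illustration, not a proposed patch): recent kernels
 * split VmRSS into RssAnon + RssFile + RssShmem in /proc/<pid>/status, so
 * sorting by RssAnon would ignore file-backed mmaps. */
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of a "Key:   N kB" line from /proc/<pid>/status,
 * or -1 if the key is not present. */
static long status_kb(const char *pid, const char *key)
{
    char path[64], line[256];
    long val = -1;
    snprintf(path, sizeof(path), "/proc/%s/status", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t klen = strlen(key);
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            sscanf(line + klen + 1, "%ld", &val);
            break;
        }
    }
    fclose(f);
    return val;
}

int main(int argc, char **argv)
{
    const char *pid = argc > 1 ? argv[1] : "self";
    printf("VmRSS:   %ld kB (includes file-backed mappings)\n",
           status_kb(pid, "VmRSS"));
    printf("RssAnon: %ld kB (anonymous memory only)\n",
           status_kb(pid, "RssAnon"));
    return 0;
}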
