Ok, so cjdns just crashed on you and printed some shit like
Fatal error: [Out of memory, limit exceeded]
what do you do.
The bdfl will fix it when it happens on his laptop.
Before crashing, cjdns prints a tree containing every memory allocation, its allocator, the parent of that allocator and so on back up to the root allocator.
Memory in cjdns is allocated in a tree structure similar to the directory structure on a filesystem. In order to allocate memory, you need an allocator, allocators can spawn child allocators and allocate memory. When an allocator is freed, all of its memory and all of the memory of its children is freed in turn.
If you want to see the memory tree while cjdns is running, you can trigger a print of the tree using
the RPC call Allocator_snapshot()
, the parameter specifies whether the individual memory
allocations should be shown or just the allocators, if cjdns has an OOM crash, it shows everything.
Example: ./tools/cexec 'Allocator_snapshot(1)'
Note that this will print the tree to stdout because it is far too large to return in a UDP packet
and sending it in multiple requests would require taking and storing (in memory) a snapshot to
be retrieved by further requests (patches for this would be appreciated).
The memory tree will look something like the following:
Core.c:496 [119096] bytes
Ducttape.c:982 [1088] bytes
| SessionManager.c:242 [864] bytes at [0x5555560579e0]
| SessionManager.c:241 [64] bytes at [0x555555fe0690]
| Ducttape.c:982 [160] bytes at [0x555555fe0f10]
Timeout.c:84 [488] bytes
| Allocator.c:657 [104] bytes at [0x555555e429e0]
| Timeout.c:85 [224] bytes at [0x555555e8cc50]
| Timeout.c:84 [160] bytes at [0x555555e7c460]
Janitor.c:671 [360] bytes
| Pinger.c:106 [1296] bytes
| | RouterModule.c:412 [2976] bytes
| | | UDPAddrInterface.c:96 [64] bytes at [0x55555605b710]
| | | UDPAddrInterface.c:96 [64] bytes at [0x55555605b670]
| | | Dict.c:132 [72] bytes at [0x555556058360]
| | | Dict.c:150 [64] bytes at [0x555556058310]
| | | Dict.c:132 [72] bytes at [0x5555560582c0]
| | | Dict.c:166 [64] bytes at [0x555556058270]
| | | Dict.c:132 [72] bytes at [0x555556058220]
| | | Dict.c:150 [64] bytes at [0x5555560581d0]
| | | RouterModule.c:414 [104] bytes at [0x555556058160]
| | | Message.h:58 [80] bytes at [0x555556058100]
| | | Message.h:57 [2096] bytes at [0x55555605aca0]
| | | RouterModule.c:412 [160] bytes at [0x555556058050]
| | Timeout.c:84 [488] bytes
| | | Allocator.c:657 [104] bytes at [0x555555ff5040]
| | | Timeout.c:85 [224] bytes at [0x555555fb2d00]
| | | Timeout.c:84 [160] bytes at [0x555555ff4f90]
In this simplified example, Core.c:496 is the root allocator, the grand-daddy of all memory
allocations. The RPC call Core_exit()
actually works by freeing this allocator. Everything
Janitor.c:84
is a child of the root allocator and Pinger.c:106
a child of that.
UDPAddrInterface.c:96 [64] bytes at [0x55555605b710]
is an allocation of a block of memory
which you can see from its size (64 bytes) and its memory address.
In each case, the allocator and the memory block, the filename and line number (eg: Core.c:496
)
is a place in a file where one can find the call to the memory operation.
Usually either Allocator_child()
or Allocator_malloc()
.
If you encounter an OOM crash, there is probably a leak. The easiest way to find this is to take
the entire memory tree trace (probably huge) and copy it into a text document. Remove everything
below ----- end cjdns memory snapshot -----
and everything above
----- cjdns memory snapshot -----
then save the file (perhaps using the name ~/cjdns_memdump
).
On the command line, run the following command:
cat ~/cjdns_memdump | sed -n -e 's/.* \([a-zA-Z0-9_]*\.[ch]\:[0-9]*\) .*$/\1/p' | sort | uniq -c | sort -n
This will show a list of each location (in the code) where memory is allocated and the number of currently active allocations (or allocators) which were allocated in that spot.
Here is the bottom of the output from the above command. The top is mostly locations which only have one currently active allocation.
8 RumorMill.c:136
57 Message.h:57
57 Message.h:58
57 Pinger.c:116
57 RouterModule.c:414
57 RouterModule.c:580
66 Dict.c:199
114 Pinger.c:106
114 RouterModule.c:412
114 UDPAddrInterface.c:96
118 NodeStore.c:1461
121 EncodingScheme.c:377
121 EncodingScheme.c:383
172 Dict.c:95
180 Dict.c:150
204 SessionManager.c:183
206 CryptoAuth.c:1207
236 NodeStore.c:1460
249 String.c:31
249 String.c:39
293 Dict.c:166
408 SessionManager.c:172
539 Dict.c:132
725 NodeStore.c:113
2392 Timeout.c:85
2470 Allocator.c:657
4784 Timeout.c:84
7428 SessionManager.c:241
7428 SessionManager.c:242
14856 Ducttape.c:982
All the way at the bottom is an anomaly, a single location in the source code which contains twice as many allocations as the next greatest number. Investigating that I found a dumb mistake which some idiot made while working on the source.