
feat(ksymbols): reimplement ksymbols #4464

Merged (4 commits) · Jan 22, 2025
Conversation

@oshaked1 (Contributor) commented Dec 25, 2024

1. Explain what the PR does

The previous ksymbols implementation used a lazy lookup method, where only symbols marked as required ahead of time were stored. Looking up a symbol that was not stored resulted in /proc/kallsyms being read and parsed in its entirety.
While most symbols being looked up were registered as required ahead of time, some weren't (in particular, symbols needed for kprobe attachment), which incurred significant overhead while tracee was being initialized.

This new implementation stores all symbols, or, if a requiredDataSymbolsOnly flag is used when creating the symbol table (the default), only non-data symbols (in which case required data symbols must be registered before updating). Some additional memory usage optimizations are included, for example encoding symbol owners as an index into a list of owner names, and lazy symbol name lookups, where the map from symbol name to symbol is populated only for symbols that have been looked up at least once.

From measurements I performed, the extra memory consumption is around 21MB (from ~159MB to ~180MB when running tracee with no arguments on my machine).

Under the hood, this ksymbols implementation uses a generic symbol table implementation that can be used by future code for managing executable file symbols.

A significant advantage gained by storing all non-data symbols is the ability to look up the function symbol that contains a given code address, a feature that I plan to use in the future.
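The containing-symbol lookup can be sketched as a binary search over the symbols sorted by address. A minimal sketch; the type and function names here are illustrative, not the PR's actual API:

```go
package main

import (
	"fmt"
	"sort"
)

// symbol is a simplified stand-in for the entries kept by the symbol table.
type symbol struct {
	name string
	addr uint64
}

// symbolForAddr returns the symbol whose address range contains addr,
// assuming syms is sorted by address. A symbol's range is implicitly
// [its address, the next symbol's address).
func symbolForAddr(syms []symbol, addr uint64) (symbol, bool) {
	// Index of the first symbol with an address strictly greater than addr.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].addr > addr })
	if i == 0 {
		return symbol{}, false // addr precedes all known symbols
	}
	return syms[i-1], true
}

func main() {
	syms := []symbol{
		{"do_sys_open", 0xffffffff81200000},
		{"ksys_read", 0xffffffff81200400},
		{"vfs_write", 0xffffffff81200900},
	}
	s, ok := symbolForAddr(syms, 0xffffffff81200410)
	fmt.Println(s.name, ok) // the address falls inside ksys_read
}
```

Keeping the slice sorted once makes each lookup O(log n), which matters if stack-trace processing queries it per address.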

This PR closes #4463 and renders #4325 irrelevant (because /proc/kallsyms reads no longer happen "spontaneously").

@NDStrahilevitz (Collaborator) left a comment

Nice work overall, though I do have some comments in mind.

Review comments on: pkg/utils/symbol_table.go, pkg/utils/environment/kernel_symbols.go, pkg/ebpf/tracee.go
@oshaked1 oshaked1 force-pushed the kallsyms branch 7 times, most recently from a37c7cd to f3e5990 Compare December 29, 2024 10:58
@yanivagman yanivagman linked an issue Dec 29, 2024 that may be closed by this pull request
@oshaked1 oshaked1 force-pushed the kallsyms branch 2 times, most recently from e5c5324 to eaa12ab Compare January 1, 2025 15:41
@oshaked1 (Contributor, Author) commented Jan 1, 2025

I added an additional memory optimization: kernel symbols now store only the lower 48 bits of the address, on the assumption that all addresses begin with 0xffff. We ignore any symbol whose address doesn't start with 0xffff, which in practice is only percpu symbols. This allows us to encode the address and owner index together, which eliminates 8 bytes per symbol for a total memory saving of around 3-4MB.
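The packing described above can be sketched as follows (the constant and function names are my own, assumed for illustration, not the PR's actual code):

```go
package main

import "fmt"

// The top 16 bits of an x86-64 kernel address are always 0xffff, so they
// can be replaced by a 16-bit owner index, fitting both in one uint64.
const addrMask = 0x0000ffffffffffff

// pack stores the lower 48 bits of addr together with the owner index.
func pack(addr uint64, ownerIdx uint16) uint64 {
	return (uint64(ownerIdx) << 48) | (addr & addrMask)
}

// unpack restores the full address (top bits forced back to 0xffff)
// and the owner index.
func unpack(packed uint64) (addr uint64, ownerIdx uint16) {
	return packed | (0xffff << 48), uint16(packed >> 48)
}

func main() {
	addr := uint64(0xffffffffc0a81000) // e.g. a module symbol address
	packed := pack(addr, 3)
	a, o := unpack(packed)
	fmt.Printf("%#x %d\n", a, o) // round-trips to the original address and owner
}
```

Dropping the separate 8-byte owner field per symbol is where the quoted 3-4MB saving comes from, given the ~175k symbols in a typical kallsyms.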

@NDStrahilevitz (Collaborator) left a comment

I have one critical request: please avoid the mixed mutex transactions in the kernel symbols table. From experience this tends to cause races where one write operation happens in the middle of another mixed operation. Imagine two methods m1 and m2, where m1 is w and m2 is rw. The following could occur:

  1. m2 - r
  2. m1 - wait for w
  3. m2 - release r
  4. m1 - back for w
  5. m1 - release
  6. m2 - back to w, with r assumptions changed due to m1.

Please either pick for each method whether it is R or W and make the lock last for the whole operation. You could even opt for a regular mutex instead of an RWMutex; I don't think we have very frequent reads or writes to this struct anyway.
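The fix this comment asks for is to keep the check and the write inside one critical section. A minimal sketch of that pattern (not Tracee's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// cache illustrates the single-critical-section pattern: the lookup and the
// insert happen under one lock, so no other writer can slip in between the
// read of the map and the write to it.
type cache struct {
	mu sync.Mutex // a plain mutex held for the whole operation
	m  map[string]int
}

func (c *cache) getOrInsert(key string, compute func() int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.m[key]; ok {
		return v // the check and the insert share one critical section
	}
	v := compute()
	c.m[key] = v
	return v
}

func main() {
	c := &cache{m: make(map[string]int)}
	fmt.Println(c.getOrInsert("a", func() int { return 1 }))
	fmt.Println(c.getOrInsert("a", func() int { return 2 })) // cached: still 1
}
```

With this shape, the step 3/step 4 interleaving above cannot occur, at the cost of serializing readers behind writers.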

@oshaked1 (Contributor, Author) commented Jan 2, 2025

Symbol lookups could be very frequent in the future (stack trace processing). Using a write lock for the entire duration of KernelSymbolTable.UpdateFromReader would prevent symbol lookups for a significant duration. The same applies to SymbolTable.LookupByName.

In both functions, the write operations only add new data; they don't change or remove existing data. In the case of SymbolTable.LookupByName, the worst case for an outdated assumption is that we add the same name-to-symbol mapping twice (the added data is always the same). For KernelSymbolTable.UpdateFromReader, the worst case is that the same owner gets added twice to idxToSymbolOwner, but all data remains valid.

I could solve the issue with UpdateFromReader by adding a third lock that makes sure only a single update operation can happen at a time (which prevents 2 concurrent goroutines from wanting to add a new symbol owner).
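The benign-race argument above can be sketched like this (a simplified stand-in for SymbolTable.LookupByName; the names and structure are assumptions):

```go
package main

import (
	"fmt"
	"sync"
)

type symbol struct {
	name string
	addr uint64
}

// lazyIndex is a simplified stand-in for the lazy name-lookup map:
// byName is populated on the first lookup of each symbol.
type lazyIndex struct {
	mu     sync.RWMutex
	all    []symbol          // immutable after construction
	byName map[string]uint64 // lazily populated cache
}

func (c *lazyIndex) lookup(name string) (uint64, bool) {
	c.mu.RLock()
	addr, ok := c.byName[name]
	c.mu.RUnlock()
	if ok {
		return addr, true
	}
	// Slow path: linear scan. Two goroutines may both reach this point,
	// but both would cache the same value, so the duplicate write is benign.
	for _, s := range c.all {
		if s.name == name {
			c.mu.Lock()
			c.byName[name] = s.addr
			c.mu.Unlock()
			return s.addr, true
		}
	}
	return 0, false
}

func main() {
	idx := &lazyIndex{
		all:    []symbol{{"ksys_read", 0xffffffff81200400}},
		byName: make(map[string]uint64),
	}
	addr, ok := idx.lookup("ksys_read")
	fmt.Printf("%#x %v\n", addr, ok)
}
```

The key invariant is that `all` is never mutated and the cached write is idempotent, so dropping the read lock before taking the write lock cannot corrupt state.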

@oshaked1 (Contributor, Author) commented Jan 2, 2025

I could also change the API of the kernel symbol table so that reading /proc/kallsyms happens once, when the symbol table is created; a user who wants to update it must create a new one instead. This removes the need for locks entirely.

It also solves a race condition where a lookup that happens between kst.symbols.Clear() and kst.symbols.AddSymbols(symbols) will fail.
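The RO approach can be sketched with an atomic pointer swap (the types here are illustrative; the PR itself ended up using unsafe.Pointer with a getter/setter, as shown further down):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// symTable is a simplified read-only symbol table: built once, never mutated.
type symTable struct{ byName map[string]uint64 }

// current always points at a fully constructed table, so readers never see
// the empty window between a Clear() and the subsequent AddSymbols().
var current atomic.Pointer[symTable]

func update(symbols map[string]uint64) {
	t := &symTable{byName: symbols} // fully built before publication
	current.Store(t)                // atomically replace the old table
}

func lookup(name string) (uint64, bool) {
	v, ok := current.Load().byName[name]
	return v, ok
}

func main() {
	update(map[string]uint64{"ksys_read": 0xffffffff81200400})
	v, ok := lookup("ksys_read")
	fmt.Printf("%#x %v\n", v, ok)
}
```

Lookups proceed against the old table while a new one is being built, and the swap is a single atomic store, so there is no window in which lookups can spuriously fail.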

@oshaked1 (Contributor, Author) commented Jan 13, 2025

@geyslan some of your remaining comments are regarding lock behavior in kernel_symbols.go. As I proposed in a previous comment, I could change the API such that KernelSymbolTable is RO and when it needs to be updated (currently only happens when a kernel module is loaded) a new one is created instead. This would completely remove the need for locks in this file.

WDYT? @NDStrahilevitz @yanivagman it would be great if you could weigh in as well

@NDStrahilevitz (Collaborator):

@oshaked1 I worry that it would expose an easy attack against tracee: simply load and unload a module many times. Of course, this is already suspicious behavior, and not the only "smoke screen" tactic possible, but adding another one isn't great. That said, considering it is not the only such tactic, maybe it shouldn't be a blocker for the option.

A way to circumvent this altogether would be to cache the delta per module load: if we know for sure that the module in a load loop is the same one, there's no need to calculate the differences each time. Although this might be too memory heavy.

Overall no strong opinion on going that way, going RO has many obvious benefits, but the exploit surface slightly worries me.

@oshaked1 (Contributor, Author):

This attack is also a problem with the current implementation, because we clear the underlying symbol table and add the symbols from /proc/kallsyms again. The RO implementation is actually more resistant to this attack, because lookups can still happen normally while a new kernel symbol table is being created, and only when it's ready is t.kernelSymbols replaced atomically.

@NDStrahilevitz (Collaborator):

> This attack is also a problem with the current implementation, because we clear the underlying symbol table and add the symbols from /proc/kallsyms again. The RO implementation is actually more resistant to this attack, because lookups can still happen normally while a new kernel symbol table is being created, and only when it's ready is t.kernelSymbols replaced atomically.

True, you're right. So, on my end, I see no issue with moving to an RO implementation.

@geyslan (Member) commented Jan 13, 2025

> This attack is also a problem with the current implementation, because we clear the underlying symbol table and add the symbols from /proc/kallsyms again. The RO implementation is actually more resistant to this attack, because lookups can still happen normally while a new kernel symbol table is being created, and only when it's ready is t.kernelSymbols replaced atomically.

> True, you're right. So, on my end, I see no issue with moving to an RO implementation.

Me neither. 👍🏼

@oshaked1 oshaked1 force-pushed the kallsyms branch 2 times, most recently from ce508b3 to 5008a8b Compare January 13, 2025 16:28
@oshaked1 (Contributor, Author):

@NDStrahilevitz @geyslan I changed KernelSymbolTable to RO and guarded t.kernelSymbols behind an atomic getter and setter.

@oshaked1 oshaked1 force-pushed the kallsyms branch 4 times, most recently from cd5b576 to 47d48e1 Compare January 14, 2025 10:17
@@ -123,6 +122,9 @@ type Tracee struct {
policyManager *policy.Manager
// The dependencies of events used by Tracee
eventsDependencies *dependencies.Manager
// A reference to an environment.KernelSymbolTable that might change at runtime.
// This should only be accessed using t.getKernelSymbols() and t.setKernelSymbols()
kernelSymbols unsafe.Pointer
Collaborator: Why use unsafe.Pointer and not *environment.KernelSymbolTable?

@oshaked1 (Contributor, Author) Jan 15, 2025: It's just to ensure that it's only accessed using the safe getter and setter.

Comment on lines 142 to 149
func (t *Tracee) getKernelSymbols() *environment.KernelSymbolTable {
return (*environment.KernelSymbolTable)(atomic.LoadPointer(&t.kernelSymbols))
}

func (t *Tracee) setKernelSymbols(kernelSymbols *environment.KernelSymbolTable) {
atomic.StorePointer(&t.kernelSymbols, unsafe.Pointer(kernelSymbols))
}

Collaborator: I don't understand why we need those... What does the atomic protect from?

@oshaked1 (Contributor, Author): @yanivagman The atomic operations protect against simultaneous reads of t.kernelSymbols and an update that replaces it. While there is probably no actual risk, Go defines such simultaneous access as a data race, so it's just to be safe.

Collaborator: Shouldn't we place this file under environment? I don't think this logic will have any use outside the ksymbols context, isn't that so?

@oshaked1 (Contributor, Author): I plan to use it in the future for ELF symbols.

This implementation stores all symbols, or if a `requiredDataSymbolsOnly`
flag is used when creating the symbol table, only non-data symbols are saved
(and required data symbols must be registered before updating).
This new implementation uses a generic symbol table implementation that is
responsible for managing symbol lookups, and can be used by future code for
managing executable file symbols.
After running the init function of a kernel module, the kernel frees the memory that was allocated for it but doesn't remove its symbol from kallsyms.
This results in a scenario where a subsequently loaded module can be allocated to the same area as the freed init function of the previous module,
leaving 2 symbols at the same address: one the freed init function, the other from the newly loaded module.
This caused nondeterminism in which symbol was used by the hooked_syscall event, which used only the first symbol found, resulting in random test failures.
This commit changes the hooked_syscall event to emit one event for each found symbol.
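The per-symbol emission described in this commit message can be sketched as follows (the map shape and event strings are illustrative, not Tracee's actual types):

```go
package main

import "fmt"

// eventsForAddr sketches the behavior change: when kallsyms maps one address
// to multiple symbols (e.g. a freed module init function plus a symbol from a
// newly loaded module), emit one hooked_syscall event per symbol rather than
// arbitrarily picking the first.
func eventsForAddr(byAddr map[uint64][]string, addr uint64) []string {
	events := make([]string, 0, len(byAddr[addr]))
	for _, sym := range byAddr[addr] {
		events = append(events, "hooked_syscall: "+sym)
	}
	return events
}

func main() {
	byAddr := map[uint64][]string{
		0xffffffffc0a81000: {"old_mod_init", "new_mod_fn"},
	}
	for _, e := range eventsForAddr(byAddr, 0xffffffffc0a81000) {
		fmt.Println(e)
	}
}
```

Emitting all matches makes the event output deterministic regardless of the order in which kallsyms lists the colliding symbols.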
@NDStrahilevitz (Collaborator) left a comment

LGTM after all the changes, nice work.

The thread stack area was extracted by finding the VMA containing the SP of the new thread,
but because the SP might be just past the end of its allocated VMA (top of the stack), sometimes the correct VMA was not found.
@geyslan (Member) commented Jan 20, 2025

@yanivagman I still need to check RSS of this.

@geyslan (Member) commented Jan 20, 2025

> @yanivagman I still need to check RSS of this.

grep -E ' [tT] ' /proc/kallsyms | wc -l
174972

On my system, the memory usage increased from 32.3MB (main) to 62.2MB (PR) at startup. During oscillation, it reached lows of 22MB (main) and 42MB (PR). Initially, the PR showed an increase of approximately 30MB, but it stabilized to around ~20MB in the long run.

@geyslan geyslan requested a review from yanivagman January 20, 2025 16:06
@geyslan (Member) commented Jan 22, 2025

I'm merging this since we've already discussed the limit on RSS increase, and as explained above we're below it.

@geyslan (Member) commented Jan 22, 2025

/fast-forward


@geyslan geyslan merged commit e113f04 into aquasecurity:main Jan 22, 2025
41 checks passed