feat(ksymbols): reimplement ksymbols #4464
Nice work overall, though I do have some comments in mind.
I added an additional memory optimization - kernel symbols now only store the lower 48 bits of the address with the assumption that all addresses begin with |
I have one critical request: please avoid mixed mutex transactions in the kernel symbols table. From experience, these tend to interleave, with one method's write landing in the middle of another method's mixed read/write operation. Imagine methods m1 and m2, where m1 takes only the write lock (w) and m2 takes the read lock (r) and then the write lock. The following could occur:
- m2 - r
- m1 - wait for w
- m2 - release r
- m1 - back for w
- m1 - release
- m2 - back to w, with r assumptions changed due to m1.
Please pick either R or W for each method and hold the lock for the whole operation. You could even opt for a regular mutex instead of a RWMutex; I don't think we have very frequent reads or writes to this struct anyway.
Symbols lookups could be very frequent in the future (stack trace processing). Using a write lock for the entire duration of In both functions, the write operations only add new data, they don't change or remove existing data. In the case of I could solve the issue with |
I could also change the API of the kernel symbol table so that reading It also solves a race condition where if a lookup happens between |
@geyslan some of your remaining comments are regarding lock behavior in WDYT? @NDStrahilevitz @yanivagman it would be great if you could weigh in as well |
@oshaked1 I worry that it would expose an easy attack against tracee: simply load and unload a module many times. Of course this is already suspicious behavior, and not the only "smoke screen" tactic possible, but adding another one isn't great. That said, considering it is not the only such tactic, maybe it shouldn't be a blocker for the option. A way to circumvent this altogether would be to cache the delta per module load. If we know for sure that the module in a load loop is the same one, there's no need to calculate the differences each time, although this might be too memory heavy. Overall, no strong opinion on going that way. Going RO has many obvious benefits, but the exploit surface slightly worries me.
This attack is also a problem with the current implementation, because we clear the underlying symbol table and add the symbols from |
True, you're right. So, on my end, I see no issue with moving to an RO implementation.
Me neither. 👍🏼
@NDStrahilevitz @geyslan I changed |
pkg/ebpf/tracee.go
Outdated
@@ -123,6 +122,9 @@ type Tracee struct {
    policyManager *policy.Manager
    // The dependencies of events used by Tracee
    eventsDependencies *dependencies.Manager
    // A reference to a environment.KernelSymbolTable that might change at runtime.
    // This should only be accessed using t.getKernelSymbols() and t.setKernelSymbols()
    kernelSymbols unsafe.Pointer
Why use unsafe.Pointer and not *environment.KernelSymbolTable?
It's just to ensure that it's only accessed using the safe getter and setter.
pkg/ebpf/tracee.go
Outdated
func (t *Tracee) getKernelSymbols() *environment.KernelSymbolTable {
    return (*environment.KernelSymbolTable)(atomic.LoadPointer(&t.kernelSymbols))
}

func (t *Tracee) setKernelSymbols(kernelSymbols *environment.KernelSymbolTable) {
    atomic.StorePointer(&t.kernelSymbols, unsafe.Pointer(kernelSymbols))
}
I don't understand why we need those... What does the atomic protect from?
@yanivagman The atomic operations protect against simultaneous reads of t.kernelSymbols and an update that replaces it. While there is probably no actual risk, Go defines such simultaneous access as a data race, so it's just to be safe.
Shouldn't we place this file under environment? I don't think this logic will have any use out of ksymbols context, isn't that so?
I plan to use it in the future for ELF symbols
This implementation stores all symbols, or if a `requiredDataSymbolsOnly` flag is used when creating the symbol table, only non-data symbols are saved (and required data symbols must be registered before updating). This new implementation uses a generic symbol table implementation that is responsible for managing symbol lookups, and can be used by future code for managing executable file symbols.
After running the init function of a kernel module, the kernel frees the memory that was allocated for it but doesn't remove its symbol from kallsyms. This results in a scenario where a subsequently loaded module can be allocated to the same area as the freed init function of the previous module. This could result in 2 symbols at the same address: one is the freed init function and the other is from the newly loaded module. This caused nondeterminism in which symbol is used by the hooked_syscall event, which only used the first symbol that was found, resulting in random test failures. This commit changes the hooked_syscall event to emit one event for each found symbol.
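The behavior change described in this commit message can be sketched as an address lookup that returns every symbol recorded at an address rather than only the first. All type names, symbol names, and the address below are hypothetical and chosen for illustration; only the event name `hooked_syscall` comes from the text.

```go
package main

import "fmt"

// symbol is a hypothetical, simplified kallsyms entry.
type symbol struct {
	name  string
	owner string
}

// symbolsByAddress maps an address to every symbol recorded at it.
type symbolsByAddress map[uint64][]symbol

// add appends a symbol at addr without discarding earlier entries there.
func (s symbolsByAddress) add(addr uint64, sym symbol) {
	s[addr] = append(s[addr], sym)
}

// lookupAll returns all symbols at addr, in insertion order.
func (s symbolsByAddress) lookupAll(addr uint64) []symbol {
	return s[addr]
}

func main() {
	const addr uint64 = 0xffffffffc0a00000 // illustrative address only

	table := symbolsByAddress{}
	// A stale init symbol left behind by an earlier module...
	table.add(addr, symbol{name: "old_mod_init", owner: "old_mod"})
	// ...and a symbol of a newer module loaded into the same area.
	table.add(addr, symbol{name: "new_mod_hook", owner: "new_mod"})

	// Report once per symbol instead of picking one arbitrarily.
	for _, sym := range table.lookupAll(addr) {
		fmt.Printf("hooked_syscall: %s (%s)\n", sym.name, sym.owner)
	}
}
```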
LGTM after all the changes, nice work.
The thread stack area was extracted by finding the VMA containing the SP of the new thread, but because the SP might be just past the end of its allocated VMA (top of the stack), sometimes the correct VMA was not found.
@yanivagman I still need to check RSS of this. |
On my system, the memory usage increased from |
I'm merging this since we've already discussed the limit on the RSS increase, and as explained above, the increase is below it.
/fast-forward |
1. Explain what the PR does
The previous ksymbols implementation used a lazy lookup method, where only symbols marked as required ahead of time were stored. Trying to look up a symbol that was not stored resulted in /proc/kallsyms being read and parsed in its entirety.
While most symbols being looked up were registered as required ahead of time, some weren't (in particular symbols needed for kprobe attachment), which incurred significant overhead when tracee is being initialized.
This new implementation stores all symbols, or if a `requiredDataSymbolsOnly` flag is used when creating the symbol table (used by default), only non-data symbols are stored (and required data symbols must be registered before updating). Some additional memory usage optimizations are included, for example encoding symbol owners as an index into a list of owner names, and also lazy symbol name lookups where the map of symbol name to symbol is populated only for symbols that were looked up once.
From measurements I performed, the extra memory consumption is around 21MB (from ~159MB to ~180MB when running tracee with no arguments on my machine).
Under the hood, this ksymbols implementation uses a generic symbol table implementation that can be used by future code for managing executable file symbols.
A significant advantage gained by storing all non-data symbols is the ability to look up a function symbol that contains a given code address, a feature that I plan to use in the future.
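One common way to resolve a code address to its containing function is a binary search over symbols sorted by start address. The sketch below takes that approach under a stated assumption: since /proc/kallsyms lists no symbol sizes, a function is taken to extend up to the next symbol's start. The names and addresses are hypothetical, and this is not necessarily how the PR implements the feature.

```go
package main

import (
	"fmt"
	"sort"
)

// symbol is a hypothetical, simplified function symbol.
type symbol struct {
	name string
	addr uint64
}

// findContaining returns the symbol with the greatest start address <= addr.
// symbols must be sorted by addr in ascending order.
// Assumption: each function extends up to the start of the next symbol.
func findContaining(symbols []symbol, addr uint64) (symbol, bool) {
	// Index of the first symbol that starts strictly after addr.
	i := sort.Search(len(symbols), func(i int) bool {
		return symbols[i].addr > addr
	})
	if i == 0 {
		return symbol{}, false // addr precedes every known symbol
	}
	return symbols[i-1], true
}

func main() {
	symbols := []symbol{
		{name: "func_c", addr: 0x3000},
		{name: "func_a", addr: 0x1000},
		{name: "func_b", addr: 0x2000},
	}
	sort.Slice(symbols, func(i, j int) bool { return symbols[i].addr < symbols[j].addr })

	if s, ok := findContaining(symbols, 0x2040); ok {
		fmt.Printf("0x2040 is inside %s (+0x%x)\n", s.name, 0x2040-s.addr)
	}
}
```

This kind of lookup is what stack trace processing needs, since return addresses point into the middle of functions rather than at their starts.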
This PR closes #4463 and renders #4325 irrelevant (because /proc/kallsyms reads no longer happen "spontaneously").