
Supporting Linux symbol tables converted from Vol2 or made for kernels without full vmlinux available #1291

Open

atcuno opened this issue Oct 4, 2024 · 16 comments

@atcuno
Contributor

atcuno commented Oct 4, 2024

@ikelos we need to make a decision and quickly implement how we are going to support symbol tables created from converted Vol2 profiles and from kernels where full debug vmlinux files are not available (both trigger the same issue). Otherwise this will be a major blocker for our push to get people off Volatility 2, as not every kernel has a public debug vmlinux, and people have 10+ years of Vol2 profiles made for in-house kernels where they won't have kept the vmlinux.

This issue manifests in calls to object_from_symbol, and users generally notice it right away, as this call to instantiate init_task in linux.pslist leads to a backtrace as soon as the list walk is attempted:

https://github.com/volatilityfoundation/volatility3/blob/develop/volatility3/framework/plugins/linux/pslist.py#L211

In these symbol tables made without full debug kernels, object_from_symbol cannot create the object because the symbol entry looks like this:

"init_task": {
      "type": {
        "kind": "base",
        "name": "void"
      },
      "address": 18446744071601975000
    }

And since the type is void, obviously the backend doesn't know its real type. For symbol tables made without the debug info, dwarf2json can't magically figure out the type.

To fix this, the cleanest way I can envision is an optional parameter to object_from_symbol where the caller can set a backup type for these symbol tables. The backend API can then check the symbol table (ISF) for the type, and if a symbol's type is void, it can look at the optional parameter for the real type. In this pslist case, sending in "task_struct" through this flow would get the object instantiated as the same type as when a debug kernel is available to build the ISF.

This will also let the fix be built in at initial development time for future plugins.
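
A minimal sketch of the proposed flow, written as a hypothetical standalone helper rather than as the real object_from_symbol signature (the helper name, the fallback_type parameter, and the void check are placeholders; get_symbol()/object() are used the same way as in the pattern shown later in this thread):

import logging

vollog = logging.getLogger(__name__)

# Hypothetical helper illustrating the proposed fallback flow; the real
# change would live inside object_from_symbol itself.
def object_from_symbol_with_fallback(vmlinux, symbol_name, fallback_type=None):
    symbol = vmlinux.get_symbol(symbol_name)

    # Assumption: a symbol from an ISF built without debug info carries only
    # a bare "void" base type, as in the snippet above.
    declared = getattr(symbol, "type", None)
    has_real_type = declared is not None and "void" not in str(declared)

    if has_real_type or fallback_type is None:
        # Normal path (or no fallback supplied): defer to the existing API.
        return vmlinux.object_from_symbol(symbol_name=symbol_name)

    vollog.warning(
        "ISF carries no type for %s; instantiating it as %s instead. "
        "Regenerate the ISF from a debug kernel if one is still available.",
        symbol_name,
        fallback_type,
    )
    return vmlinux.object(fallback_type, offset=symbol.address)

# In linux.pslist the call would then look something like:
# init_task = object_from_symbol_with_fallback(vmlinux, "init_task",
#                                              fallback_type="task_struct")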

Does this approach work for you? Do you want to take a shot at the first draft of implementing the updated object_from_symbol API? If not this way, then please suggest another approach that is feasible in a quick timeframe, as this is preventing us from performing the mass testing we need to do to get Vol3 into a cut-over state from Vol2.

@gcmoreira
Contributor

This is precisely the case with the current ISF for the linux-sample-1.bin sample in the GitHub Actions. See this.

It might be worth considering accepting a tuple as well; see this line.
The same way the built-in functions work, for instance: isinstance(), str.startswith(), str.endswith(), etc.

@ikelos
Member

ikelos commented Oct 4, 2024

This fundamentally breaks assumptions about how volatility 3 was designed, and leads us away from being a forensic tool towards being a "best guess" tool. Trying to find a way to do this in a short time-frame is infeasible and applies an unnecessary pressure, given it's been a well known design decision for 10+ years of volatility 3. Memory forensics is a difficult task, and there is no obligation on the project to support every situation. It's the same reason we don't support process dumps or partial memory dumps.

I'm happy to consider ideas, but for most of them I'd want a way to very clearly point out that an incomplete symbol table was provided and therefore the results are a) liable to fail for certain plugins (not necessarily core plugins, but ones anyone's written) and b) were generated by hand and may be wildly inaccurate. A lot of our plugins use knowledge of other structures to determine what version they are and therefore how to read information. That goes away if we're guessing at structures.

If we can identify known types with "what we assume they should be" subtypes, I'd far sooner have a separate tool run over the symbol table and fix it up than litter volatility code with "best guess" structures. That ensures that the data in (symbol table and memory dump) remains very clearly external to the forensics that the tool does. If someone wants a "what if" tool where they can dynamically decide what structures a piece of data might contain, I have no problem with them building one using volatility as a library, but I strongly disagree that it should be built into volatility by default.

I'm still not entirely happy with even that idea; it would require a clear way of differentiating between custom-generated ISFs and those we produce or that were produced from complete information. Since we're an open source project, we don't really have a way of signing them with a secret or performing any kind of validation on the symbol tables people have produced or that we consume. At the moment it's held together by a somewhat complicated means of identifying the tables, which requires actually understanding what's going on to produce working tables. At least we can say "if the input's bad then the output might be bad"; if we start building guesses or assumptions into the tool, then it becomes difficult to tell whether it was bad data fed in or we made a poor assumption.

@atcuno
Contributor Author

atcuno commented Oct 4, 2024

A couple things:

  1. We cannot cut over from vol2 without this support. There are simply too many kernels from distros, now and in the past, that do not have public vmlinux debug symbols. The same goes for users of vol2 over the last decade+ who made vol2 profiles for in-house kernels and no longer have the vmlinux. This is literally an issue where, if we don't support it, we will drop support for Linux in Vol3 entirely, as we will never get people switching from Vol2 and we are all wasting our time. This is not me being dramatic; this is just how it is. So we don't have a choice but to support it; we just need to decide how, and quickly.

  2. The quickest place to show how it works and come up with a solution is in linux.pslist. For every kernel version that we support (going back to 2008), the init_task symbol that I showed in the original ticket is of type task_struct. So by not letting the plugin specify this type, we are breaking process listing even though we know with 100% certainty what the type actually is.

Looking at Gus' PR here: https://github.com/volatilityfoundation/volatility3/pull/1283/files#diff-e5a8a9af6057f1e6e2579f04f7ec5567f6fc9f7e32f74175332b280237964336R73

We also know the types of what those will be, but again the symbol table without the symbol types forces him to put in an ugly hack. The same flow as his code (check the symbol table for the type, fall back to the plugin-specified one) needs to be generalized for plugins to access. So we just need to design a change so that a fallback type can be specified by plugins in the call.

  3. Grepping across the code base, there are 26 calls to object_from_symbol in Linux code, and off the top of my head (without having to check kernel source) I know that at least 10 of them are like init_task, where the type has never changed across kernels. Again, we lose working plugins by not being able to specify the fallback type.

@Abyss-W4tcher
Contributor

Abyss-W4tcher commented Oct 4, 2024

Hi, would it be possible to acquire those "custom" made symbol tables, to investigate whether a solution that directly patches the ISF before feeding it to volatility3 is possible? Also, were they made from a module.ko + System.map, or with a custom tool that translates a vol2 profile into a vol3 ISF?

Thanks

edit: As I suppose that the "broken" ISF still contains the task_struct definition, an external tool that patches the init_task type (and other needed symbols) directly in the ISF might be a solution to consider.

Someone who only possesses a Vol2 profile as a last resort (the kernel source is gone) will necessarily need to convert it to Vol3 with some custom dwarf2json anyway, as the Linux banner will need to be manually specified too?

@atcuno
Contributor Author

atcuno commented Oct 4, 2024

Yes, the symbol tables have the definition of the type, but the symbol's type is missing (see init_task as void in the original ticket header, which illustrates the problem). For almost all of these (init_task, the ones I linked to in Gus' PR, and quite a few others), we know the type with 100% certainty, as it has never changed since the beginning of the kernels we support (year 2008), so by encoding a fallback type we bring these plugins and APIs back to life even with the limited symbol tables. I also don't consider them wild or "custom" symbol tables since the main producer

If that is truly uncomfortable for people, we can vollog.warning (always forced to output) when a fallback type is used, but in reality it is not something that is going to cause problems.

We also cannot force this on the producer of the ISF, as then we are recreating the problems we had with module.c, where people had to rebuild profiles (now symbol tables) often as the types we need changed. Instead, Volatility, which knows exactly the type it needs and the type's name, should do it right in the call to object_from_symbol. This prevents users from ever having to rebuild their symbol tables to get support for new or updated plugins.

We also have ISFs built with older versions of dwarf2json that we published as ready to use, and that people built many symbol tables with, that do not have the correct types, and we cannot expect or force users to migrate their entire data set each time we update versions or fix a dwarf2json bug. We also cannot expect them to keep the debug kernels around forever -- and again, all of this is fixed by putting the fallback types into the Volatility plugins that use the symbols.

When it comes time for cutover and we are pushing for our user base to switch, we will get one chance with users who rely on Volatility for critical real-life investigations. If they run Vol3 and it doesn't support samples that Vol2 supports, then they will not come back to Vol3 anytime soon, if ever.

@atcuno
Contributor Author

atcuno commented Oct 5, 2024

To make this more tangible, Gus is going to make a PR against this ticket that shows object_from_symbol taking an optional parameter, falling back to it if the ISF doesn't have the type for the symbol, and producing a vollog.warning() if the fallback is used. The warning will instruct people to generate a new ISF if they still have access to the vmlinux. Does this sound like an acceptable path @ikelos @Abyss-W4tcher?

@Abyss-W4tcher
Contributor

Abyss-W4tcher commented Oct 5, 2024

From my point of view, I think this can fit for core kernel structures, which haven't changed and won't likely change in many years. However, I totally understand ikelos' concerns: a lot of people might come and raise issues because Vol3 will be able to run pslist with their "Vol2"/"old dwarf2json" ISFs but not the latest brand-new plugins, as those ISFs will, for example, miss the 0.7.0 dwarf2json fix for anonymous structs.

I guess you should test a "Vol2 ISF" with the experimental object_from_symbol patch applied everywhere it's needed, and see if plugins run smoothly, to clearly identify where this could go wrong?

Of course, this is just my opinion as a contributor 👍

@atcuno
Contributor Author

atcuno commented Oct 5, 2024

Just to clarify: the data structure definitions are in the ISF. What is missing is the type information for the global symbols, such as this from the ticket header:

"init_task": {
      "type": {
        "kind": "base",
        "name": "void"
      },
      "address": 18446744071601975000
    }

So when pslist does object_from_symbol on init_task, the backend sees the type as void. We obviously know the type is really task_struct as it hasn't changed since at least 2008. So by letting the plugin send in task_struct as the fallback type name, we will still let the object_from_symbol succeed and the plugin works.

For symbols whose type changes over time, we can keep object_from_symbol as-is, without using any fallback type, and then check the return value in the plugin. It is already a bug that so many plugins don't check this return value. In failure cases, when we don't know the type with 100% certainty, we can/will then warn the user to generate a new ISF, as we cannot safely proceed. I feel like between this approach and vollog.warning() when a fallback is used, we are 1) 100% safe, as we only fall back when we know the type with 100% certainty, and 2) clearly warning the user to update their ISF if possible (if the vmlinux is still around).
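
A minimal sketch of what that return-value check could look like in a plugin (the helper name and the void-type detection via .vol.type_name are illustrative assumptions, not the framework's actual contract):

import logging

vollog = logging.getLogger(__name__)

# Sketch of the "check the return value" pattern described above.
def checked_object_from_symbol(vmlinux, symbol_name):
    obj = vmlinux.object_from_symbol(symbol_name=symbol_name)

    # Assumption: an object built from a void-typed symbol reports a type
    # name ending in "void"; treat that as "cannot safely proceed".
    if obj is None or obj.vol.type_name.endswith("void"):
        vollog.warning(
            "The ISF does not record a usable type for %s; please regenerate "
            "the ISF from the debug kernel (vmlinux) if it is still available.",
            symbol_name,
        )
        return None

    return obj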

@atcuno
Contributor Author

atcuno commented Oct 5, 2024

Second clarification: This is NOT just an issue of people converting vol2 profiles, it is also a problem WE created by publishing a dwarf2json for a very long time that produced broken type information. We should NOT put it back on users who made 100s or 1000s of ISFs already to then have to re-generate them when we can simply fix 99% of the problem with a proper object_from_symbol update.

@Abyss-W4tcher
Contributor

Yes, I completely understand the issue with the void definition; it's just to ensure that all plugins (not only pslist) will still work with incomplete ISFs and this additional feature (if it gets integrated), to ensure a smooth transition for new users 👍.

@atcuno
Contributor Author

atcuno commented Oct 5, 2024

Yes, once we have this issue merged into develop + the few remaining plugins for parity merged, we will be running every plugin for vol3 against 200-300 Linux samples to discover and verify any issues or bugs we missed so far. This will definitely uncover any problems like you said. Gus and I will be leading this effort. The infrastructure is already in place, we are just waiting for vol3 to be ready.

@ikelos
Member

ikelos commented Oct 5, 2024

To make this more tangible

Tangibility is not the issue. I understand the problem you're trying to "solve", and I've given my reasons against it and explained an alternative. Spending time trying to convince me is basically ignoring my reply. Changing our tool to support a different broken tool would be like attempting to add support for lime's raw format. The input is bad, there's no way to tell the difference, and trying to add a fix to our tool will make things infinitely more complicated and lead to many, many bogus bug reports. A separate tool that can attempt to repair bad input files, only needs to run once, is very clearly either applied to an ISF or not, and does NOT potentially change the workings of an otherwise accurate tool is a much better solution.

So when pslist does object_from_symbol on init_task, the backend sees the type as void. We obviously know the type is really task_struct as it hasn't changed since at least 2008. So by letting the plugin send in task_struct as the fallback type name, we will still let the object_from_symbol succeed and the plugin works.

If we obviously know, then write a tool that changes the ISF it hands in. Don't hand in known bad data and expect Volatility to magically "autocorrect" mistakes. That isn't how forensics works.

Second clarification: This is NOT just an issue of people converting vol2 profiles, it is also a problem WE created by publishing a dwarf2json for a very long time that produced broken type information. We should NOT put it back on users who made 100s or 1000s of ISFs already to then have to re-generate them when we can simply fix 99% of the problem with a proper object_from_symbol update.

WE will not compound past errors by implementing a bad fix. This is not a blame game and having a WE versus THEM attitude is adversarial and not beneficial. Otherwise I might have to point out that OUR tool is working fine and THEIR dwarf2json tool is what went wrong. It's easy to clump things together to support your argument, but it only leads to division.

Make the process of conversion as straightforward as possible; the warning on detecting a bad version is fine. I'm even willing to consider running the conversion process as part of volatility as long as it is a) very clearly signposted that it's altering the ISF and b) an entirely separate module (ideally with conversions encoded in data files so it can be updated more easily in the future).

The solution I suggested, which you haven't responded to, would keep all the guess work separate, and resolve the issue in essentially the same way as you're suggesting, but without polluting the main codebase which as you've already noted is not at fault and functions correctly. The warning on failure when using an ISF from a known bad generator is a good idea and definitely one we should implement. It's almost as if we predicted some ISF files may be badly generated, and included version numbers about the generator in them for just such a situation.

@atcuno
Contributor Author

atcuno commented Oct 5, 2024

The "WE" here was us/volatility as a whole, not individual people. The users of Volatility and dwarf2json see us as one organization since we are. I wasn't trying to be hostile about that, I just 100% do not believe in forcing our users to do a bunch of work - work that they might not even able to do if they deleted the vmlinux files since generation, believing they already made vol3 compatible symbol tables.

I also am 100% against changing the ISF files on the fly or in some outside tool, because if/when 6 months from now we discover some other breaking issue in dwarf2json... are we going to ask users to re-generate them again, as they had to for many years with module.c updates, for every system they had a profile for? And then every time there is another break, they generate them again? And keep 100s of GBs of vmlinux files around forever? It isn't feasible for users to do or reasonable for us to ask.

Putting the fix in the framework is the only sane solution, and it would only be for symbols that have not changed types since being introduced into the Linux kernel. This isn't patching in guesswork; this is us as developers knowing with 100% certainty that, for example, init_task is a task_struct, so we put that into the framework. There is no guessing here, this is 100% certainty. And this does match how forensics works: if you know the data type of the data you are about to parse, then you treat that data as that type.

Also, our plugins like pslist operate knowing that init_task is a task_struct; otherwise the plugin wouldn't know the member names to access or the data structures of those members to parse. Saying some outside tool has to fix a symbol table file to call it a task_struct, when the plugin is going to operate as if it's a task_struct no matter what, is a very arbitrary line to draw, and Gus and I cannot fathom where this restriction is even coming from. If object_from_symbol already took the expected type as a parameter and then verified it, I would understand your point, but as it is now it just seems like a very random and out-of-place restriction you are trying to put on us: a plugin cannot specify what type a symbol is, even though once it gets the symbol object constructed (init_task -> task_struct), the plugin is going to act as if it is a certain type anyway. So why is there a restriction on specifying the type if the plugin is going to act as if it is that type anyway?

Gus and I literally cannot understand this point, and letting plugins do this in cases of 100% type certainty introduces zero risk of errors into the framework. Again, the plugin will parse the data as if it is that type anyway.

If not changing object_from_symbol is somehow logical (and Gus and I cannot see how), then we will just change every affected call in Linux to follow this pattern:

ns_addr = vmlinux.get_symbol("init_pid_ns").address
ns = vmlinux.object("pid_namespace", offset=ns_addr)

Where we call get_symbol().address and then create an object() at that address... but this seems really ugly when object_from_symbol could just support the optional parameter when we already know the type ahead of time, and warn about bad ISFs.

Either way, just let us know if object_from_symbol can change or if we need to change all the affected instances to the way with get_symbol followed by object().

@ikelos
Member

ikelos commented Oct 5, 2024

Ok,

I've spent hours crafting this reply, and it was way too long and deals a lot with hostility and aggressiveness that I don't think belongs in such discussions, so here's the short version:

Similarities between our solutions:

  • User runs volatility, sees the outcome they wanted/expected

Differences between our solutions:

  • Initial proposal (change a core function to accept a fallback)
      • Code to determine types (requires changing and/or more code if there are updates)
      • Changes to every plugin (each plugin would require fixing if an issue were found)
      • Applies to all input tables, including ones that may not have been caused by the issue trying to be solved
      • Changes require additional conditional statements in all affected plugins
      • Could be ready in two weeks with a lot of work and haste (actually a minus, because it's likely to contain errors)
  • New proposal (apply fallback type to broken ISF files only)
      • Data to determine what changes should be made (updatable without changing code)
      • Centralized changes (one change works in all instances)
      • Applies only to specific ISF files known to have been produced badly
      • Updates can be applied by changes to data files without any code alterations
      • Creates a template for fixing issues in generation scripts in the future
      • Difficult to get ready in two weeks

Initial proposal pipeline

Existing ISF -> volatility3 loader -> plugin -> plugin decides to override types based on... something -> possibly correct output, but difficult to tell

New proposal pipeline

Existing ISF -> volatility3 loader -> Bad ISF detected and corrected by standalone verifiable module with sufficient warning -> all existing plugins run as normal, no guesswork required -> presumed correct output

Some arguments that didn't belong in the discussion:

  • "I also 100% am against changing the ISF files on the fly" - That's kind of what you're suggesting, just later in the pipeline once the ISF has already been interpreted?
  • "And keep 100s GBs of vmlinux files around forever?" - This ignored/misunderstood the proposal and suggested an extreme hyperbole.
  • "As if/when 6 months from now we discover some other breaking issue in dwarf2json..." - pointing out a flaw in both proposals as a flaw against one isn't really a strong argument against.
  • "It isn't feasible for users to do or reasonable for us to ask." - Moral high-ground, with the implicit assertion that this is fixable without cost. We can't keep everyone 100% happy, and your solution just puts the burden on us to keep the code up to date and functional for the rest of time. My proposed solution does put the burden on us, but makes it more managable by keeping the changes as data rather than as code.
  • "this is us as developers 100% knowing that, for example, init_task is a task struct" - you've worked in parsing arbitrary data for probably longer than I have "100% knowing" basically assumes no errors ever occur and that data acquisition is always 100% accurate and actionable. It also papers over the diagnostic costs of determining that an upstream change is now represented as our tool returning bogus results, rather than as a highly visible error. Emphasizing 100% actually undermined this argument for me rather than reinforced it.
  • "it just seems like some very random and out of place restriction you are trying to put on us that a plugin cannot specify what type a symbol is" - Not understanding my point is a reason to request clarification and try to identify the issue. Extending this perspective we could code everything with fallbacks and then optimize the ISF files not to contain types, just relative offsets. I hope you'd agree that's a poor solution. If it's poor for the entire framework, why would we implement it everywhere for broken files?
  • "and letting plugins do this in 100% type certainty cases introduce 0 risk or errors into the framework" - again 100% and 0 risk are assertions of certainty and relatively brittle to unknown situations. Volatility 3 is a forensics tool (at least partially) and designed around defensive coding not around best guesses.
  • "If not changing object_from_symbol is somehow logical and Gus and I cannot see how, then we will just change every affected call in Linux to do this like this pattern..." - This isn't productive, it's essentially imploring me to change my view or telling me you'll ignore it (or find a creative way to circumvent it). You asked for my view, I'm sorry it wasn't what you wanted or you don't understand my perspective. I am open to suggestions, I've proposed one I think should satisfy you, but in your first paragraph you said you were 100% against changing the ISF on the fly, which is rather a restrictive criterion and may make it difficult for us to establish a solution that keeps us both happy.
  • "Either way, just let us know if object_from_symbol can change or if we need to change all the affected instances to the way with get_symbol followed by object()." - This is phrased as an ultimatum but basically asserts you're going to do whatever you want. It's also a false dichotomy since I feel there's a solution that isn't either of those two proposals. It's not clear why you bothered spelling that out, other than to have one last chance at forcing me to change my mind to avoid conflict? I do not approve of the solution you proposed and I'd urge you to reread the solution I suggested above. I am a volunteer and one means of resolution could be that you fork the project, or that I just stop working on it, but I'm not sure either of those are in the interests of the project? I'll leave it up to you if you'd like to de-escalate your threats or not...

@atcuno
Contributor Author

atcuno commented Oct 5, 2024

Ok so for this:

Existing ISF -> volatility3 loader -> Bad ISF detected and corrected by standalone verifiable module with sufficient warning -> all existing plugins run as normal, no guesswork required -> presumed correct output

By standalone module, I understand that to mean a standalone Python module (a file with a class) that the ISF loader in Volatility 3 will call out to at load time. Can you point to where in the current code you want the ISF loader to call into this standalone module?

Also, are you envisioning a dictionary like symbol -> type, for example:

{
    "init_task": "task_struct",
    "module_kset": "kset",
    ...
}

Then, in whatever place you point us to in the code, the standalone module would be called with the current ISF, check each of the symbols in the dictionary, and, if their type is void, put the correct type in?
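
A minimal sketch of that patching step, operating on the raw ISF JSON (the fixup entries come from the dictionary above; the exact ISF encoding of a struct-typed symbol via "kind": "struct" is an assumption here):

import json

# Known symbol -> type fixups; the entries mirror the example dictionary above.
SYMBOL_FIXUPS = {
    "init_task": "task_struct",
    "module_kset": "kset",
}

def patch_isf_symbols(isf: dict, fixups: dict = None) -> int:
    """Replace void-typed symbols with their known types in memory; returns
    the number of symbols patched."""
    patched = 0
    for symbol_name, type_name in (fixups or SYMBOL_FIXUPS).items():
        symbol = isf.get("symbols", {}).get(symbol_name)
        if symbol is None:
            continue
        declared = symbol.get("type", {})
        if declared.get("kind") == "base" and declared.get("name") == "void":
            symbol["type"] = {"kind": "struct", "name": type_name}
            patched += 1
    return patched

def patch_isf_file(path: str) -> int:
    """Load an ISF JSON file (path is illustrative) and patch it in memory."""
    with open(path) as handle:
        isf = json.load(handle)
    return patch_isf_symbols(isf)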

I tried to match my list above to these:

Data to determine what changes should be made (updatable without changing code)
Centralized changes (one change works in all instances)
Applies only to specific ISF files known to have been produced badly
Updates can be applied by changes to data files without any code alterations

Gus and I have time allocated over the next ~2 weeks to be basically full time on vol3 and can get it done, but show us where the transform should happen so we understand, and verify my understanding above so we don't go in circles.

@ikelos
Member

ikelos commented Oct 5, 2024

Yep, much like the PDB converter is a standalone tool that we've integrated into the workflow of volatility (and it's written as a module with very basic extensions to make it standalone), I was envisaging something similar here.

Essentially it would live in intermed.py, probably around here so that it applies to all versions.

The format of the data would need a little thinking about. It should probably have a section saying whether there's a particular set of metadata it should be applied to (i.e. only particular old versions of dwarf2json, etc). We then might want to make it versioned for future extensibility, and we might want to figure out the kinds of changes we want to apply (it should probably be limited to type changes rather than offset changes, since that would easily open it up to abuse and people would want "old value + x" type capabilities; but, for instance, changing the subtype of an array, or the target of a pointer, to something else?). So it'll need a little thought, but we should be able to mock up a format that's flexible enough for the future. Identifiers would probably be "object type", "member name", possibly "previous/original type" (so we don't overwrite stuff we didn't intend to) and then "type to change to" (supporting standard definitions from our existing schema). It might be a little tricky with schema changes, so we'd need to consider that too, but as I say, if we can mock up a working one then we can tweak it to make sure it's suitable before primetime...
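
A rough mock-up of what such a fix-up data file could contain (field names and structure are assumptions only, shown here as a Python literal although it would live in a versioned JSON data file; the dwarf2json version bound references the 0.7.0 fix mentioned earlier in this thread):

# Hypothetical fix-up rules: only applied when the ISF metadata matches a
# known-bad generator, guarded by the original type so nothing unintended
# gets overwritten, and limited to type changes.
FIXUP_RULES = {
    "format_version": 1,
    "applies_to": {
        "producer_name": "dwarf2json",
        "producer_versions": ["<0.7.0"],
    },
    "changes": [
        {
            "symbol": "init_task",
            "original_type": {"kind": "base", "name": "void"},
            "new_type": {"kind": "struct", "name": "task_struct"},
        },
        {
            "symbol": "module_kset",
            "original_type": {"kind": "base", "name": "void"},
            "new_type": {"kind": "struct", "name": "kset"},
        },
    ],
}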

I'm still not sure if it'll be ready in two weeks (especially since I'm away for a week of it), but we can see where we get to. I'm not too concerned if it's a week or two late. Much better to do something right than rush it and mess things up in a way that will take even longer to fix.
