-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supporting Linux symbol tables converted from Vol2 or made for kernels without full vmlinux available #1291
Comments
This fundamentally breaks assumptions about how volatility 3 was designed, and leads us away from being a forensic tool towards being a "best guess" tool. Trying to find a way to do this in a short time-frame is infeasible and applies an unnecessary pressure, given it's been a well known design decision for 10+ years of volatility 3. Memory forensics is a difficult task, and there is no obligation on the project to support every situation. It's the same reason we don't support process dumps or partial memory dumps. I'm happy to consider ideas, but for most of them I'd want a way to very clearly point out that an incomplete symbol table was provided and therefore the results are a) liable to fail for certain plugins (not necessarily core plugin, but ones anyone's written) and b) were generated by hand and may be wildly inaccurate. A lot of our plugins use knowledge of other structures to determine what version they are and therefore how to read information. That goes away if we're guessing at structures. If we can identify known types with "what we assume they should be" subtypes, I'd far sooner have a separate tool run over the symbol table and fix it up, than littering volatility code with "best guess" structures. That ensures that the data-in (symbol table and memory dump) are still very clearly external to the forensics that the tool does. If someone wants a "what if" tool where they can dynamically decide what structures a piece of data might contain, I have no problem with them building one using volatility as a library, but I strongly disagree that it should be built into volatility by default. I'm still not entirely happy with even that idea, it would require a clear way of differentiating between custom generated ISF and those we produce/were produced from complete information. Since we're an open source we don't really have a way of signing them with a secret or performing any kind of validity on the symbol tables people have produced or that we consume. At the moment it's held together by a somewhat complicated means of identifying the tables which requires actually understanding what's going on to produce working tables. At least we can say "if the input's bad then the output might be bad", if we start building guesses or assumptions into the tool then it becomes difficult to tell whether it was bad data fed in, or we made a poor assumption. |
A couple things:
Looking at Gus' PR here: https://github.com/volatilityfoundation/volatility3/pull/1283/files#diff-e5a8a9af6057f1e6e2579f04f7ec5567f6fc9f7e32f74175332b280237964336R73 We also know the type of what those will be, but again the symbol table without the symbol types makes it where he has to put in an ugly hack. The same flow of his code (check symbol table for type, fall back to plugin specified one) needs to be generalized for plugins to access. So we just need to design a change so the a fallback type can be specified by plugins in the call.
|
Hi, would it be possible to acquire those "custom" made symbols tables, to investigate if a solution to directly patch the ISF before feeding it to volatility3 can be possible ? Also, were they made based on a Thanks edit : As I suppose that the "broken" ISF still contains the Someone only possessing a Vol2 profile as a last resort (kernel source is gone), will necessarily need to convert it to Vol3 with some custom dwarf2json anyway, as the Linux banner will need to be manually specified too ? |
Yes, the symbol tables have the definition of the type, but the symbol's type is missing - see If that is truly uncomfortable for people, we can vollog.warning (always forced to output) when a type is being fell back to, but in reality it is not something that is going to cause problems. We also cannot force this on the producer of the ISF as then we are recreating the problems we had with module.c where people had to rebuild profiles (now symbol tables) often as the types we need change. Instead, Volatility - which knows exactly the type it needs and the types name - should do it right in the call to We also have ISFs produced from older versions of dwarf2json that we published as ready to use, and that people built many symbol tables from, that do not produce ISFs with the correct types and we cannot expect or force users to migrate their entire data set each time we update versions or fix a dwarf2json bug. We also cannot expect them to keep the debug kernels around forever -- and again, all this is fixed by putting in the fallback types in the Volatility plugins that use the symbols. When it comes time for cutover and we are pushing for our user base to switch, we will get one chance with users who rely on Volatility for critical real-life investigations. If they run Vol3 and samples aren't supported that Vol2 supports, then they will not come back to Vol3 anytime soon, if ever. |
To make this more tangible, Gus is going to make a PR against this ticket that shows |
From my point of view, I think this can fit for core kernel structures, which haven't and won't likely change in many years. However, I totally understand ikelos concerns, that a lot of people might come and raise issues, because Vol3 will be able to run pslist with their "Vol2"/"old dwarf2json" ISFs but not the latest brand new plugins, as it will for example miss the 0.7.0 dwarf2json fix for anonymous structs. I guess you should test a "vol2 ISF" with the experimental Of course, this is just my opinion as a contributor 👍 |
Just to clarify: the data structure definitions are in the ISF. What is missing is the type information for the global symbols, such as this from the ticket header:
So when pslist does For symbols whose type changes over time, we can keep |
Second clarification: This is NOT just an issue of people converting vol2 profiles, it is also a problem WE created by publishing a dwarf2json for a very long time that produced broken type information. We should NOT put it back on users who made 100s or 1000s of ISFs already to then have to re-generate them when we can simply fix 99% of the problem with a proper object_from_symbol update. |
Yes I completely understand the issue with the void definition, it's just to ensure that all plugins (not only pslist) will still work with incomplete ISFs and this additional feature (if it gets integrated), to ensure smooth transition for new users 👍. |
Yes, once we have this issue merged into develop + the few remaining plugins for parity merged, we will be running every plugin for vol3 against 200-300 Linux samples to discover and verify any issues or bugs we missed so far. This will definitely uncover any problems like you said. Gus and I will be leading this effort. The infrastructure is already in place, we are just waiting for vol3 to be ready. |
Tangibility is not the issue. I understand the problem you're trying to "solve" and I've given my reasons against it and explained an alternative. Spending time trying to convince me is basically ignoring my reply. Changing our tool to support a different broken tool, would be like attempting to add support for lime's
If we obviously know, then write a tool that changes the ISF it hands in. Don't hand in known bad data and expect Volatility to magically "autocorrect" mistakes. That isn't how forensics works.
WE will not compound past errors by implementing a bad fix. This is not a blame game and having a WE versus THEM attitude is adversarial and not beneficial. Otherwise I might have to point out that OUR tool is working fine and THEIR dwarf2json tool is what went wrong. It's easy to clump things together to support your argument, but it only leads to division. Make the process of conversion as straight forward as possible, the warning on detecting a bad version is fine, I'm even willing to consider running the conversion process as part of volatility as long as it is a) very clearly signposted that it's altering the ISF and b) is an entirely separate module (ideally with conversions encoded in data files so it can be updated more easily in the future). The solution I suggested, which you haven't responded to, would keep all the guess work separate, and resolve the issue in essentially the same way as you're suggesting, but without polluting the main codebase which as you've already noted is not at fault and functions correctly. The warning on failure when using an ISF from a known bad generator is a good idea and definitely one we should implement. It's almost as if we predicted some ISF files may be badly generated, and included version numbers about the generator in them for just such a situation. |
The "WE" here was us/volatility as a whole, not individual people. The users of Volatility and dwarf2json see us as one organization since we are. I wasn't trying to be hostile about that, I just 100% do not believe in forcing our users to do a bunch of work - work that they might not even able to do if they deleted the vmlinux files since generation, believing they already made vol3 compatible symbol tables. I also 100% am against changing the ISF files on the fly or in some outside tools, as if/when 6 months from now we discover some other breaking issue in dwarf2json... Are we going to ask users to re-generate them also like they had to for many years with module.c updates for every system they had a profile for? And then every other time there is another break they generate them again? And keep 100s GBs of vmlinux files around forever? It isn't feasible for users to do or reasonable for us to ask. Putting the fix in the framework is the only sane solution, and it would only be for symbols that have not changed types since being introduced into the Linux kernel. This isn't patching guesswork, this is us as developers 100% knowing that, for example, init_task is a task struct, so we put that into the framework - there is no guessing here, this is 100% certainty. And this does match 100% how forensics works - if you know the data type of the data you are about to parse, then you treat that data as the that type. Also, our plugins like pslist operate knowing that init_task is a task_struct, otherwise the plugin wouldn't be able to know the member names to access and the data structures of those members to parse. Saying some outside tool has to fix a symbol table file to call it a task_struct, when the plugin is going to operate as if its a task_struct no matter what anyway is a very arbitrary line to draw and Gus and I cannot fathom where this restriction is even coming from. If Gus and I literally cannot understand this point, and letting plugins do this in 100% type certainty cases introduce 0 risk or errors into the framework. Again - the plugin will parse the data as if it is that type anyway. If not changing
Where we call Either way, just let us know if object_from_symbol can change or if we need to change all the affected instances to the way with get_symbol followed by object(). |
Ok, I've spent hours crafting this reply, and it was way too long and deals a lot with hostility and aggressiveness that I don't think belongs in such discussions, so here's the short version:
Differences between our solutions:
Some arguments that didn't belong in the discussion:
|
Ok so for this:
By standalone module, I am understanding that as a standalone Python module (file with a class) that the ISF loader in Volatility 3 will call out to at load time. Can you point to where in the current code you want the ISF loader to call into this stand alone module? Also are you envisioning a dictionary like
Then, in whatever place you point us in the code, the standalone module would be called with the current ISF and the standalone module would check each of the types in the dictionary, and, if they are void, then put the correct type in? I tried to match my list above to these:
Gus and I have time allocated over the next ~2 weeks to basically be full time on vol3 and can get it done, but show us where the transform should happen so we understand and verify my understanding above so we don't go in circles. |
Yep, much like the PDB convertor is a standalone tool that we've integrated into the workflow of volatility (and it's written as a module with very basic extensions to make it standalone), I was envisaging something similar here. Essentially it would live in The format of the data would need a little thinking about. It should probably have a section saying if there's a particular set of metadata file it should be applied to (ie, only particular old versions of dwarf2json, etc). We then might want to make it versioned for future extensibility and we might want to figure out the kinds of changes we want to apply (it should probably be limited to type changes, rather than offset changes, since that would easily open it up to abuse and people would want "old value + x" type capabilities, but for instance, changing the subtype of an array, or the target of a pointer to pointer to something?). So it'll need a little thought, but we should be able to mock up a format that's flexible enough for the future. Identifiers would probably be "object type", "member name", possibly "previous/original type" (so we don't overwrite stuff we didn't intend to) and then "type to change to" (supporting standard definition from our existing schema). Might be a little tricky with schema changes, so we'd need to consider that too, but as I say, if we can mock up a working one then we can tweak it to make sure it's suitable before primetime... I'm still not sure if it'll be ready in two weeks (especially since I'm away for a week of it), but we can see where we can get to. I'm not too concerned if it's a week or two late. Much better to do something right than rush it and then mess things up more that will require even longer to fix it. |
@ikelos we need to make a decision and quickly/soon implement how we are going to support symbol tables created from converted vol2 profiles and from kernels where full debug vmlinux files are not available (both of these trigger the same issue). This will be a major blocker for our push to get people off Volatility 2 otherwise as not every kernel has a debug vmlinux public and people have 10+ years of vol2 profiles made for in-house kernels where they won't have kept the vmlinux.
This issue manifests in calls to
object_from_symbol
and users generally notice it right away as this call to instantiate init_task in linux.pslist leads to a backtrace when the list is attempted to be walked:https://github.com/volatilityfoundation/volatility3/blob/develop/volatility3/framework/plugins/linux/pslist.py#L211
In these symbol tables without full debug kernels,
object_from_symbol
cannot create the object as the symbol table looks like this:And since the type is void, obviously the backend doesn't know its real type. For symbol tables made without the debug info, dwarf2json can't magically figure out the type.
To fix this, the cleanest way I can envision is to have an optional parameter to
object_from_symbol
where the caller can set a backup type in cases of these symbol tables. The backend API can then check the symbol table (ISF) for the type, and if a symbol's type is void, then it can look at the optional parameter for the real type. In this pslist case, sending in "task_struct" through this flow would get the object instantiated as the same type as when a debug kernel is available to build the ISF.This will also let the fix be input at initial development type of future plugins.
Does this way work for you? Do you want to take a shot at the first draft of implementing the updated
object_from_symbol
API? If not this way, then please suggest another that is feasible in a quick timeframe as this is preventing us from performing the mass testing we need to do to get vol3 into cut over state from vol2.The text was updated successfully, but these errors were encountered: