Feat: Make Artifacts support in-structure commenting #102

Open · 4 tasks
mahaloz opened this issue Aug 25, 2024 · 1 comment

Comments

@mahaloz
Member

mahaloz commented Aug 25, 2024

Background

In most decompilers, like IDA Pro, types and their members can carry comments, like:

struct Elf64_Vernaux // sizeof=0x10
{                                       // XREF: LOAD:0000000000400410/r
     unsigned __int32 vna_hash;         // this is some comment on this first member
     unsigned __int16 vna_flags;
     unsigned __int16 vna_other;
     unsigned __int32 vna_name __offset(OFF64,0x400390);
     unsigned __int32 vna_next;
};

libbs does not currently support this. An ideal solution would look like this:

my_struct = deci.structs["Elf64_Vernaux"]
print(my_struct.comments[0])         # this is some comment on this first member
print(my_struct.members[0].comment)  # this is some comment on this first member

Implementation

To support this type of commenting, we'll need to do a few things:

  • Update member-like Artifacts to have a comment attribute (see the sketch after this list)
  • Support setting/getting these in each decompiler (as much as possible)
  • Refactor Function to support comments
  • Remove the old comments system that simply stored all comments globally
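
A minimal sketch of what this could look like on the Artifact side, assuming dataclass-style classes; the StructMember/Struct names mirror libbs naming, but the exact fields and signatures here are assumptions rather than the current libbs API:

    from dataclasses import dataclass, field
    from typing import Dict, Optional

    @dataclass
    class StructMember:
        name: str
        offset: int
        type_: Optional[str] = None
        size: int = 0
        comment: Optional[str] = None   # new: per-member comment

    @dataclass
    class Struct:
        name: str
        size: int = 0
        members: Dict[int, StructMember] = field(default_factory=dict)

        @property
        def comments(self) -> Dict[int, str]:
            # convenience view keyed by member offset, matching the example above
            return {off: m.comment for off, m in self.members.items() if m.comment}
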
@arizvisa

arizvisa commented Aug 27, 2024

If you don't want to use the edm_t.cmt and udm_t.cmt attributes to enumerate or serialize complex field comments, you can also unpack/save them from the result of tinfo_t.serialize(), which was the pre-8.4 method anyway ("fields" work similarly).

Decoding the bytes returned by tinfo_t.serialize() into a list of comments is basically: consume a byte, determine whether it's an 8-bit or 16-bit length, decode that length, use it to extract the comment bytes, UTF-8 decode them, and repeat until done.

    import builtins

    def decode_bytes(data):
        '''Decode the given `data` into a list containing the length and the bytes for each encoded string.'''
        results, iterable = [], (octet for octet in bytearray(data))

        integer = next(iterable, None)
        length_plus_one, ok = integer or 0, integer is not None
        while ok:
            # a length byte below 0x7f stands alone; otherwise the next byte must continue the encoding
            one = 1 if length_plus_one < 0x7f else next(iterable, None)
            assert (one == 1) and length_plus_one > 0
            encoded = bytearray(octet for index, octet in zip(builtins.range(length_plus_one - 1), iterable))   # using zip to clamp the bytes consumed
            results.append((length_plus_one - 1, encoded))

            integer = next(iterable, None)
            length_plus_one, ok = integer or 0, integer is not None
        return results
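
A hedged usage sketch to tie this to the blob being decoded, assuming IDAPython's tinfo_t.serialize() returns the (type, fields, cmts) triple referenced above; `tif` is a hypothetical tinfo_t for the struct, obtained elsewhere:

    # Hypothetical usage: `tif` is a struct tinfo_t looked up elsewhere.
    serialized_type, serialized_fields, serialized_cmts = tif.serialize()
    if serialized_cmts:
        member_comments = [encoded.decode('utf-8') for _, encoded in decode_bytes(serialized_cmts)]
    else:
        member_comments = []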

Encoding the string passed to tinfo_t.deserialize(til, type, fields, cmts=None) goes the other way: encode the length of each UTF-8 encoded comment and concatenate everything back into a single stream of bytes.

Apologies for the unreadability of the following; "encode_length" is all that's really relevant:

    import itertools

    def encode_bytes(strings):
        '''Encode the list of `strings` with their lengths and return them as bytes.'''
        encode_length = lambda integer: bytearray([integer + 1] if integer + 1 < 0x80 else [integer + 1, 1])
        iterable = (bytes(string) if isinstance(string, (bytes, bytearray)) else string.encode('utf-8') for string in strings)
        pairs = ((len(chunk), chunk) for chunk in iterable)
        return bytes(bytearray().join(itertools.chain(*((encode_length(length), bytearray(chunk)) for length, chunk in pairs))))
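
And a sketch of the round trip under the same assumptions: only the cmts argument is rebuilt here, the type/fields blobs come straight from serialize(), and get_idati() is assumed for the local type library:

    import ida_typeinf

    # Hypothetical round-trip: re-encode the (possibly edited) comments and pass
    # them back to deserialize() alongside the untouched type/fields blobs.
    new_cmts = encode_bytes(member_comments)
    new_tif = ida_typeinf.tinfo_t()
    new_tif.deserialize(ida_typeinf.get_idati(), serialized_type, serialized_fields, new_cmts)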

However, it's worth confirming that the performance of serializing/deserializing comments at scale actually matters for binsync. minsc creates an index of all commentable "things" so that they can be tagged for searching and (mis-)used to store nearly-arbitrary data, so being able to check whether a tinfo_t even has comments, or to distinguish exactly what was updated (name/comment/other) in response to events without iterating through all the fields one-by-one, made a difference there.

...I'm literally praying that they don't try to retrofit repeatable/non-repeatable comments into this, btw.

Projects status: paused
Development: No branches or pull requests
2 participants