Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x64: use ByteSink more liberally in the rex module #9505

Merged
merged 2 commits into from
Oct 23, 2024

Conversation

abrown
Copy link
Contributor

@abrown abrown commented Oct 23, 2024

In looking at adding auto-generated assembler code to Cranelift, I've noticed that we pass MachBuffer down when it is not always necessary. The ByteSink trait (which MachBuffer implements) is a simplified view to put* bytes into the buffer and it is a bit simpler to use when testing since it is also implemented by Vec<u8>. This change propagates the trait to the rex module.

In looking at adding auto-generated assembler code to Cranelift, I've
noticed that we pass `MachBuffer` down when it is not always necessary.
The `ByteSink` trait (which `MachBuffer` implements) is a simplified
view to `put*` bytes into the buffer and it is a bit simpler to use when
testing since it is also implemented by `Vec<u8>`. This change
propagates the trait to the `rex` module.
@abrown abrown requested a review from a team as a code owner October 23, 2024 22:32
@abrown abrown requested review from cfallin and removed request for a team October 23, 2024 22:32
@abrown
Copy link
Contributor Author

abrown commented Oct 23, 2024

Similar to #9504, this is just an attempt at trying to simplify what is needed to actually assemble an instruction. We still need MachBuffer for instructions that touch memory; thinking about it, it may just be unavoidable.

self.emit_two_op(sink, enc_g, 0);
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually let me remove this: this is for the new assembler but isn't used yet.

pub fn must_always_emit(&self) -> bool {
(self.0 & 2) != 0
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and a little bit of code movement.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, thanks!

Re: handling Amodes and the helper that briefly appeared here: I think we'll need MachBuffer unavoidably once we have amodes that can refer to labels; that's the semantic thing that MachBuffer gives us above an arbitrary byte sequence. But totally fine to trim the close coupling at lower levels.

I'd love to talk more about the assembler design w.r.t. dependencies -- intuitively I would have thought that MachBuffer would be an integral part of it, i.e. it's the assembler's way of providing its output (given that assemblers need label references, and to flow through other metadata like trap info and relocs from insts to the code blob), but maybe there's another use-case or need for generalization I'm missing.

@abrown
Copy link
Contributor Author

abrown commented Oct 23, 2024

intuitively I would have thought that MachBuffer would be an integral part of it

I mean, that's how it has to be today. I'm kind of wondering if it should, though: it's nice to be able to "just emit" an instruction into a byte buffer during testing and that's what I've been thinking about. The setup in emit_tests.rs just to be able to emit an instruction is a lot, e.g. If we're proptest-ing or fuzzing we also want the "emit a single instruction" path to be relatively quick.

On the other hand, sometimes it is unavoidable: I've notice that we register traps and labels using a MachBuffer in the encoding logic and that simply cannot be done with ByteSink. Maybe ByteSink can gain extra trait functions that do nothing for the Vec<u8> case or something like that...

@cfallin cfallin added this pull request to the merge queue Oct 23, 2024
@cfallin
Copy link
Member

cfallin commented Oct 23, 2024

Maybe ByteSink can gain extra trait functions that do nothing for the Vec case or something like that...

Ah! Yeah, that's an interesting idea: lift the whole interface of MachBuffer into a trait, and provide an implementation of that trait for a Vec<u8>.

There are interesting semantic implementations of this though when label references are involved: what should the assembled bytes from a mov rax, [rip+label] instruction be? Zeroes in the RIP offset (i.e. labels not resolved)? One aspect of the MachBuffer that is nice is that its types essentially force one to be aware one is referencing addresses in the buffer: you have to ask for a label to refer to it, the various amodes require a label, and if you don't bind it, (IIRC?) it should panic when finalizing.

Maybe another way of reading this friction is that we need some convenience APIs? It should be possible to write an assemble(inst: Inst) -> Vec<u8> that is something like:

fn assemble(inst: Inst) -> Vec<u8> {
  let mut buffer = MachBuffer::new();
  inst.emit(&mut buffer, ...);
  buffer.finalize(&VCodeConstants::default(), &ControlPlane::default())
    .apply_base_srcloc(SourceLoc::new(0))
    .data
    .into() // SmallVec into Vec
}

(whew, that is a bit of a mouthful, I see why you want it!)

Merged via the queue into bytecodealliance:main with commit 9e1be64 Oct 23, 2024
51 checks passed
@abrown abrown deleted the more-byte-sink branch October 23, 2024 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants