Tracking issue for asm (inline assembly)
#29722
Will there be any difficulties with ensuring the backward-compatibility of inline assembly in stable code?
@main-- has a great comment at rust-lang/rfcs#1471 (comment) that I'm reproducing here for posterity:
I personally think it would be better to do what Microsoft did in MSVC x64: define a (nearly-)comprehensive set of intrinsic functions, one for each asm instruction, and do "inline asm" exclusively through those intrinsics. Otherwise, it's very difficult to optimize the code surrounding inline asm, which is ironic since many uses of inline asm are intended to be performance optimizations. One advantage of the intrinsic-based approach is that it doesn't need to be an all-or-nothing thing. You can define the most needed intrinsics first, and build the set out incrementally (for example, starting with the ones crypto code needs). Further, the intrinsics would be a good idea to add even if it were ultimately decided to support inline asm, as they are much more convenient to use (based on my experience using them in C and C++), so starting with the intrinsics and seeing how far we get seems like a zero-risk-of-being-wrong thing.
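For illustration, a minimal sketch of this intrinsics-first approach using the vendor intrinsics that are exposed in `core::arch` (the `cycle_count` wrapper name and the x86_64 gating are mine, not from the comment):

```rust
// Sketch of the intrinsics-first approach: a thin wrapper over the RDTSC
// instruction via a vendor intrinsic instead of hand-written inline asm.
#[cfg(target_arch = "x86_64")]
fn cycle_count() -> u64 {
    // _rdtsc compiles down to a single RDTSC instruction, but the surrounding
    // code stays fully visible to the optimizer.
    unsafe { core::arch::x86_64::_rdtsc() }
}
```

A caller can then measure a region with two `cycle_count()` reads, and the optimizer is free to schedule the surrounding code as it pleases.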
Intrinsics are good, and I expect that kind of hackery will be rare, but I still think it's a useful thing to support.
Inline asm is also useful for code that wants to do its own register/stack allocation (e.g. naked functions).
@briansmith yeah those are some excellent reasons to use intrinsics where possible. But it's nice to have inline assembly as the ultimate escape hatch.
@briansmith Note that […] On the other hand, intrinsics critically depend on a "sufficiently smart compiler" to achieve at least the performance one would get with a hand-rolled asm implementation. My knowledge on this is outdated, but unless there has been significant progress, intrinsics-based implementations are still measurably inferior in many - if not most - cases. Of course they're much more convenient to use, but I'd say that programmers really don't care much about that when they're willing to descend into the world of specific CPU instructions.

Now another interesting consideration is that intrinsics could be coupled with fallback code on architectures where they're not supported. This gives you the best of both worlds: your code is still portable - it can just employ some hardware-accelerated operations where the hardware supports them. Of course, this only really pays off for either very common instructions or applications with one obvious target architecture.

The reason I'm mentioning this is that while one could argue this may even be undesirable with compiler-provided intrinsics (as you'd probably care about whether you actually get the accelerated versions, plus compiler complexity is never good), I'd say it's a different story if the intrinsics are provided by a library (and only implemented using inline asm). In fact, this is the big picture I'd prefer, even though I can see myself using intrinsics more than inline asm. (I consider the intrinsics from RFC #1199 somewhat orthogonal to this discussion, as they exist mostly to make SIMD work.)
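The fallback pattern described here might look roughly like the sketch below (the popcount example and the `cfg` gating are my own illustration, not from the comment):

```rust
// Hardware-accelerated path: POPCNT via a vendor intrinsic.
// Assumes the popcnt target feature is enabled for the build.
#[cfg(all(target_arch = "x86_64", target_feature = "popcnt"))]
fn popcount(x: u64) -> u32 {
    unsafe { core::arch::x86_64::_popcnt64(x as i64) as u32 }
}

// Portable fallback everywhere else: same behaviour, no special hardware.
#[cfg(not(all(target_arch = "x86_64", target_feature = "popcnt")))]
fn popcount(x: u64) -> u32 {
    x.count_ones()
}
```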
I'm not sure what you mean here. It's true that the compiler can't break down the asm into its individual operations to do strength reduction or peephole optimizations on it. But in the GCC model, at least, the compiler can allocate the registers it uses, copy it when it replicates code paths, delete it if it's never used, and so on. If the asm isn't volatile, GCC has enough information to treat it like any other opaque operation.

But I haven't used it a whole lot, especially not recently. And I have no experience with LLVM's rendition of the feature. So I'm wondering what's changed, or what I've misunderstood all this time.
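For what it's worth, the `asm!` syntax that was eventually proposed exposes exactly this distinction through options; a rough sketch (the `lea` example is mine) of a block the compiler may treat as an ordinary opaque, side-effect-free operation:

```rust
use core::arch::asm;

// With `pure` and `nomem`, the block behaves like a non-volatile asm in the
// GCC model: the compiler may hoist it, merge duplicate copies, or delete it
// outright if `out` is never used.
fn double(x: u64) -> u64 {
    let out: u64;
    unsafe {
        asm!(
            "lea {out}, [{x} + {x}]",
            x = in(reg) x,
            out = out(reg) out,
            options(pure, nomem, nostack),
        );
    }
    out
}
```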
We discussed this issue at the recent work week, as @japaric's survey of the […]
Despite the issues listed above, we wanted to be sure to at least come away with some ability to move this issue forward! To that end we brainstormed a few strategies for how we can nudge inline assembly towards stabilization.

The primary way forward would be to investigate what clang does. Presumably clang and C have effectively stable inline assembly syntax, and it may well be that we can just mirror whatever clang does (especially wrt LLVM). It would be great to understand in greater depth how clang implements inline assembly. Does clang have its own translation layer? Does it validate any input parameters? (etc.)

Another possibility for moving forward is to see if there's an assembler we can just take off the shelf from elsewhere that's already stable. Some ideas here were nasm or the plan9 assembler. Using LLVM's assembler has the same problems about stability guarantees as the inline assembly instruction in the IR (it's a possibility, but we need a stability guarantee before using it).
I would like to point out that LLVM's inline asm syntax is different from the one used by clang/gcc. Differences include:
Clang will convert inline asm from the gcc format into the LLVM format before passing it on to LLVM. It also performs some validation of the constraints: for example, it ensures that […].

In light of this, I think that we should implement the same translation and validation that clang does, and support proper gcc inline asm syntax instead of the weird LLVM one.
There's an excellent video that surveys inline assembly in D, MSVC, gcc, LLVM, and Rust, with slides available online.
As someone who'd love to be able to use inline ASM in stable Rust, and with more experience than I'd like trying to access some of the LLVM MC APIs from Rust, some thoughts:
I've been having a bit of a play to see what can be done with procedural macros. I've written one that converts GCC-style inline assembly to Rust style: https://github.com/parched/gcc-asm-rs. I've also started working on one that uses a DSL where the user doesn't have to understand the constraints, and they're all handled automatically.

So I've come to the conclusion that Rust should just stabilise the bare building blocks, and then the community can iterate out of tree with macros to come up with the best solutions. Basically, just stabilise the LLVM style we have now with only "r" and "i" (and maybe "m") constraints, and no clobbers. Other constraints and clobbers can be stabilised later with their own mini-RFC type things.
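For reference, the bare building blocks in question are essentially today's unstable LLVM-constraint syntax restricted to register constraints; a minimal sketch in that style (roughly the classic `add` example from the old docs):

```rust
#![feature(asm)]

// Old LLVM-constraint asm! syntax: "$N" placeholders, "=r"/"r" register
// constraints, and a tied "0" input; no clobbers or options are used here.
fn add(a: i32, b: i32) -> i32 {
    let c: i32;
    unsafe {
        asm!("add $2, $0"
             : "=r"(c)          // output in a register
             : "0"(a), "r"(b)   // "0" ties `a` to the same register as the output
             );
    }
    c
}
```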
Personally I'm starting to feel as though stabilizing this feature is the sort of massive task that will never get done unless somehow someone hires a full-time expert contractor to push on this for a whole year. I want to believe that @parched's suggestion of stabilizing […]
As a data point, I happen to be working on a crate right now that depends on […]

While it certainly has its advantages, I'm a bit wary of the "stabilize building blocks and leave the rest to proc macros" approach. It essentially outsources the design, RFC and implementation process to whoever wants to do the job - potentially no one. Of course, having weaker stability/quality guarantees is the entire point (the tradeoff is that having something imperfect is already much better than having nothing at all), I understand that. At the very least the building blocks should be well-designed - and in my opinion, […]
One idea: today there is already a project named dynasm, which can help you generate assembly code with a plugin used to pre-process the assembly for one flavor of x64 code.

This project does not answer the problem of inline assembly, but it can certainly help: if rustc were to provide a way to map variables to registers, and accept insertion of arbitrary byte sequences into the code, such a project could also be used to fill in those byte sequences.

This way, the only standardization needed from rustc's point of view is the ability to inject any byte sequence into the generated code, and to enforce specific register allocations. This removes all the choices between specific language flavors. Even without dynasm, this could also be used as a way to make macros for the cpuid / rdtsc instructions, which would just be translated into raw sequences of bytes. I guess the next question might be whether we want to add additional properties/constraints to the byte sequences.
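To make the cpuid/rdtsc example concrete, here is roughly what such a wrapper needs from the compiler - fixed output registers and a guarantee that the instruction bytes land in the generated code. This sketch uses the newer `asm!` syntax rather than raw byte injection, purely for illustration:

```rust
// RDTSC writes the low 32 bits of the timestamp counter to EAX and the high
// 32 bits to EDX; the wrapper only needs those two register constraints.
fn rdtsc() -> u64 {
    let lo: u32;
    let hi: u32;
    unsafe {
        core::arch::asm!(
            "rdtsc",
            out("eax") lo,
            out("edx") hi,
            options(nomem, nostack),
        );
    }
    ((hi as u64) << 32) | (lo as u64)
}
```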
[EDIT: I don't think anything I said in this comment is correct.] If we want to continue to use LLVM's integrated assembler (I assume this is faster than spawning an external assembler), then stabilization means stabilizing on exactly what LLVM's inline assembly expressions and integrated assembler support—and compensating for changes to those, should any occur. If we're willing to spawn an external assembler, then we can use any syntax we want, but we're then foregoing the advantages of the integrated assembler, and exposed to changes in whatever external assembler we're calling.
I think it would be strange to stabilize on LLVM's format when even Clang doesn't do that. Presumably it does use LLVM's support internally, but it presents an interface more like GCC.
I'm 100% fine with saying "Rust supports exactly what Clang supports" and calling it a day, especially since AFAIK Clang's stance is "Clang supports exactly what GCC supports". If we ever have a real Rust spec, we can soften the language to "inline assembly is implementation-defined". Precedent and de-facto standardization are powerful tools. If we can repurpose Clang's own code for translating GCC syntax to LLVM, all the better. The alternative-backend concerns don't go away, but theoretically a Rust frontend to GCC wouldn't be much vexed. Less for us to design, less for us to endlessly bikeshed, less for us to teach, less for us to maintain.
If we stabilize something defined in terms of what clang supports, then we should call it […]

There are a few things I'd like to see in Rust inline assembly:
Norman Ramsey and Mary Fernández wrote some papers about the New Jersey Machine Code Toolkit way back when that have excellent ideas for describing assembly/machine language pairs in a compact way. They tackle (Pentium Pro-era) IA-32 instruction encodings; it is not at all limited to neat RISC ISAs.
I'd like to reiterate the conclusions from the most recent work week:
To me this is the definition of "if we stabilize this now, we will be guaranteed to regret it in the future" - and not only "regret it", but it seems very likely to "cause serious problems for implementing any new system". At the absolute bare minimum, I'd firmly believe that bullet (2) cannot be compromised on (aka the definition of stable in "stable channel"). The other bullets would be quite sad to forgo, as that would erode the expected quality of the Rust compiler, which is currently quite high.
@jcranmer wrote:
I would think that, in practice, it would be quite difficult to infer clobber lists. Just because a machine-language fragment uses a register doesn't mean it clobbers it; perhaps it saves it and restores it. Conservative approaches could discourage the code generator from using registers that would be fine to use.
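A concrete illustration of the point (using the newer `asm!` syntax; the fragment itself is my own example): the block below uses rbx but saves and restores it, so it is not actually clobbered, yet a naive scanner would have to assume it is:

```rust
fn add_one(x: u64) -> u64 {
    let y: u64;
    unsafe {
        core::arch::asm!(
            "push rbx",        // rbx appears in the fragment...
            "mov rbx, {x}",
            "add rbx, 1",
            "mov {y}, rbx",
            "pop rbx",         // ...but is restored, so no clobber needs declaring
            x = in(reg) x,
            y = out(reg) y,
        );
    }
    y
}
```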
Does anyone know what the most recent proposal/current status is? Since the theme of the year is "maturity and finishing what we started", it seems like a great opportunity to finally finish up […]
Vague plans for a new (to be stabilized) syntax were discussed last February: https://paper.dropbox.com/doc/FFI-5NmXV30TGiSsr9dIxpqpq According to those notes, @joshtriplett and @Amanieu signed up to write an RFC.
Inline ASM isn't stable, see https://doc.rust-lang.org/unstable-book/language-features/asm.html and rust-lang/rust#29722. We use asm in `src/kernel/execve/loader`, which needs direct access to registers and can't be done in pure rust. We also use the `syscall` crate in several places, which also uses inline asm.
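For context, the sort of thing those crates need looks roughly like the sketch below: a raw write(2) on x86_64 Linux in the newer `asm!` syntax (the syscall number, register assignments, and clobbers are Linux/x86_64-specific; the function name is mine):

```rust
fn raw_write_stdout(msg: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        core::arch::asm!(
            "syscall",
            inout("rax") 1isize => ret,   // SYS_write; the return value comes back in rax
            in("rdi") 1usize,             // fd 1 = stdout
            in("rsi") msg.as_ptr(),
            in("rdx") msg.len(),
            out("rcx") _,                 // the kernel clobbers rcx and r11
            out("r11") _,
            options(nostack),
        );
    }
    ret
}
```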
What is the status of the new syntax?
It needs to be RFC'ed and implemented on nightly.
ping @joshtriplett @Amanieu Let me know if I can help move things along here! I'll be in touch shortly.
@cramertj AFAICT anybody can move this forward; this is unblocked and waiting on somebody to step in and put in the work. There is a pre-RFC sketching the overall design, and the next steps could be to implement that and see if it actually works, either as a proc macro, in a fork, or as a different unstable feature. One could probably try to just turn that pre-RFC into a proper RFC and submit it, but I doubt that without an implementation such an RFC can be convincing. EDIT: to be clear, by convincing I specifically mean parts of the pre-RFC like this one:
where there are dozens of arch-specific register classes in the lang-ref. An RFC cannot just wave all of these away, and making sure that they all work as they are supposed to, are meaningful, or are "stable" enough in LLVM to be exposed here would benefit from an implementation in which one can just try them out.
Is RISC-V inline assembly supported here with […]?
To the best of my knowledge, all assembly on supported platforms is supported; it's pretty much raw access to LLVM's asm support.
Yes, RISC-V is supported. Architecture-specific input/output/clobber constraint classes are documented in the LLVM langref. There is a caveat, however: if you need to constrain to individual registers in input/output/clobber constraints, you must use the architectural register names (x0-x31, f0-f31), not the ABI names. In the assembly fragment itself, you can use either kind of register name.
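An untested sketch of that caveat in the current LLVM-constraint `asm!` syntax (the constraint strings use the architectural names x10/x11, while the instruction text may use the ABI names a0/a1; the function itself is just an illustration):

```rust
#![feature(asm)]

// Move `value + 1` into a0 (x10); note the constraint strings must say
// "{x10}"/"{x11}", not "{a0}"/"{a1}".
fn add_one(value: usize) -> usize {
    let result: usize;
    unsafe {
        asm!("addi a0, a1, 1"
             : "={x10}"(result)
             : "{x11}"(value)
             :: "volatile");
    }
    result
}
```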
As someone new to these concepts, can I just say... this whole discussion seems silly. How is it that a language (assembly) which is supposed to be a 1-to-1 mapping with its machine code causes this much headache? I'm pretty confused:
I get that backwards compatibility is an issue, but with the huge number of bugs and the fact that this was never stabilized, maybe it would be better to just pass it along to the backend. Rust shouldn't be in the business of trying to fix LLVM's or gcc's or anyone else's […]
The reason there is no progress here is that nobody is investing time in fixing this issue. That's not a good reason for stabilizing a feature.
While reading through this thread, I had an idea and had to post it. Sorry if I'm replying to an old post, but I thought it was worth it: @main-- said:
Maybe instead of inline asm, what we really need here are function attributes for LLVM that tell the optimizer: "optimize this for throughput", "optimize this for latency", "optimize this for binary size". I know this solution is upstream, but it would not only solve your particular problem automatically (by providing the lower-latency but otherwise isomorphic implementation of the algorithm), it would also allow Rust programmers to have more fine-grained control over the performance characteristics that matter to them.
@felix91gr That doesn't solve use cases that require emitting an exact sequence of instructions, e.g. interrupt handlers.
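For those cases something like a naked function is needed, where the whole body is a single asm block and the compiler emits no prologue or epilogue. A rough sketch under the (still unstable) naked_functions feature; the handler name and body are placeholders:

```rust
#![feature(naked_functions)]

// The compiler generates no prologue/epilogue, so the handler controls the
// exact instruction sequence, ending in iretq as an x86-64 interrupt return.
#[naked]
unsafe extern "C" fn timer_interrupt_handler() {
    core::arch::asm!(
        "push rax",
        // ... acknowledge the interrupt, do the minimal work ...
        "pop rax",
        "iretq",
        options(noreturn),
    );
}
```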
@mark-i-m of course not. That's why I put a literal quote! 🙂 My point was that even though you might solve the "compiler optimizes in a way opposite of what I need" problem (which is classic in their case: latency vs throughput) by using inline asm features, maybe (and IMO definitely) that use case would be better served by more fine-grained control of optimizations :)
In light of the upcoming changes to inline assembly, most of the discussion in this issue is no longer relevant. As such, I'm going to close this issue in favor of two separate tracking issues, one for each flavor of inline assembly we have:
This issue tracks stabilization of inline assembly. The current feature has not gone through the RFC process, and will probably need to do so prior to stabilization.