[naga] Prove to downstream shader compilers that loops terminate #6572

Open
4 tasks
teoxoy opened this issue Nov 20, 2024 · 1 comment
Labels
area: naga back-end Outputs of naga shader conversion naga Shader Translator

Comments

teoxoy commented Nov 20, 2024

The Metal compiler and DXC are based on clang and inherit the "infinite loop without side effects is UB" rule from C++. SPIR-V also requires shader invocations to terminate.

Downstream shader compilers require that all loops terminate, but a loop might not actually terminate at runtime, and that gets us into trouble. The compilers are allowed to assume that loops do terminate, which has far-reaching consequences. See the comments in #6528 for the whole background.

WebGPU requires loops to terminate (gpuweb/gpuweb#3126); otherwise the user agent might lose the device. The issue is that it's statically unprovable that a loop terminates (in all cases), so this can't be a check we do. We must therefore emit loops that might not terminate, and when one doesn't, we trigger UB in downstream shader compilers.

To avoid triggering UB in downstream shader compilers we must prove to them that loops terminate or that they have side-effects.

The only way we've found to avoid the UB via side-effects is to loop based on a volatile bool (originally implemented in tint). Open question: Are there other ways we could artificially introduce side-effects that prevent the UB?
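
As a rough illustration of the shape of that workaround, here is a CPU-side sketch in Rust. It is only a structural analogy: the real workaround is emitted in the generated MSL, Rust itself does not treat side-effect-free infinite loops as UB, and the helper name `loop_gated_on_volatile` is made up for this example.

```rust
use std::ptr;

/// Hypothetical helper: runs `body` until it returns `false`, with the loop
/// condition gated on a volatile read of a bool that is always `true`.
/// Returns how many times `body` asked to continue.
fn loop_gated_on_volatile(mut body: impl FnMut() -> bool) -> u32 {
    let keep_going = true;
    let mut iterations = 0u32;
    // The volatile read is a side effect the optimizer may not remove or
    // hoist, so the flag has to be re-read on every single iteration.
    while unsafe { ptr::read_volatile(&keep_going) } {
        if !body() {
            break;
        }
        iterations += 1;
    }
    iterations
}
```

The per-iteration volatile read in the loop condition is the cost discussed in this issue.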

This was done for Metal in #6545 but it prevents other meaningful optimizations like inlining. A previous iteration of this, where the check only happened once before the loop, was found to be very slow (#6518 (comment)). The new check is probably going to be extremely slow since it happens on every loop iteration.

I'm proposing that we inject a counter that puts an upper bound on the number of loop iterations so that downstream shader compilers will see that the loop does terminate (even if it will take a really long time). We can start with an upper bound of u64::MAX (using 2 u32s as outlined in #6528 (comment)) and see if we can get away with a single u32 later. We can have this limit even though it's not part of the WGSL spec, since drivers will end up terminating the invocations and losing the device after a certain amount of time has passed, which will certainly happen before we loop u64::MAX times.
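
A minimal sketch of what such a counter could look like, modeled on the CPU in Rust. The names `LoopBound`, `tick` and `bounded_loop` are invented for this example, and the code a back end actually emits (see #6528) may be shaped differently.

```rust
/// Hypothetical 64-bit iteration counter stored as two u32 words, since
/// 64-bit integers are not assumed to be available in the shader.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LoopBound {
    lo: u32,
    hi: u32,
}

impl LoopBound {
    fn new() -> Self {
        Self { lo: 0, hi: 0 }
    }

    /// Returns `false` once the counter has reached u64::MAX; otherwise
    /// increments it (carrying from `lo` into `hi`) and returns `true`.
    fn tick(&mut self) -> bool {
        if self.lo == u32::MAX && self.hi == u32::MAX {
            return false;
        }
        let (lo, carry) = self.lo.overflowing_add(1);
        self.lo = lo;
        // Cannot overflow: a carry with `hi == u32::MAX` was rejected above.
        self.hi += carry as u32;
        true
    }

    /// The counter value as a single u64 (for inspection only).
    fn count(&self) -> u64 {
        ((self.hi as u64) << 32) | (self.lo as u64)
    }
}

/// Runs `body` until it returns `false` or the bound is exhausted, and
/// returns the number of iterations that were started.
fn bounded_loop(mut body: impl FnMut() -> bool) -> u64 {
    let mut bound = LoopBound::new();
    while bound.tick() {
        if !body() {
            break;
        }
    }
    bound.count()
}
```

The point of the bound check is that the loop now has an exit that does not depend on user data, so a compiler can see that it terminates.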

Doing it this way should be much faster than reading a volatile on every loop iteration, and it still allows other optimizations to see that the loop might terminate a lot earlier so that it can even be inlined; see #6528 (comment).


Checklist

  • MSL
  • HLSL
  • SPIR-V
  • GLSL?
@teoxoy teoxoy added area: naga back-end Outputs of naga shader conversion naga Shader Translator labels Nov 20, 2024

teoxoy commented Nov 20, 2024

Interestingly, SPIR-V has a max iteration hint for loops, but its value can be at most u32::MAX:

MaxIterations
Unchecked assertion that the loop executes at most a given number of iterations. The iteration count is specified in a subsequent unsigned 32-bit integer literal operand.

Some very rough math:

u32::MAX is 4294967295. Let's assume a timeout of 2s (the Windows default) and take an NVIDIA RTX 4090 with a base clock of 2235 MHz:

loop iterations/s = 4294967295 / 2 = 2147483647.5
cycles/loop iteration = 2235000000 / 2147483647.5 = 1.04

At a super conservative 1 instruction per cycle, that's about 1 instruction per loop iteration. Most likely no instruction takes just 1 cycle, but those were conservative numbers: I've heard Linux has higher timeouts, and even on Windows they seem to be configurable.
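
The rough math above can be reproduced with a short Rust sketch. The function names are made up for this example, and the model (one invocation, a fixed number of cycles per iteration, the base clock) is the same simplification used above.

```rust
/// Cycles available per loop iteration if `max_iterations` iterations have
/// to fit into `timeout_secs` at a clock of `clock_hz`.
fn cycles_per_iteration(max_iterations: u64, timeout_secs: f64, clock_hz: f64) -> f64 {
    let iterations_per_sec = max_iterations as f64 / timeout_secs;
    clock_hz / iterations_per_sec
}

/// Seconds needed to run `max_iterations` iterations at `clock_hz`,
/// assuming each iteration costs `cycles_per_iter` cycles.
fn seconds_to_exhaust(max_iterations: u64, clock_hz: f64, cycles_per_iter: f64) -> f64 {
    max_iterations as f64 * cycles_per_iter / clock_hz
}
```

With the numbers above, `cycles_per_iteration(u32::MAX as u64, 2.0, 2.235e9)` comes out at about 1.04. Under the same assumptions, `seconds_to_exhaust(u64::MAX, 2.235e9, 1.0)` is a few hundred years, so a u64::MAX bound would not be reached before any driver timeout.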

So it does seem feasible to me to hit a bound of u32::MAX. In practice, real-time applications will (and should) never hit a bound of u32::MAX, but we can't count on this. It's unfortunate, though, that we need 2 u32 counters even for those apps.
