The Metal compiler and DXC are based on clang and inherit the "infinite loop without side-effects is UB" rule from C++. SPIR-V also requires shader invocations to terminate.
The trouble is that downstream shader compilers require that "all loops must terminate", yet a loop might not terminate at runtime. The compilers are allowed to assume that loops do terminate, which has far-reaching consequences. See the comments in #6528 for the whole background.
WebGPU requires loops to terminate (gpuweb/gpuweb#3126); otherwise the user agent might lose the device. The issue is that it's statically unprovable that a loop terminates (in all cases), so this can't be a check we do. We must emit loops that might not terminate, but if we do, we trigger UB in downstream shader compilers.
To avoid triggering UB in downstream shader compilers we must prove to them that loops terminate or that they have side-effects.
The only way we've found to avoid the UB via side-effects is to loop based on a volatile bool (originally implemented in tint). Open question: Are there other ways we could artificially introduce side-effects that prevent the UB?
This was done for Metal in #6545, but it prevents other meaningful optimizations like inlining. A previous iteration of this, where the check only happened before the loop, was found to be very slow (#6518 (comment)); the new check is probably going to be extremely slow since it happens on every loop iteration.
I'm proposing that we inject a counter that puts an upper bound on the number of loop iterations, so that downstream shader compilers will see that the loop does terminate (even if it will take a really long time). We can start with an upper bound of u64::MAX (using 2 u32s as outlined in #6528 (comment)) and see if we can get away with a single u32 later. We can have this limit even if it's not part of the WGSL spec, since drivers will end up terminating the invocations and losing the device after a certain amount of time has passed, which will certainly happen before we loop u64::MAX times.
Doing it this way should be much faster than reading a volatile every loop iteration and still allows other optimizations to see the loop might terminate a lot earlier so that it can even be inlined; see #6528 (comment).
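To make the proposed counter concrete, here is a minimal sketch in plain Rust (not generated shader code) of a 64-bit iteration budget stored as two u32s. The names `LoopBound`, `tick` and `run_bounded` are hypothetical, and the exact form outlined in #6528 (comment) may differ; this only illustrates the idea of a bound check injected at the top of every iteration.

```rust
/// Hypothetical 64-bit iteration budget stored as two u32 halves,
/// standing in for the counter injected into each loop.
struct LoopBound {
    hi: u32,
    lo: u32,
}

impl LoopBound {
    /// Returns `false` once the budget is exhausted; otherwise
    /// decrements the budget by one and returns `true`.
    fn tick(&mut self) -> bool {
        if self.hi == 0 && self.lo == 0 {
            return false;
        }
        // Borrow from the high half when the low half is about to wrap.
        if self.lo == 0 {
            self.hi -= 1;
        }
        self.lo = self.lo.wrapping_sub(1);
        true
    }
}

/// Runs `body` until it returns `false` (modelling the loop's own exit
/// condition) or until the injected budget runs out.
/// Returns how many iterations completed without requesting an exit.
fn run_bounded(mut bound: LoopBound, mut body: impl FnMut() -> bool) -> u64 {
    let mut iterations = 0u64;
    loop {
        // Injected check: this is what gives the loop a provable upper bound.
        if !bound.tick() {
            break;
        }
        if !body() {
            break;
        }
        iterations += 1;
    }
    iterations
}
```

Starting from `LoopBound { hi: u32::MAX, lo: u32::MAX }` gives a budget of u64::MAX iterations; a loop that exits on its own never notices the counter, while a loop that would otherwise spin forever stops once the budget is spent.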
Checklist
MSL
HLSL
SPIR-V
GLSL?
Interestingly SPIR-V has a max iteration hint for loops that can only be at most u32::MAX:
MaxIterations
Unchecked assertion that the loop executes at most a given number of iterations. The iteration count is specified in a subsequent unsigned 32-bit integer literal operand.
Some very rough math:
u32::MAX is 4294967295. Let's assume a timeout of 2s (Windows's default) and take an NVIDIA RTX 4090 with a base clock of 2235 MHz: that is roughly 4.47 billion cycles before the timeout, slightly more than u32::MAX.
This assumes a super conservative 1 instruction per cycle and 1 instruction per loop iteration. Most likely no loop iteration will take just 1 cycle, but those were conservative numbers. I've heard Linux has higher timeouts, and even on Windows they seem to be configurable.
So it does seem to me that it is feasible to hit a bound of u32::MAX. In practice real-time applications will/should never hit a bound of u32::MAX but we can't count on this; it's unfortunate though that we need 2 u32 counters even for those apps.
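The rough math can be reproduced with a few lines of Rust. This is only a back-of-the-envelope sketch under the same assumptions as above (one loop iteration retired per clock cycle, 2235 MHz base clock); the function and constant names are made up for illustration.

```rust
/// Base clock assumed in the estimate above (NVIDIA RTX 4090), in Hz.
const BASE_CLOCK_HZ: u64 = 2_235_000_000;

/// Seconds needed to run `iterations` loop iterations, assuming exactly one
/// iteration completes per clock cycle (a deliberately optimistic rate).
fn seconds_to_exhaust(iterations: u64, clock_hz: u64) -> f64 {
    iterations as f64 / clock_hz as f64
}

/// Converts seconds to years, using 365-day years for simplicity.
fn seconds_to_years(seconds: f64) -> f64 {
    seconds / (365.0 * 24.0 * 3600.0)
}
```

Under these assumptions, `seconds_to_exhaust(u32::MAX as u64, BASE_CLOCK_HZ)` comes out just under the 2s timeout, whereas the same call with `u64::MAX` corresponds to centuries once passed through `seconds_to_years`, which is why a two-counter bound is far out of reach of any driver timeout.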