Dither implementation & Banding Removal #696

Open
LordKobra opened this issue Nov 15, 2024 · 3 comments

Comments

@LordKobra
Contributor

LordKobra commented Nov 15, 2024

General Issue

The game exhibits banding, most notably in the sky.

Proposed solution

Dithering applied before output conversion to 8 bit inside the tonemapping.frag shader.
Before:
Gothic2Notr exe Screenshot 2024 11 14 - 19 00 28 55
After:
Gothic2Notr exe Screenshot 2024 11 14 - 18 59 03 91
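To illustrate why dithering before the 8-bit conversion helps, here is a minimal CPU-side sketch in Python. It is not the tonemapping.frag code; the function names and the seeded white-noise source are assumptions made for illustration. Replacing the fixed 0.5 rounding offset with per-pixel noise lets the average over many pixels land on the in-between value instead of snapping to a single code.

```python
import random

def quantize_8bit(value, noise=0.5):
    """Map a value in [0, 1] to an 8-bit code.

    `noise` in [0, 1) replaces the usual 0.5 rounding offset.
    """
    code = int(value * 255.0 + noise)  # int() floors here because the sum is non-negative
    return max(0, min(255, code))

def average_code(value, samples=20000, dither=True, seed=1):
    """Average 8-bit code over many pixels that all share the same input value."""
    rng = random.Random(seed)  # hypothetical white-noise source, seeded for repeatability
    total = 0
    for _ in range(samples):
        noise = rng.random() if dither else 0.5
        total += quantize_8bit(value, noise)
    return total / samples

value = 100.3 / 255.0
print(average_code(value, dither=False))  # every pixel snaps to the same code
print(average_code(value, dither=True))   # the average approaches the in-between value
```

Without dither, a smooth gradient collapses into flat steps (the bands); with dither, each step is traded for fine noise whose local average follows the gradient.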

However, there's still slight banding remaining. I want to structure my thoughts into the following subcategories:

Artifact removal

The remaining banding must be present in one of the input textures. Upon inspection, I'm pretty sure it's textureD:

layout(binding = 1) uniform sampler2D textureD;

I've discussed the texture format b10g11r11_ufloat with others, and we came to the conclusion that it does not provide sufficient precision for high-quality light rendering (especially for very light-sensitive shading like ray tracing). I strongly suggest moving the internal rendering to at least 16-bit ufloat per channel. In addition, or as an alternative, gamma encoding the luminance values was recommended to me, so that the exponential curve of the float representation aligns better with the linear light values. I have no experience with gamma encoding and cannot judge whether it is feasible for storage. Either way, the internal rendering needs to be adjusted, and I would leave this decision to you, as you have a much better understanding of the big picture.
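For a rough sense of the precision gap, here is a small Python sketch that counts how many distinct values fit into one octave (a factor-of-two range) for different mantissa widths. It relies on the standard bit layouts: the 11-bit R and G channels carry 6 mantissa bits, the 10-bit B channel carries 5, and a 16-bit half float carries 10. The helper names are hypothetical, and the model is simplified: it truncates instead of rounding and ignores denormals and exponent limits.

```python
import math

def quantize_mantissa(x, mantissa_bits):
    """Truncate a positive float to `mantissa_bits` explicit mantissa bits."""
    m, e = math.frexp(x)               # x = m * 2**e with m in [0.5, 1)
    scale = 1 << (mantissa_bits + 1)   # +1 accounts for the implicit leading bit
    return math.ldexp(math.floor(m * scale) / scale, e)

def levels_per_octave(mantissa_bits, samples=4096):
    """Count distinct representable values in [0.5, 1.0) using a fine ramp."""
    ramp = (0.5 + 0.5 * i / samples for i in range(samples))
    return len({quantize_mantissa(x, mantissa_bits) for x in ramp})

for name, bits in (("R/G (11-bit)", 6), ("B (10-bit)", 5), ("half (16-bit)", 10)):
    print(name, levels_per_octave(bits), "levels per octave")
```

Under these assumptions the blue channel gets 32 steps per doubling of brightness and red/green get 64, against 1024 for a half float, which is consistent with banding showing up in smooth gradients such as the sky.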

Dither Quality

In my own tests, a simple dither function already performed very well, but generally there's headroom when it comes to the quality.
The white noise function can be replaced with a blue noise texture or a generated blue noise approximation. Performance could get slightly worse, although it would still be very fast, and in return we get a higher-quality dither. I can also generate high-quality random numbers per channel instead, which further reduces visible noise (on my 1440p desktop it is currently not possible to spot without zooming in). Tell me what your quality/performance preferences are and I can implement it. You do not have to bother yourself with the implementation.

Final Remarks

Regarding the code: I messed up my fork, and you need to review #694 first before I can publish the code to GitHub. Otherwise it will be pushed into the merge request. Until then, a basic implementation has been posted to Discord.

@Try
Owner

Try commented Nov 19, 2024

Hi, @LordKobra !

Yes, you are correct about b10g11r11_ufloat precision - this is practically the smallest of the HDR capable formats. The motivation for this is to save memory bandwidth by using a 32bpp format. In comparison, switching to RGBA16F would use double the amount of memory.
Maybe in 2025/2026 we can make a hardware cut, bump minimal format expectations + deprecate Vulkan1.0/DX12.0.
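As a back-of-the-envelope check of the memory argument, a short Python sketch. It assumes a single 2560x1440 render target and ignores alignment, compression and any additional HDR buffers; the helper name is hypothetical.

```python
def target_bytes(width, height, bytes_per_pixel):
    """Raw size of one uncompressed render target."""
    return width * height * bytes_per_pixel

r11g11b10_bytes = target_bytes(2560, 1440, 4)  # 32 bits per pixel
rgba16f_bytes = target_bytes(2560, 1440, 8)    # 64 bits per pixel

print(r11g11b10_bytes / 1e6, "MB vs", rgba16f_bytes / 1e6, "MB")
```

Every pass that reads or writes the target moves twice as many bytes with the wider format, so the cost scales with how often the buffer is touched per frame.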

The current way of mitigating banding is to pre-divide lighting by exposure, effectively forcing colors into the 0.0..1.0 range.
Back to sky rendering:

However, there's still slight banding remaining.

It's possible that skyLUT itself already has banding. While it's 32-bit float, banding is still possible due to storing luminance directly in Lux.

In my own tests, a simple dither function already performed very well

Yeah, dither is probably a better compromise for now; however, note that in common.glsl there is already an interleavedGradientNoise function, borrowed from here: https://blog.demofox.org/2022/01/01/interleaved-gradient-noise-a-different-kind-of-low-discrepancy-sequence/.
Maybe it's worth experimenting with applying noise only to sky sampling, or compressing skyLUT into the 0..1 range.
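For reference, a Python sketch of interleaved gradient noise used as the dither offset. The constants are those of the commonly published formula; whether the function in common.glsl matches this exactly is an assumption, and `dither_to_8bit` is a hypothetical helper added only for illustration.

```python
import math

def interleaved_gradient_noise(x, y):
    """Commonly published IGN formula; returns a value in [0, 1) per pixel."""
    inner = 0.06711056 * x + 0.00583715 * y
    inner -= math.floor(inner)           # fract()
    v = 52.9829189 * inner
    return v - math.floor(v)             # fract()

def dither_to_8bit(value, x, y):
    """Quantize a [0, 1] value to 8 bits, using IGN instead of a fixed 0.5 offset."""
    code = int(value * 255.0 + interleaved_gradient_noise(x, y))
    return max(0, min(255, code))

# A flat input that falls between two codes becomes a fine pattern of both codes.
row = [dither_to_8bit(100.3 / 255.0, x, 0) for x in range(8)]
print(row)
```

Because the noise depends only on pixel coordinates, it needs no texture fetch, which fits the idea of restricting it to the sky sampling path.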

I can also generate high-quality random numbers per channel instead, which further reduces visible noise

If it is something not based on trigonometry (saw a sin in your code), but on bit-ops it can be virtually free - as ALU is not utilized in tonemapping anyway.

@LordKobra
Contributor Author

Hey @Try !

RGBA16F would use double the amount of memory

Yes, I am aware of this restriction. But it would be interesting to see the real performance impact of this; I would certainly imagine it to be 10% or less.

Maybe in 2025/2026 we can make a hardware cut, bump minimal format expectations + deprecate Vulkan1.0/DX12.0

Can you explain to me how Vulkan restricts the usage of 16bpc? E.g. in ReShade such formats could even be used on DX9 (most likely emulated; I can ask crosire for details). In the worst case, you refactor two textures into a single one and use a proxy function for sample and load operations. On the other hand, 2025 is right around the corner; if you plan to deprecate old standards, such modifications might be unnecessary. (Please don't feel pressured by me, I'm just brainstorming :D )

HDR capable formats

Another topic related to this is HDR tonemapping & output. I have an HDR monitor, and if I use the Gothic DX11 renderer I can already get HDR output, so it would be a really cool feature for OpenGothic. Should I make a new issue for that? I'd also be happy to help with that myself in the future.

It's possible that skyLUT itself already has banding.

Good to know, I'll take a look!

interleavedGradientNoise

Oh, that's great! IGN is a very fitting approximation of blue noise! Using a blue noise texture would be the optimum, but IGN comes right after.

something not based on trigonometry

Yes, that was literally the most basic and low-quality noise I could get. When I talk about high-quality noise, for me it usually comes down to integer hashing. I can really recommend PCG3D https://www.shadertoy.com/view/XlGcRh . I used it for sampling in ray tracing; the noise quality was very high and I couldn't see any dominating frequencies. A cheap alternative to the sine would probably be the hash32 from here https://www.shadertoy.com/view/4djSRW
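A Python transcription of the PCG3D integer hash, as a sketch of how it could produce one independent dither value per color channel. The explicit masking emulates 32-bit unsigned wrap-around, which a shader gets for free. Seeding with pixel coordinates plus a frame index, and the `pixel_noise_rgb` helper itself, are assumptions for illustration.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic

def pcg3d(x, y, z):
    """PCG3D hash: three 32-bit unsigned inputs to three 32-bit unsigned outputs."""
    x = ((x & MASK) * 1664525 + 1013904223) & MASK
    y = ((y & MASK) * 1664525 + 1013904223) & MASK
    z = ((z & MASK) * 1664525 + 1013904223) & MASK
    # First mixing round across the three lanes.
    x = (x + y * z) & MASK
    y = (y + z * x) & MASK
    z = (z + x * y) & MASK
    # Fold high bits into low bits.
    x ^= x >> 16
    y ^= y >> 16
    z ^= z >> 16
    # Second mixing round.
    x = (x + y * z) & MASK
    y = (y + z * x) & MASK
    z = (z + x * y) & MASK
    return x, y, z

def pixel_noise_rgb(px, py, frame=0):
    """Hypothetical helper: one value in [0, 1) per channel for a given pixel."""
    return tuple(c / 4294967296.0 for c in pcg3d(px, py, frame))

print(pixel_noise_rgb(10, 20))
```

The hash uses only integer multiplies, adds, shifts and xors, so it avoids the trigonometric cost of a sine-based generator.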

@Try
Owner

Try commented Nov 20, 2024

Can you explain to me how Vulkan restricts the usage of 16bpc?

This is not an API restriction per se; it's more of a "can any hardware here run it well" kind of case. For 11-bit, I know that NVidia/AMD/iOS/Android do have a fast path. For 16-bit, I need to do more research, or wait until older hardware fades away naturally.
And when speaking about deprecation - it's not exactly related to the formats; Vulkan1.0 is just way too 'small' on limits.

I would certainly imagine it to be 10% or less.

Depends... Among the possible risks, I can think of reduced tile size or non-compressed texture storage.

Should I make a new issue for that? I'd also be happy to help with that myself in the future.

Yes, that would be great! Personally, I do not have HDR on my Windows laptop, only on Mac. If you can provide the Windows side, I can do the Metal part for completeness.

I can really recommend PCG3D

Looks fine to me. A few additions/multiplications and bit-ops should run fast.
