Releases: AmusementClub/VapourSynth-FFT3DFilter
R2.AC3-AVX: AVX-only R2.AC3 release
A re-release of R2.AC3 that only requires AVX (at the cost of some performance).
The other R2.AC3 release requires AVX+FMA+BMI1 (i.e. Haswell or better) and provides the best performance.
Please note that, unless otherwise noted, AmusementClub filters generally require at least AVX2,
as we don't even have pre-AVX2 CPUs to test the filters on (this release was tested using Intel SDE, the Software Development Emulator).
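Choosing between the two R2.AC3 builds comes down to which instruction set extensions the CPU reports. The following is a minimal sketch, assuming a Linux x86 machine where `/proc/cpuinfo` lists the flags `avx`, `fma` and `bmi1`. The helpers `pick_build` and `read_linux_cpu_flags` are hypothetical and are not shipped with any AmusementClub release.

```python
from __future__ import annotations


def pick_build(flags: set[str]) -> str | None:
    """Return the fastest R2.AC3 build the given CPU flags allow, or None."""
    # Hypothetical helper: the full R2.AC3 build needs AVX + FMA + BMI1 (Haswell+).
    if {"avx", "fma", "bmi1"} <= flags:
        return "R2.AC3"
    # The R2.AC3-AVX re-release only needs AVX, at some performance cost.
    if "avx" in flags:
        return "R2.AC3-AVX"
    # Neither build will run on this CPU.
    return None


def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Collect the 'flags' field of the first CPU entry (Linux x86 only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On such a system, `pick_build(read_linux_cpu_flags())` names the build to download. The check is deliberately ordered from the most to the least demanding build, so a Haswell-class CPU is never steered to the slower AVX-only binary.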
R2.AC3 Even faster...
In addition to the changes in the previous R2.AC2 release, this release switches to clang for building. Thanks to clang's advanced vectorizer, this release is 25% faster than the R2.AC2 release [1], and the performance gap between this release and neo-FFT3D on `QTGMC(Preset='Very Slow')`
is negligible, even though this filter uses no SIMD intrinsics.
Special Notes: this release requires CPU to support at least AVX+FMA+BMI1 extensions (i.e. Haswell or better).
[1] Measured on VS R57, Skylake Xeon, Windows Server 2019 (`bt=3`).
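"Faster" can mean either more throughput or less time, and the two are not the same percentage. The following is a minimal sketch, assuming "25% faster" means 25% more frames per second: the time per frame then shrinks by a factor of 1.25, a 20% saving. Read the same way, the 1.5~2x QTGMC speedup quoted for R2.AC2 saves roughly a third to a half of the processing time. The function name `time_saved` is illustrative only.

```python
def time_saved(speedup: float) -> float:
    """Fraction of processing time saved when throughput grows by `speedup`x.

    Assumes "N% faster" means N% more frames per second, so the time spent
    per frame is divided by `speedup`.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup
```

For example, `time_saved(1.25)` is about 0.2 and `time_saved(2.0)` is 0.5.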
R2.AC2 ~2x faster QTGMC
Compared to upstream R2 release:
- Dual api3/api4 support thanks to vs-api3, without any code modification or duplication: just add one generic C++ source file, `api3.cc`. This release requires vs-api3 v2.1 or above.
- Significant additional speedup: thanks to @WolframRhodium's optimizations, `havsfunc.QTGMC(Preset="Very Slow")` is now 1.5~2x faster and can properly use more CPUs. (You don't need an api4 VS to take advantage of this speedup. Use vs-c api3 to reduce memory usage.)
- Additionally, this release statically links fftw3f (with multithreading and AVX2 support) to simplify deployment.
Special Notes: this release requires CPU to support at least AVX+FMA+BMI1 extensions (i.e. Haswell or better).
R2.AC dual api3/api4 support
Compared to upstream R2 release:
- Dual api3/api4 support thanks to vs-api3, without any code modification or duplication: just add one generic C++ source file, `api3.cc`.
- Additionally, this release statically links fftw3f (with multithreading and AVX2 support) to simplify deployment.
PS: it turns out that the real reason `havsfunc.QTGMC` runs much faster with api4 VS than with api3 VS is the optimizations in this filter. Now api3 VS users can enjoy the same speedups as well. (In fact, VS R54 is even slightly faster than R57 when using this release, despite the inherent api3-to-api4 bridge overhead. [Do you understand why we insist on using api3 now?])
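The dual-API support described above follows a familiar adapter pattern: the filter body is written once against the newer interface, and a thin bridge translates calls from older hosts into it, paying a small translation cost on every call. The following is a toy Python model of that pattern only. Every name in it (`filter_v4`, `bridge_to_v3`, the dict-based "v3" calling convention) is hypothetical and does not reflect how vs-api3 or the real VapourSynth APIs are implemented.

```python
from typing import Any, Callable, Dict, List


# Hypothetical "v4-style" filter body: written once, takes typed keyword
# arguments and returns its result directly.
def filter_v4(values: List[float], sigma: float = 2.0) -> List[float]:
    # Placeholder processing: clamp each value to [-sigma, sigma].
    return [max(-sigma, min(sigma, v)) for v in values]


# Hypothetical "v3-style" host convention for this toy: arguments arrive in an
# input dict, and the result must be stored in an output dict.
V3Entry = Callable[[Dict[str, Any], Dict[str, Any]], None]


def bridge_to_v3(fn: Callable[..., List[float]]) -> V3Entry:
    """Expose a v4-style function through the toy v3 convention, unchanged."""
    def v3_entry(in_map: Dict[str, Any], out_map: Dict[str, Any]) -> None:
        # This translation runs on every call: the toy's "bridge overhead".
        out_map["clip"] = fn(**in_map)
    return v3_entry


# Both kinds of host end up running the very same filter_v4 code.
FILTER_FOR_V4_HOST = filter_v4
FILTER_FOR_V3_HOST = bridge_to_v3(filter_v4)
```

The design point the toy illustrates is that only the adapter knows about the older convention, so the filter logic itself needs no modification or duplication.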