Releases: AmusementClub/VapourSynth-FFT3DFilter

R2.AC3-AVX: AVX only R2.AC3 release

10 Mar 10:47
ee9a30a
Pre-release

R2.AC3 re-release that only requires AVX (with a performance penalty).

The other R2.AC3 release requires AVX+FMA+BMI1 (i.e. Haswell or better) and provides the best performance.

Please note that, unless otherwise noted, the minimum CPU requirement for AmusementClub filters is AVX2,
as we don't even have pre-AVX2 CPUs to test the filters on (this release was tested using Intel SDE).
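
As a rough aid for choosing between the two R2.AC3 builds, the requirements above can be sketched as a small check. This is only an illustration: the function names `pick_build` and `linux_cpu_flags` are hypothetical, and reading `/proc/cpuinfo` is an assumption that only holds on Linux (on other systems the flag set will simply be empty and you would need another way to query CPU features).

```python
def pick_build(cpu_flags):
    """Map a collection of CPU feature flags to the matching R2.AC3 build.

    Returns "R2.AC3" when AVX, FMA and BMI1 are all present,
    "R2.AC3-AVX" when only AVX is available, and None otherwise.
    """
    flags = {f.lower() for f in cpu_flags}
    if {"avx", "fma", "bmi1"} <= flags:
        return "R2.AC3"      # best-performing build (Haswell or better)
    if "avx" in flags:
        return "R2.AC3-AVX"  # AVX-only build, slower
    return None              # neither build's requirements are met


def linux_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU flags listed by Linux, or an empty set if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(pick_build(linux_cpu_flags()))
```

The check only mirrors the requirements stated in these notes; it does not verify that the plugin itself loads.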

R2.AC3 Even faster....

17 Jan 09:14

In addition to the changes in the previous R2.AC2 release, this release is built with clang. Thanks to clang's advanced vectorizer, this release is 25% faster than the R2.AC2 release [1], and the performance gap between this release and neo-FFT3D on QTGMC(Preset='Very Slow') is negligible, all without using SIMD intrinsics.

Special Notes: this release requires a CPU that supports at least the AVX+FMA+BMI1 extensions (i.e. Haswell or better).

  [1] Measured on VS R57, Skylake Xeon, Windows Server 2019. (bt=3)

R2.AC2 ~2x faster QTGMC

14 Jan 02:20

Compared to upstream R2 release:

  • Dual api3/api4 support (without any code modification or duplication; just add one generic C++ source api3.cc file) thanks to vs-api3. This release requires vs-api3 v2.1 or above.
  • Significant additional speedup: thanks to @WolframRhodium's optimizations, havsfunc.QTGMC(Preset="Very Slow") is now 1.5~2x faster and can properly use more CPUs. (You don't need api4 VS to take advantage of this speedup. Use vs-c api3 to reduce memory usage.)
  • Additionally, this release statically links fftw3f (with multithreading and avx2 support) to simplify deployment.

Special Notes: this release requires a CPU that supports at least the AVX+FMA+BMI1 extensions (i.e. Haswell or better).

R2.AC dual api3/api4 support

13 Jan 01:20

Compared to upstream R2 release:

  • Dual api3/api4 support (without any code modification or duplication; just add one generic C++ source api3.cc file) thanks to vs-api3.
  • Additionally, this release statically links fftw3f (with multithreading and avx2 support) to simplify deployment.

PS: it turns out that the real reason havsfunc.QTGMC runs much faster with api4 VS than with api3 VS is the optimizations in this filter. Now api3 VS users can enjoy the same speedups as well. (In fact, VS R54 is even slightly faster than R57 when using this release, despite the inherent api3-to-api4 bridge overhead [do you understand why we insist on using api3 now?])