Releases: AmusementClub/vs-mlrt
v12: latest CUDA libraries
Compared to v11, this release updated CUDA dependencies to CUDA 11.8.0, cuDNN 8.6.0 and TensorRT 8.5.1:
- Added support for the NVIDIA 40 series GPUs.
- Added support for RIFE on the `trt` backend.
Known issues
- Performance of the `OV_CPU` or `ORT_CUDA(fp16=True)` backends for `RIFE` is lower than expected; this is under investigation. Please consider `ORT_CPU` or `ORT_CUDA(fp16=False)` for now (see the sketch below).
- The `NCNN_VK` backend does not support `RIFE`.
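A workaround sketch using the backends named above (the variable name is illustrative):

```python
# Workaround sketch: run RIFE in fp32 until the fp16 regression is resolved.
import vsmlrt

backend = vsmlrt.Backend.ORT_CUDA(fp16=False)  # or vsmlrt.Backend.ORT_CPU()
```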
Installation Notes
For some advanced features, `vsmlrt.py` requires the `numpy` and `onnx` packages to be available. You might need to run `pip install onnx numpy`.
Benchmark
Configuration: NVIDIA RTX 3090, driver 526.47, Windows Server 2019, vs r60, python 3.11.0, 1080p fp16
Backends: ort-cuda, trt from vs-mlrt v12.
For the `trt` backend, the engine is built without the `CUDA_MODULE_LOADING=LAZY` environment variable, but the variable is set during benchmarking to reduce device memory consumption.
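A sketch of one way to set that variable from a script; this assumes it runs before any CUDA plugin initializes:

```python
# Sketch: enable lazy CUDA module loading for the current process.
# This must run before CUDA is initialized (an assumption).
import os

os.environ["CUDA_MODULE_LOADING"] = "LAZY"
```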
Data format: fps / GPU memory usage (MB)
rife(model=44, 1920x1088)

| backend | 1 stream | 2 streams |
|---|---|---|
| ort-cuda | 53.62/1771 | 83.34/2748 |
| trt | 71.30/626 | 107.3/962 |
dpir color

| backend | 1 stream | 2 streams |
|---|---|---|
| ort-cuda | 4.64/3230 | |
| trt | 10.32/1992 | 11.61/3475 |
waifu2x upconv_7

| backend | 1 stream | 2 streams |
|---|---|---|
| ort-cuda | 11.07/5916 | 15.04/10899 |
| trt | 18.38/2092 | 31.64/3848 |
waifu2x cunet

| backend | 1 stream | 2 streams |
|---|---|---|
| ort-cuda | 4.63/8541 | 5.32/16148 |
| trt | 11.44/4771 | 15.59/8972 |
realesrgan v2/v3

| backend | 1 stream | 2 streams |
|---|---|---|
| ort-cuda | 8.84/2283 | 11.10/4202 |
| trt | 14.59/1324 | 21.37/2174 |
v11 RIFE support
Added support for the RIFE video frame interpolation algorithm.
There are two APIs for RIFE:
- `vsmlrt.RIFE` is a high-level API for interpolating a clip. Set the `multi` argument to specify the fps factor. Just remember to perform scene detection on the input clip (see the sketch below).
- `vsmlrt.RIFEMerge` is a novel temporal `std.MaskedMerge`-like interface for RIFE. Use it if you want to precisely control the frames and/or time points of the interpolation.
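A minimal sketch of the high-level API, assuming the LSMASHSource and misc plugins are installed (the file name and threshold are illustrative):

```python
import vapoursynth as vs
from vsmlrt import RIFE, Backend

core = vs.core

src = core.lsmas.LWLibavSource("input.mkv")  # any source filter works
# Tag scene changes first so RIFE does not interpolate across cuts;
# the frame props survive the format conversion below.
src = core.misc.SCDetect(src, threshold=0.14)
# The RIFE models operate on 32-bit float RGB.
rgb = core.resize.Bicubic(src, format=vs.RGBS, matrix_in_s="709")
# multi=2 doubles the frame rate.
flt = RIFE(rgb, multi=2, backend=Backend.OV_CPU())
flt.set_output()
```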
Known issues
- vstrt doesn't support RIFE for the moment.¹ The next release of TensorRT should include RIFE support, and we will release v12 when that happens.
- The vstrt backend also doesn't yet support the latest RTX 4000 series GPUs. This will be fixed after upgrading to the upcoming TensorRT 8.5 release. RTX 4000 series GPU owners, please use the other CUDA backends for now.
- Users of the `OV_GPU` backend may experience errors like `Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes`. Please consider tiling for now (see the sketch below). The reason is that the openvino library follows the OpenCL standard's restriction on memory object allocation (`CL_DEVICE_MAX_MEM_ALLOC_SIZE`). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4 GiB.²
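A workaround sketch using the tiling parameters exposed by the `vsmlrt.py` wrappers (the model, strength and tile count are illustrative; `clip` is a prepared RGBS clip):

```python
# Sketch: split the frame into tiles so each inference allocation stays
# below the ~4 GiB OpenCL limit; tiles=2 processes the frame in two pieces.
from vsmlrt import DPIR, Backend

flt = DPIR(clip, strength=5, tiles=2, backend=Backend.OV_GPU())
```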
1. It's missing `grid_sample` operator support; see https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md. ↩
2. This value is derived from here, which states that a device not supporting `sharedSystemMemCapabilities` has a maximum allowed allocation size of 4294959104 bytes. ↩
v11.test
Internal testing only.
Added support for the RIFE video frame interpolation algorithm. Some features are still being implemented. The Python RIFE model wrapper interface is still subject to change.
Known issue
- Users of the `OV_GPU` backend may experience errors like `Exceeded max size of memory object allocation: Requested 11456040960 bytes but max alloc size is 4294959104 bytes`. Please consider tiling for now. The reason is that the openvino library follows the OpenCL standard's restriction on memory object allocation (`CL_DEVICE_MAX_MEM_ALLOC_SIZE`). For most existing Intel GPUs (Gen9 and later), the driver imposes a maximum allocation size of ~4 GiB.¹

1. This value is derived from here, which states that a device not supporting `sharedSystemMemCapabilities` has a maximum allowed allocation size of 4294959104 bytes. ↩
Model Release 20220923, RIFE model
New modules (compared to previous model release):
- RIFE v4.0 from vs-rife v2.0.0: `rife/rife_v4.0.onnx`, config: `fastmode=True, ensemble=False`
- RIFE v4.2, v4.3, v4.4, v4.5, v4.6, v4.7, v4.8, v4.9, v4.10 from Practical-RIFE: `rife/rife_{v4.2,v4.3,v4.4,v4.5,v4.6,v4.7,v4.8,v4.9,v4.10}.onnx`, config: `fastmode=True, ensemble=False`
- Other provided RIFE models can be found here, including v2 representations of the RIFE v4.7-v4.10 models. Sorry for the inconvenience.
Notes:
- For RIFE on ort-cuda, vs-mlrt v11 or later is suggested for best performance. As of v11, only ov-cpu, ort-cpu, ort-cuda and trt (pending a new TensorRT release) support RIFE. Specifically, ncnn-vk does not support RIFE due to the missing `gridsample` op.
v10: new Vulkan-based vsncnn (AMD GPUs supported)
Release Highlight
Vulkan-based AMD GPU support added with the new vsncnn-vk backend.
Major features
- Introduced the ncnn-based vsncnn plugin, which supports any GPU with Vulkan support (NVIDIA, AMD, Intel integrated & discrete).
  - Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPUs to GPUs of all three major vendors.
  - Please refer to the benchmark below for performance details. TL;DR: it's comparable to vsort-cuda on most networks (except waifu2x-cunet), but (significantly) slower than vstrt. Owing to its C++ implementation, it's generally faster than Python-based ncnn implementations.
  - Hint: if your GPU has enough memory, please consider setting `num_streams>1` to extract more performance (see the sketch after this list).
  - Even though it's possible to use software-based Vulkan implementations (as we do in the GHA tests), if you want to do CPU-only inference, it's much better to use vsov-cpu (or vsort-cpu).
- Introduced a new, smaller Vulkan-based GPU binary package (`vsmlrt-windows-x64-vk.v10.7z`) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use an Intel/AMD GPU or don't want to download 1 GB of data in exchange for a backend that is merely 2~8x faster. Now there shouldn't be any reason not to use vs-mlrt.
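A sketch of the multi-stream hint above (the model choice and fp16 setting are illustrative; `clip` is a prepared RGBS clip):

```python
# Sketch: two parallel inference streams on the Vulkan backend; this trades
# extra GPU memory for higher throughput.
from vsmlrt import Waifu2x, Waifu2xModel, Backend

flt = Waifu2x(clip, noise=-1, scale=2,
              model=Waifu2xModel.upconv_7_anime_style_art_rgb,
              backend=Backend.NCNN_VK(num_streams=2, fp16=True))
```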
Benchmark
Configuration: NVIDIA RTX 3090, driver 516.94, Windows Server 2019, vs r60, python 3.10.7, 1080p fp16
Backends: ncnn-vk, ort-cuda, trt from vs-mlrt v10, dpir-ncnn v2.0.0, w2xncnnvk r2
Data format: fps / GPU memory usage (MB)
dpir color

| backend | 1 stream | 2 streams |
|---|---|---|
| ncnn-vk | 4.33/3347 | 4.72/6119 |
| ort-cuda | 4.56/3595 | |
| trt | 10.64/2595 | 11.10/4593 |
| dpir-ncnn | 3.68/3326 | |
waifu2x upconv_7

| backend | 1 stream | 2 streams |
|---|---|---|
| ncnn-vk | 9.46/6820 | 14.71/13468 |
| ort-cuda | 12.10/6411 | 13.98/11273 |
| trt | 21.32/3317 | 29.10/5053 |
| w2xncnnvk | 6.68/6931 | 12.70/13626 |
waifu2x cunet

| backend | 1 stream | 2 streams |
|---|---|---|
| ncnn-vk | 1.46/11908 | 1.53/23574 |
| ort-cuda | 4.85/8793 | 5.18/16231 |
| trt | 11.60/4960 | 15.60/9057 |
| w2xncnnvk | 1.38/11966 | 1.58/23687 |
realesrgan v2/v3

| backend | 1 stream | 2 streams |
|---|---|---|
| ncnn-vk | 7.23/2781 | 8.35/5330 |
| ort-cuda | 9.05/2669 | 10.18/4539 |
| trt | 15.93/1667 | 19.58/2543 |
v10.pre
This is a pre-release for testing & benchmarking purposes only.
For production use, please use the official v10 release.
Release Highlight
Vulkan-based AMD GPU support added with the new vsncnn-vk backend.
Major features
- Introduced the ncnn-based vsncnn plugin, which supports any GPU with Vulkan support (NVIDIA, AMD, Intel integrated & discrete). Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPUs to GPUs of all three major vendors.
- Introduced a new, smaller Vulkan-based GPU binary package (`vsmlrt-windows-x64-vk.v10.pre.7z`) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use an Intel/AMD GPU or don't want to download 1 GB of data in exchange for a backend that is merely 3x faster. Now there shouldn't be any reason not to use vs-mlrt.
v9.2
Fixed issues
- In vs-mlrt v9 and v9.1 on Windows, the `ORT_CUDA` backend may fail with an out-of-memory error when processing a non-initial frame. This has been fixed, and performance should be improved.
- Parameter `use_cuda_graph` of the `ORT_CUDA` backend now works properly on Windows. It is, however, currently not recommended for use.
Full Changelog: v9.1...v9.2
v9.1
Bugfix release for v9. Recommended update for v9 users.
Please see the release notes for v9 for all the major new features.
- Fix ort-cuda fp16 inference for the `CUGAN(version=2)` model. A new parameter, `fp16_blacklist_ops`, is introduced in the ort and ov backends for other issues possibly related to reduced precision. Please still carefully review the output of fp16-accelerated `CUGAN(version=2)`.
- Conform with `CUGAN(version=2)`'s dynamic range compression. This feature is enabled by setting `conformance=True` (which is the default) in the `CUGAN` wrapper in `vsmlrt.py`, and it's implemented as:
  ```python
  clip = clip.std.Expr("x 0.7 * 0.15 +")
  clip = CUGAN(clip, version=2)
  clip = clip.std.Expr("x 0.15 - 0.7 /")
  ```
Known issues
These two issues are fixed in the v9.2 release:
- The `ORT_CUDA` backend allocates memory during inference. This degrades performance and may result in out-of-memory errors.
- Parameter `use_cuda_graph` of the `ORT_CUDA` backend is broken on Windows.
Full Changelog: v9...v9.1
v9 Major release: Intel GPU support & much more
This is a major release.
- Added support for Intel GPUs (both discrete [Xe Arc series] and integrated [Gen 8+ on Broadwell+]).
  - In `vsmlrt.py`, this corresponds to the `OV_GPU` backend.
  - The openvino library is now dynamically linked because of the integration of oneDNN for GPU.
- Added support for the `RealESRGANv3` and `cugan-pro` models.
- Upgraded the CUDA toolkit to 11.7.0, TensorRT to 8.4.1 and cuDNN to 8.4.1. It is now possible to build TRT engines for `CUGAN`, waifu2x `cunet` and `upresnet10` models on RTX 2000 and RTX 3000 series GPUs.
models on RTX 2000 and RTX 3000 series GPUs. -
The trt backend in
vsmlrt.py
wrapper now creates a log file fortrtexec
output in the TEMP directory (this only works if using the bundledtrtexec.exe
.) The log file will only be retained iftrtexec
fails (and the vsmlrt exception message will include the full path of the log file.) If you want the log to go to a specific file, set environment variableTRTEXEC_LOG_FILE
to the absolute path of the log file. If you don't want this behavior, setlog=False
when creating the backend (e.g.vsmlrt.Backend.TRT(log=False)
) -
The cuda bundles now include VC runtime DLLs as well, so
trtexec.exe
should run even on systems without proper VC runtime redistributable packages installed (e.g. freshly installed Windows). -
The ov backend can now configure model compilation via
config
. Available configurations can be found here.-
Example:
core.ov.Model(..., config = lambda: dict(CPU_THROUGHPUT_STREAMS=core.num_threads, CPU_BIND_THREAD="NO"))
This configuration may be useful in improving processor utilization at the expense of significantly increased memory consumption (only try this if you have a huge number of cores underutilized by the default settings.)
The equivalent form for the python wrapper is
backend = vsmlrt.Backend.OV_CPU(num_streams=core.num_threads, bind_thread=False)
- When using the `vsmlrt.py` wrapper, it will no longer create temporary onnx files (e.g. when using non-default `alpha` CUGAN parameters). Instead, the modified ONNX network is passed directly into the various ML runtime filters. Those filters now support `(network_path=b'raw onnx protobuf serialization', path_is_serialization=True)` for this (see the sketch below). This feature also opens the door to generating ONNX on the fly (e.g. ever dreamed of GPU-accelerated 2D convolution or `std.Expr`?).
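A sketch of what this enables, assuming the `onnx` package is installed (the file name and the in-memory edit are illustrative; `core` and `clip` are the usual VapourSynth objects):

```python
# Sketch: load an ONNX model, modify it in memory, and feed the raw
# protobuf bytes straight to a runtime filter without touching disk.
import onnx

model = onnx.load("waifu2x.onnx")  # illustrative model file
# ... edit the in-memory graph here (patch initializers, rewire nodes, ...)
blob = model.SerializeToString()

flt = core.ov.Model(clip, network_path=blob, path_is_serialization=True)
```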
Update Instructions
- Delete the previous `vsmlrt-cuda`, `vsov`, `vsort` and `vstrt` directories and `vsov.dll`, `vsort.dll` and `vstrt.dll` from your VS plugins directory, and then extract the newly released files. (Specifically, do not leave files from the previous version in place and just overwrite them with the new release, as the new release might have removed some files in those four directories.)
- Replace `vsmlrt.py` in your Python package directory.
- Update the `models` directories by overwriting them with the new release. (Models are generally append-only. We will make special notices and bump the model release tag if we change any of the previously released models.)
Compatibility Notes
`vsmlrt.py` in this release is not compatible with binaries from previous releases; only script-level compatibility is maintained. Generally, please make sure to upgrade the filters and `vsmlrt.py` as a whole.
We strive to maintain script source-level compatibility as much as possible (i.e. there won't be a great api4-style breakage), which means scripts written for v7 (for example) will continue to function for the foreseeable future. Minor issues (like the non-monotonic denoise setting of cugan) will be documented instead of fixed with a breaking change.
Known issue
`CUGAN(version=2)` (a.k.a. cugan-pro) may produce a blank clip when using the `ORT_CUDA(fp16)` backend. This is fixed in the v10 release.
Full Changelog: v8...v9
v8: latest CUDA libraries and ~10% faster
- This release upgrades the cuda libraries to their latest versions. Models are observed to be accelerated by ~1.1x.
- `vsmlrt.CUGAN()` now accepts a new parameter `alpha`, which controls the strength of filtering. Setting `alpha` to non-default values requires the Python `onnx` package (but this might change in the future).
- Added a `tf32` parameter to the trt backend in `vsmlrt.py`. TF32 acceleration is enabled by default on Ampere GPUs, mostly for fp32 inference, and it has no effect on other architectures.
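A sketch combining the two new parameters (the values are illustrative; `clip` is a prepared RGBS clip):

```python
# Sketch: weaker filtering via alpha < 1, with TF32 acceleration left
# enabled for the fp32 parts of inference on Ampere GPUs.
from vsmlrt import CUGAN, Backend

flt = CUGAN(clip, noise=-1, scale=2, alpha=0.8,
            backend=Backend.TRT(tf32=True))
```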