v15.5: latest TensorRT library, CoreML backend
TRT
- Upgraded to TensorRT 10.5.0.
- Volta GPUs (TITAN V, V100) are no longer supported.
ORT
- Fixed macOS CoreML support for vsort by @yuygfgg in #106. This pull request also added the `ORT_COREML` backend to vsmlrt.py; a usage sketch follows.
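A minimal usage sketch of the new backend, assuming `Backend.ORT_COREML` can be constructed without required arguments and that `inference()` takes the parameters shown; the ONNX model path is hypothetical:

```python
import vapoursynth as vs
from vsmlrt import inference, Backend

core = vs.core

# Placeholder input clip; real scripts would load a video source instead.
clip = core.std.BlankClip(width=720, height=480, format=vs.RGBS)

# Route inference through the CoreML execution provider on macOS.
output = inference(
    clip,
    network_path="model.onnx",  # hypothetical model path
    backend=Backend.ORT_COREML(),
)
output.set_output()
```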
General
- Upgraded to CUDA 12.6.1.
vsmlrt.py
- Added support for RIFE v4.25 and v4.26 models.
- Added automatic batch inference support via the `batch_size` option in `inference()` and `flexible_inference()`, which may improve device utilization for inference on small inputs with some small models (see the sketch after this list). This feature requires the flexible output support introduced in vs-mlrt v15 and is inspired by styler00dollar/VSGAN-tensorrt-docker@ac47012. Note that not all ONNX models are supported.
  - On the one hand, batching improves utilization by creating more work for each kernel invocation and reducing the quantization inefficiency of kernel tiles in bulk parallelism. It also reduces the average kernel launch and synchronization overhead per unit of work.
  - On the other hand, batching causes cache misses and inserts bubbles into the pipeline, which may degrade performance.
- Future RIFE v2 models will be fixed to support batch inference.
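A minimal sketch of the new option, assuming `inference()` accepts `batch_size` as a keyword argument; the ONNX model path is hypothetical, and the backend mirrors the benchmark configuration below:

```python
import vapoursynth as vs
from vsmlrt import inference, Backend

core = vs.core

# Placeholder input clip matching the benchmark input (720x480 RGBS).
clip = core.std.BlankClip(width=720, height=480, format=vs.RGBS, length=240)

# batch_size=2 groups pairs of frames into one batched inference call,
# which may improve device utilization for small inputs and small models.
output = inference(
    clip,
    network_path="realesrgan-compact.onnx",  # hypothetical model path
    backend=Backend.TRT(fp16=True, use_cuda_graph=True),
    batch_size=2,
)
output.set_output()
```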
benchmark:
- NVIDIA GeForce RTX 4090
- driver 560.94
- Windows Server 2019
- python 3.12.6, vapoursynth-classic R57.A10, vs-mlrt v15.4
- input: 720x480 RGBS
- backend: `TRT(fp16=True, use_cuda_graph=True)`
Measurements: FPS / Device Memory (MB)
| model | batch 1 | batch 2 |
|---|---|---|
| realesrgan compact (stream 1) | 73.01 / 708 | 138.68 / 950 |
| realesrgan compact (streams 2) | 107.81 / 914 | 263.87 / 1347 |
| realesrgan compact (streams 3) | 108.30 / 1128 | 348.23 / 1738 |
| realesrgan ultracompact (stream 1) | 99.43 / 702 | 165.52 / 950 |
| realesrgan ultracompact (streams 2) | 184.48 / 908 | 302.56 / 1344 |
| realesrgan ultracompact (streams 3) | 184.69 / 1114 | 458.18 / 1738 |
Full Changelog: v15.4...v15.5