v15.5: latest TensorRT library, CoreML backend
TRT
- Upgraded to TensorRT 10.5.0.
- Volta GPUs (TITAN V, V100) are no longer supported.
ORT
- Fixed macOS CoreML support for vsort by @yuygfgg in #106. This pull request also added the `ORT_COREML` backend to vsmlrt.py; a usage sketch follows.
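A minimal usage sketch of the new backend, assuming `Backend.ORT_COREML` can be constructed without required arguments and that `inference()` takes the parameters shown; the ONNX model path is hypothetical:

```python
import vapoursynth as vs
from vsmlrt import inference, Backend

core = vs.core

# Placeholder input clip; real scripts would load a video source instead.
clip = core.std.BlankClip(width=720, height=480, format=vs.RGBS)

# Route inference through the CoreML execution provider on macOS.
output = inference(
    clip,
    network_path="model.onnx",  # hypothetical model path
    backend=Backend.ORT_COREML(),
)
output.set_output()
```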
General
- Upgraded to CUDA 12.6.1.
vsmlrt.py
- Added support for RIFE v4.25 and v4.26 models.
- Added automatic batch inference support via the `batch_size` option in `inference()` and `flexible_inference()`, which may improve device utilization for inference on small inputs with some small models (see the sketch after this list). This feature requires the flexible output support introduced in vs-mlrt v15 and is inspired by styler00dollar/VSGAN-tensorrt-docker@ac47012. Note that not all ONNX models are supported.
  - On the one hand, batching improves utilization by creating more work for each kernel invocation and reducing the quantization inefficiency of kernel tiles in bulk parallelism. It also reduces the average kernel launch and synchronization overhead per unit of work.
  - On the other hand, batching causes cache misses and inserts bubbles into the pipeline, which may degrade performance.
- Future RIFE v2 models will be fixed to support batch inference.
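A minimal sketch of the new option, assuming `inference()` accepts `batch_size` as a keyword argument; the ONNX model path is hypothetical, and the backend mirrors the benchmark configuration below:

```python
import vapoursynth as vs
from vsmlrt import inference, Backend

core = vs.core

# Placeholder input clip matching the benchmark input (720x480 RGBS).
clip = core.std.BlankClip(width=720, height=480, format=vs.RGBS, length=240)

# batch_size=2 groups pairs of frames into one batched inference call,
# which may improve device utilization for small inputs and small models.
output = inference(
    clip,
    network_path="realesrgan-compact.onnx",  # hypothetical model path
    backend=Backend.TRT(fp16=True, use_cuda_graph=True),
    batch_size=2,
)
output.set_output()
```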
benchmark:
- NVIDIA GeForce RTX 4090
- driver 560.94
- Windows Server 2019
- python 3.12.6, vapoursynth-classic R57.A10, vs-mlrt v15.4
- input: 720x480 RGBS
- backend: `TRT(fp16=True, use_cuda_graph=True)`
Measurements: FPS / Device Memory (MB)
| model | batch 1 | batch 2 |
|---|---|---|
| realesrgan compact (stream 1) | 73.01 / 708 | 138.68 / 950 |
| realesrgan compact (streams 2) | 107.81 / 914 | 263.87 / 1347 |
| realesrgan compact (streams 3) | 108.30 / 1128 | 348.23 / 1738 |
| realesrgan ultracompact (stream 1) | 99.43 / 702 | 165.52 / 950 |
| realesrgan ultracompact (streams 2) | 184.48 / 908 | 302.56 / 1344 |
| realesrgan ultracompact (streams 3) | 184.69 / 1114 | 458.18 / 1738 |
Full Changelog: v15.4...v15.5