v15.4: latest TensorRT library
## TRT
- Upgraded to TensorRT 10.4.0.
## General
- Upgraded to CUDA 12.6.0.
## vsmlrt.py
- Added support for Ani4K-v2 model by @srk24 in #105
- Added support for RIFE v4.23 and v4.24 models.
- Added `max_tactics` option to the `TRT` backend, which can reduce engine build time by limiting the number of tactics to time (see the sketch below). By default, TensorRT determines the number of tactics based on its own heuristic.
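A minimal sketch of the new option; the model path and the value `4` are placeholders for illustration, and omitting `max_tactics` keeps TensorRT's default heuristic:

```python
import vapoursynth as vs
import vsmlrt

core = vs.core

# Placeholder input; any RGBS clip works here.
clip = core.std.BlankClip(format=vs.RGBS, width=1920, height=1080)

# max_tactics caps how many tactics TensorRT benchmarks while building
# the engine: a smaller cap builds faster but may pick a slower kernel.
# The value 4 is an arbitrary example, not a recommendation.
backend = vsmlrt.Backend.TRT(fp16=True, max_tactics=4)

# "model.onnx" is a placeholder path to any supported onnx model.
output = vsmlrt.inference(clip, "model.onnx", backend=backend)
```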
## Batch Inference (Preview)
The latest vsmlrt.py (not included in v15.4) provides experimental support for batch inference via the `batch_size` option in `inference()` and `flexible_inference()`, which may improve device utilization for inference on small inputs with some small models.

This feature requires the flexible output support introduced in vs-mlrt v15 and is inspired by styler00dollar/VSGAN-tensorrt-docker@ac47012. Note that not all ONNX models are supported.
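A minimal sketch of the experimental option, assuming a vsmlrt.py newer than v15.4 and a batching-compatible model (the model path is a placeholder):

```python
import vapoursynth as vs
import vsmlrt

core = vs.core

# Small input, matching the benchmark setup below.
clip = core.std.BlankClip(format=vs.RGBS, width=720, height=480)

# batch_size=2 processes two frames per inference call, which can raise
# device utilization for small models on small inputs. Requires an onnx
# model whose graph is compatible with batching.
output = vsmlrt.inference(
    clip,
    "realesrgan_compact.onnx",  # placeholder model path
    backend=vsmlrt.Backend.TRT(fp16=True, use_cuda_graph=True, num_streams=2),
    batch_size=2,
)
```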
Preliminary benchmark:
- NVIDIA GeForce RTX 4090
- driver 560.94
- Windows Server 2019
- python 3.12.6, vapoursynth-classic R57.A10
- input: 720x480 RGBS
- backend: `TRT(fp16=True, use_cuda_graph=True)`
Measurements: FPS / Device Memory (MB)
| model | batch 1 | batch 2 |
|---|---|---|
| realesrgan compact (1 stream) | 73.01 / 708 | 138.68 / 950 |
| realesrgan compact (2 streams) | 107.81 / 914 | 263.87 / 1347 |
| realesrgan compact (3 streams) | 108.30 / 1128 | 348.23 / 1738 |
| realesrgan ultracompact (1 stream) | 99.43 / 702 | 165.52 / 950 |
| realesrgan ultracompact (2 streams) | 184.48 / 908 | 302.56 / 1344 |
| realesrgan ultracompact (3 streams) | 184.69 / 1114 | 458.18 / 1738 |
Full Changelog: v15.3...v15.4