
v15.4: latest TensorRT library

github-actions released this 07 Sep 01:05

TRT

  • Upgraded to TensorRT 10.4.0.

General

  • Upgraded to CUDA 12.6.0.

vsmlrt.py

  • Added support for Ani4K-v2 model by @srk24 in #105
  • Added support for RIFE v4.23 and v4.24 models.
  • Added a max_tactics option to the TRT backend, which can reduce engine build time by limiting the number of tactics TensorRT times.
    • By default, TensorRT determines the number of tactics to time using its own heuristic.
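As an illustrative sketch (not a tested example: the model path and tactic count are placeholders, and running it requires a VapourSynth installation with the vs-mlrt TRT plugin and a compatible GPU), the new option is passed when constructing the backend:

```python
import vapoursynth as vs
from vsmlrt import Backend, inference

core = vs.core
clip = core.std.BlankClip(format=vs.RGBS, width=1280, height=720)

# max_tactics caps how many tactics TensorRT times per layer during the
# engine build; smaller values build faster but may pick slower kernels.
backend = Backend.TRT(fp16=True, max_tactics=4)  # 4 is an arbitrary example value

output = inference(clip, network_path="model.onnx", backend=backend)
output.set_output()
```

Omitting max_tactics keeps the default behavior, where TensorRT chooses how many tactics to time on its own.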

Batch Inference (Preview)

The latest vsmlrt.py (not included in the v15.4 release) provides experimental support for batch inference via the batch_size option in inference() and flexible_inference(). Batching may improve device utilization when running small models on small inputs.

This feature requires the flexible output support introduced in vs-mlrt v15 and is inspired by styler00dollar/VSGAN-tensorrt-docker@ac47012.

Note that not all ONNX models are supported.
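A minimal sketch of how the option is invoked (untested: the model path is a placeholder, and the script needs a GPU-enabled VapourSynth environment with a recent vsmlrt.py):

```python
import vapoursynth as vs
from vsmlrt import Backend, inference

core = vs.core
clip = core.std.BlankClip(format=vs.RGBS, width=720, height=480)

# batch_size > 1 groups consecutive frames into one batched inference
# call, which can raise device utilization for small models/inputs.
output = inference(
    clip,
    network_path="model.onnx",
    backend=Backend.TRT(fp16=True, use_cuda_graph=True),
    batch_size=2,
)
output.set_output()
```

Since the feature is experimental, verify output correctness for your model before relying on it.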

Preliminary benchmark:

  • NVIDIA GeForce RTX 4090
  • driver 560.94
  • Windows Server 2019
  • python 3.12.6, vapoursynth-classic R57.A10
  • input: 720x480 RGBS
  • backend: TRT(fp16=True, use_cuda_graph=True)

Measurements: FPS / Device Memory (MB)

model                                batch 1         batch 2
realesrgan compact (1 stream)        73.01 / 708     138.68 / 950
realesrgan compact (2 streams)       107.81 / 914    263.87 / 1347
realesrgan compact (3 streams)       108.30 / 1128   348.23 / 1738
realesrgan ultracompact (1 stream)   99.43 / 702     165.52 / 950
realesrgan ultracompact (2 streams)  184.48 / 908    302.56 / 1344
realesrgan ultracompact (3 streams)  184.69 / 1114   458.18 / 1738
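To make the scaling explicit, the batch-2 speedup over batch-1 for each configuration can be computed directly from the FPS figures in the table above:

```python
# FPS figures copied from the benchmark table: (batch 1, batch 2).
fps = {
    ("compact", 1): (73.01, 138.68),
    ("compact", 2): (107.81, 263.87),
    ("compact", 3): (108.30, 348.23),
    ("ultracompact", 1): (99.43, 165.52),
    ("ultracompact", 2): (184.48, 302.56),
    ("ultracompact", 3): (184.69, 458.18),
}

# Throughput gain of batch 2 relative to batch 1, per (model, streams).
speedup = {k: round(b2 / b1, 2) for k, (b1, b2) in fps.items()}
```

The gain grows with the number of streams (e.g. about 1.9x at 1 stream vs. about 3.2x at 3 streams for the compact model), consistent with batching helping most when single-frame work is too small to saturate the device.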

Full Changelog: v15.3...v15.4