v14.test3: latest TensorRT, MIGraphX backend
This is a preview release for TensorRT 9.2.0, following the v14.test and v14.test2 releases.
- Same as those releases, it requires Pascal GPUs or later (10 series+) and driver version >= 525. Support for Kepler 2.0 and Maxwell GPUs is dropped.
- TensorRT 9.2.0 is officially documented as "for Large Language Models (LLMs) on NVIDIA A100, A10G, L4, L40, L40S, H100 GPUs, and NVIDIA GH200 Grace Hopper™ Superchip only". The Windows build is downloaded from here, and can be used on other GPU models.
- Users should use the same version of TensorRT as provided (9.2.0) because runtime version checking is disabled in this release.
- Added support for AnimeJaNai V3 models, contributed by @hooke007 in #82.
- Added support for RIFE v4.13 ~ v4.16 (lite, ensemble) models, which are also available for previous vs-mlrt releases (simply download the new model file here and update vsmlrt.py).
  - The v4.13 ~ v4.15 models should have the same execution speed as the v4.10 ~ v4.12 models.
  - The v4.13 lite model, the v4.15 lite model and the v4.16 lite model should all have the same execution speed as the v4.12 lite model, while the v4.14 lite model may run slower.
- Added support for fractional video frame interpolation in RIFE.
  - Playback in video players should also set video_player=True (#59 (comment)). This change is experimental.
- Fixed an issue that caused the TRT backend to crash during script reloading (#65). It is also fixed in the latest iteration of the v14.test2 release.
- RIFE v4.7+ models with v2 representation are not working with dynamic shapes (#72). This has been reported to TensorRT developers.
- Initial MIGraphX support (experimental) for AMD GPUs.
  - fp16 I/O contributed by @abihf in #86.
  - Multi-stream execution, device selection, HIP graphs and dynamic shapes are not explicitly supported for now.

  Preliminary benchmark on Radeon RX 7900 XTX [1]:
  - resolution: 1920x1080
  - measurements: fps / device memory (MB)

  model | fp32 | fp16 |
  ---|---|---|
  dpir gray | 2.33 / 2829 | 7.29 / 1702 |
  dpir color | 2.27 / 2861 | 7.03 / 1734 |
  waifu2x upconv7 | 6.31 / 4540 | 12.90 / 2503 |
  waifu2x upresnet10 | 6.65 / 3077 | 13.63 / 1775 |
  waifu2x cunet / cugan | 3.69 / 6711 | 8.36 / 3591 |
  waifu2x swin_unet [2] | 2.19 / 9791 | 4.53 / 5236 |
  realesrgan [3] | 5.75 / 1961 | 11.57 / 959 |
  rife [4] | N/A | N/A |

- Also check the release notes of the v14.test and v14.test2 releases.
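
To make the fractional frame interpolation item above more concrete: with a non-integer multiplier, every output frame falls at a rational position between two source frames. The sketch below only illustrates that arithmetic with Python's `fractions` module. The helper name `output_positions` is hypothetical, and this is not a description of how vsmlrt.py implements the feature.

```python
from fractions import Fraction
from math import floor


def output_positions(num_src_frames: int, multi: Fraction):
    """Map each output frame to (source frame index, blend timestep in [0, 1)).

    Hypothetical helper for illustration only; not part of vsmlrt.py.
    """
    # Number of output frames when the frame rate is multiplied by `multi`.
    num_out = floor(num_src_frames * multi)
    positions = []
    for n in range(num_out):
        # Exact position of output frame n on the source timeline.
        t = Fraction(n) / multi
        src = floor(t)
        # A timestep of 0 means the source frame is used unchanged.
        positions.append((src, t - src))
    return positions


# Example: 4 source frames, frame rate multiplied by 5/2 (e.g. 24 -> 60 fps).
for n, (src, step) in enumerate(output_positions(4, Fraction(5, 2))):
    print(f"output {n}: source {src}, timestep {step}")
```

Using exact fractions rather than floats avoids rounding drift over long clips. A real implementation also has to decide what to do after the last source frame, where no following frame exists to interpolate towards; the sketch leaves that open.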
benchmark
- RTX 4090
- processor clock @ 2520 MHz
- Intel Icelake server @ 2100 MHz
- Driver 551.86
- Windows 10 21H2 (19044.1415)
- TensorRT 9.2.0
- VapourSynth-Classic R57.A8, vapoursynth-plugin v0.96g3
1920x1080 rgbs, CUDA graphs enabled, fp16
Measurements: FPS / Device Memory (MB)
general
model | 1 stream | 2 streams | 3 streams |
---|---|---|---|
dpir gray | 21.93/1757.352 | 25.48/3049.696 | 25.31/4342.044 |
dpir color | 18.24/1790.184 | 25.11/3115.360 | 25.22/4440.540 |
waifu2x upconv_7_{anime_style_art_rgb, photo} | 19.58/2148.716 | 39.87/3867.240 | 59.94/5585.768 |
waifu2x upresnet10 | 17.40/1655.144 | 34.22/2880.096 | 42.78/4105.048 |
waifu2x cunet / cugan | 13.64/4391.292 | 25.09/8346.248 | 25.19/12301.208 |
waifu2x swin_unet | 4.62/14989.772 | OOM | OOM |
real-esrgan (v2/v3, xsx2) | 16.77/1136.996 | 33.99/1876.568 | 41.44/2616.140 |
rife
v2, fp16 i/o
version | 1 stream | 2 streams | 3 streams | 4 streams | 5 streams |
---|---|---|---|---|---|
v4.4-v4.5 | 150.20/622.784 | 301.05/835.860 | 448.90/1053.024 | 615.84/1268.152 | 787.57/1481.224 |
v4.6 | 147.63/624.832 | 294.53/837.904 | 452.26/1055.072 | 603.63/1270.200 | 764.31/1485.320 |
v4.7-v4.9 | 132.06/747.712 | 268.63/1075.476 | 403.54/1405.284 | 494.98/1737.152 | 496.41/2064.908 |
v4.10-v4.15 | 119.09/862.400 | 238.68/1304.852 | 346.98/1749.352 | 349.48/2195.904 | 349.80/2638.356 |
{v4.12, v4.13, v4.15, v4.16}_lite | 123.72/782.528 | 250.81/1151.252 | 377.27/1522.020 | 403.14/1894.844 | 403.79/2263.568 |
v4.14 lite | 117.97/839.872 | 234.67/1265.940 | 320.23/1696.104 | 321.88/2124.224 | 321.18/2552.340 |
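
One way to read the multi-stream tables is to divide each cell by the 1-stream figure of the same row. A minimal sketch, using two rows copied from the rife table above; the function names `parse_cell` and `stream_scaling` are hypothetical.

```python
def parse_cell(cell: str):
    """Split an 'fps/memory' table cell into two floats."""
    fps, mem = cell.split("/")
    return float(fps), float(mem)


def stream_scaling(cells):
    """Return (fps speedup, memory growth) of each cell relative to 1 stream."""
    base_fps, base_mem = parse_cell(cells[0])
    return [(fps / base_fps, mem / base_mem) for fps, mem in map(parse_cell, cells)]


# Rows copied from the rife table (v2, fp16 i/o), 1 to 5 streams.
rows = {
    "v4.4-v4.5": ["150.20/622.784", "301.05/835.860", "448.90/1053.024",
                  "615.84/1268.152", "787.57/1481.224"],
    "v4.10-v4.15": ["119.09/862.400", "238.68/1304.852", "346.98/1749.352",
                    "349.48/2195.904", "349.80/2638.356"],
}

for name, cells in rows.items():
    summary = ", ".join(f"{s:.2f}x fps / {m:.2f}x mem" for s, m in stream_scaling(cells))
    print(f"{name}: {summary}")
```

For these two rows, v4.4-v4.5 keeps gaining throughput up to 5 streams, while v4.10-v4.15 stops gaining after 3 streams (346.98 vs 349.80 fps) even though device memory keeps growing.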
- This pre-release uses TensorRT 9.2.0 + CUDA 12.3.1 + cuDNN 8.9.6, which requires a minimum driver version of 525 and is compatible with 10 series and newer GPUs. No significant performance improvement was measured.
- vsmlrt.py in all branches can be used interchangeably.
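
The driver and GPU requirements above can be checked from a script before installing. The sketch below assumes the two values come from `nvidia-smi --query-gpu=driver_version,compute_cap --format=csv,noheader` (the `compute_cap` field is only available in newer drivers) and that "Pascal or later" corresponds to compute capability 6.0 and up. The function name `meets_requirements` is hypothetical and the parsing is deliberately minimal.

```python
def meets_requirements(smi_line: str, min_driver: int = 525, min_cc=(6, 0)) -> bool:
    """Check one 'driver_version, compute_cap' CSV line, e.g. '551.86, 8.9'.

    Hypothetical helper: the input format is an assumption based on the
    nvidia-smi query named in the text, not something this release ships.
    """
    driver, compute_cap = (field.strip() for field in smi_line.split(","))
    # Only the major driver number matters for the ">= 525" requirement.
    driver_major = int(driver.split(".")[0])
    major, minor = compute_cap.split(".")
    # Assumption: Pascal is compute capability 6.x, so require >= (6, 0).
    return driver_major >= min_driver and (int(major), int(minor)) >= min_cc


print(meets_requirements("551.86, 8.9"))   # driver and GPU from the benchmark setup
print(meets_requirements("551.86, 5.2"))   # a Maxwell-class compute capability
```

To use it on a real machine, pass each line of the `nvidia-smi` output to the function, for example via `subprocess.run(..., capture_output=True, text=True).stdout.splitlines()`.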
[1] RDNA3, Navi 31, 12288 shaders, processor clock @ 2399 MHz, memory clock @ 1249 MHz, driver 6.0.32831, PCIe 4.0 x16, MIGraphX 2.8.0, ROCm 6.0.2, Linux 6.7.0-060700-generic, VapourSynth-Classic R57.A8

[2] tested on MIGraphX 2.10.0 ROCm/AMDMIGraphX@ecd5adc, requires MIGraphX 2.9.0 ROCm/AMDMIGraphX@2d4a6507c3ad41f9d7ea36de1d7fb257cc788585 and replacing edge padding by reflection padding

[3] tested on MIGraphX 2.10.0 ROCm/AMDMIGraphX@ecd5adc, requires MIGraphX 2.9.0 ROCm/AMDMIGraphX@2d4a6507c3ad41f9d7ea36de1d7fb257cc788585

[4] missing support for the GridSample operation