Release v14.test3: latest TensorRT, MIGraphX backend · AmusementClub/vs-mlrt

This is a preview release for TensorRT 9.2.0, following the v14.test and v14.test2 releases.

As with those releases, it requires Pascal GPUs or later (10 series+) and driver version >= 525. Support for Kepler 2.0 and Maxwell GPUs is dropped.
TensorRT 9.2.0 is officially documented as being for Large Language Models (LLMs) on NVIDIA A100, A10G, L4, L40, L40S, H100 GPUs, and NVIDIA GH200 Grace Hopper™ Superchip only. The Windows build is downloaded from here and can also be used on other GPU models.
Users should use the same version of TensorRT as provided (9.2.0) because runtime version checking is disabled in this release.
Added support for AnimeJaNai V3 models, contributed by @hooke007 in #82.
Added support for RIFE v4.13 ~ v4.16 (lite, ensemble) models, which are also available for previous vs-mlrt releases (simply download the new model file here and update vsmlrt.py).
- The v4.13 ~ v4.15 models should have the same execution speed as the v4.10 ~ v4.12 models.
- The v4.13 lite model, the v4.15 lite model and the v4.16 lite model should all have the same execution speed as the v4.12 lite model, while the v4.14 lite model may run slower.
Added support for fractional video frame interpolation in RIFE.
- For playback in video players, also set video_player=True (#59 (comment)). This change is experimental.
Fixed an issue that caused the TRT backend to crash during script reloading (#65). The fix is also included in the latest iteration of the v14.test2 release.
Known issue: RIFE v4.7+ models with v2 representation do not work with dynamic shapes (#72). This has been reported to TensorRT developers.
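
To make the idea of a fractional interpolation factor concrete, the sketch below maps each output frame to a pair of neighbouring source frames and a blend timestep, using exact rational arithmetic. It is a generic illustration and not necessarily how vsmlrt.py implements the feature; the helper name `map_output_frame` is hypothetical.

```python
from fractions import Fraction
from math import floor


def map_output_frame(n: int, multi: Fraction) -> tuple[int, int, Fraction]:
    """Map output frame n to (left source frame, right source frame, timestep).

    Hypothetical helper for illustration only; not part of vsmlrt.py.
    """
    pos = Fraction(n) / multi   # exact position on the source timeline
    left = floor(pos)           # source frame at or before that position
    timestep = pos - left       # how far towards the next frame, in [0, 1)
    return left, left + 1, timestep


# Example: 24000/1001 fps -> 60000/1001 fps is a factor of 5/2.
multi = Fraction(5, 2)
for n in range(6):
    left, right, t = map_output_frame(n, multi)
    print(f"output {n}: between source {left} and {right}, timestep {t}")
```

A timestep of 0 means the output frame coincides with a source frame, so no interpolation is needed for it. Using `Fraction` rather than floating point keeps the mapping free of rounding drift over long clips.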

Initial MIGraphX support (experimental) for AMD GPUs.

fp16 I/O contributed by @abihf in #86.
Multi-stream execution, device selection, HIP graphs and dynamic shapes are not explicitly supported for now.

preliminary benchmark on Radeon RX 7900 XTX ¹:

resolution: 1920x1080
measurements: fps / device memory (MB)

model	fp32	fp16
dpir gray	2.33 / 2829	7.29 / 1702
dpir color	2.27 / 2861	7.03 / 1734
waifu2x upconv7	6.31 / 4540	12.90 / 2503
waifu2x upresnet10	6.65 / 3077	13.63 / 1775
waifu2x cunet / cugan	3.69 / 6711	8.36 / 3591
waifu2x swin_unet ²	2.19 / 9791	4.53 / 5236
realesrgan ³	5.75 / 1961	11.57 / 959
rife ⁴	N/A	N/A
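
As a reading aid for the table above, the following sketch computes the fp16 speedup and the fp16-to-fp32 memory ratio for a row. The helper names `parse_cell` and `fp16_gain` are hypothetical, and the rows are copied from the table (rife is omitted because it has no measurements).

```python
# Rows copied from the table above: (model, fp32 cell, fp16 cell),
# where each cell is "fps / device memory (MB)".
ROWS = [
    ("dpir gray", "2.33 / 2829", "7.29 / 1702"),
    ("dpir color", "2.27 / 2861", "7.03 / 1734"),
    ("waifu2x upconv7", "6.31 / 4540", "12.90 / 2503"),
    ("waifu2x upresnet10", "6.65 / 3077", "13.63 / 1775"),
    ("waifu2x cunet / cugan", "3.69 / 6711", "8.36 / 3591"),
    ("waifu2x swin_unet", "2.19 / 9791", "4.53 / 5236"),
    ("realesrgan", "5.75 / 1961", "11.57 / 959"),
]


def parse_cell(cell: str) -> tuple[float, float]:
    """Split a 'fps / MB' cell into (fps, memory_mb)."""
    fps, mem = (float(part) for part in cell.split("/"))
    return fps, mem


def fp16_gain(fp32_cell: str, fp16_cell: str) -> tuple[float, float]:
    """Return (fps speedup, memory ratio) of fp16 relative to fp32."""
    fps32, mem32 = parse_cell(fp32_cell)
    fps16, mem16 = parse_cell(fp16_cell)
    return fps16 / fps32, mem16 / mem32


for model, fp32_cell, fp16_cell in ROWS:
    speedup, mem_ratio = fp16_gain(fp32_cell, fp16_cell)
    print(f"{model}: {speedup:.2f}x fps, {mem_ratio:.0%} of fp32 memory")
```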

Also check the release notes of the v14.test and v14.test2 releases.

benchmark

RTX 4090
- processor clock @ 2520 MHz
Intel Ice Lake server @ 2100 MHz
Driver 551.86
Windows 10 21H2 (19044.1415)
TensorRT 9.2.0
VapourSynth-Classic R57.A8, vapoursynth-plugin v0.96g3

1920x1080 rgbs, CUDA graphs enabled, fp16

Measurements: FPS / Device Memory (MB)

general

model	1 stream	2 streams	3 streams
dpir gray	21.93/1757.352	25.48/3049.696	25.31/4342.044
dpir color	18.24/1790.184	25.11/3115.360	25.22/4440.540

waifu2x upconv_7_{anime_style_art_rgb, photo}	19.58/2148.716	39.87/3867.240	59.94/5585.768
waifu2x upresnet10	17.40/1655.144	34.22/2880.096	42.78/4105.048
waifu2x cunet / cugan	13.64/4391.292	25.09/8346.248	25.19/12301.208
waifu2x swin_unet	4.62/14989.772	OOM	OOM

real-esrgan (v2/v3, xsx2)	16.77/1136.996	33.99/1876.568	41.44/2616.140

rife

v2, fp16 i/o

version	1 stream	2 streams	3 streams	4 streams	5 streams
v4.4-v4.5	150.20/622.784	301.05/835.860	448.90/1053.024	615.84/1268.152	787.57/1481.224
v4.6	147.63/624.832	294.53/837.904	452.26/1055.072	603.63/1270.200	764.31/1485.320
v4.7-v4.9	132.06/747.712	268.63/1075.476	403.54/1405.284	494.98/1737.152	496.41/2064.908
v4.10-v4.15	119.09/862.400	238.68/1304.852	346.98/1749.352	349.48/2195.904	349.80/2638.356
{v4.12, v4.13, v4.15, v4.16}_lite	123.72/782.528	250.81/1151.252	377.27/1522.020	403.14/1894.844	403.79/2263.568
v4.14 lite	117.97/839.872	234.67/1265.940	320.23/1696.104	321.88/2124.224	321.18/2552.340
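
One way to read the multi-stream columns is as scaling efficiency: throughput with k streams divided by k times the single-stream throughput. The sketch below computes this for the v4.10-v4.15 row; the helper name `scaling_efficiency` is hypothetical and the values are copied from the table above.

```python
def scaling_efficiency(cells: list[str]) -> list[float]:
    """cells[i] is the 'fps/MB' cell for i + 1 streams.

    Returns fps_k / (k * fps_1) for each stream count k.
    """
    fps = [float(cell.split("/")[0]) for cell in cells]
    return [f / ((i + 1) * fps[0]) for i, f in enumerate(fps)]


# v4.10-v4.15 row, 1 to 5 streams.
row = [
    "119.09/862.400",
    "238.68/1304.852",
    "346.98/1749.352",
    "349.48/2195.904",
    "349.80/2638.356",
]

for streams, eff in enumerate(scaling_efficiency(row), start=1):
    print(f"{streams} stream(s): {eff:.0%} of ideal linear scaling")
```

For this row the efficiency stays close to 1 up to 3 streams and falls off afterwards, which matches the fps plateau visible in the 4 and 5 stream columns while device memory keeps growing.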

This pre-release uses TensorRT 9.2.0 + CUDA 12.3.1 + cuDNN 8.9.6, which requires a minimum driver version of 525 and is compatible with 10 series and newer GPUs. No significant performance improvement was measured.
vsmlrt.py in all branches can be used interchangeably.
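
The hardware and driver requirements stated above can be expressed as a simple check. This is a minimal sketch, assuming the GPU's CUDA compute capability and the driver version string are already known; the function and constant names are hypothetical and not part of vs-mlrt. Pascal corresponds to compute capability 6.x, while Maxwell is 5.x.

```python
# Assumption: "Pascal or later" is expressed as compute capability >= 6.0.
MIN_COMPUTE_CAPABILITY = (6, 0)
# Minimum driver branch stated in the release notes.
MIN_DRIVER_MAJOR = 525


def meets_requirements(compute_capability: tuple[int, int], driver_version: str) -> bool:
    """Return True if the GPU generation and the driver are both new enough."""
    driver_major = int(driver_version.split(".")[0])
    return (
        compute_capability >= MIN_COMPUTE_CAPABILITY
        and driver_major >= MIN_DRIVER_MAJOR
    )


print(meets_requirements((8, 9), "551.86"))  # RTX 4090 with the benchmark driver
print(meets_requirements((5, 2), "551.86"))  # a Maxwell GPU
print(meets_requirements((6, 1), "516.94"))  # a Pascal GPU with an older driver
```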

¹ RDNA3, Navi 31, 12288 shaders, processor clock @ 2399 MHz, memory clock @ 1249 MHz, driver 6.0.32831, PCIe 4.0 x16, MIGraphX 2.8.0, ROCm 6.0.2, Linux 6.7.0-060700-generic, VapourSynth-Classic R57.A8
² tested on MIGraphX 2.10.0 ROCm/AMDMIGraphX@ecd5adc, requires MIGraphX 2.9.0 ROCm/AMDMIGraphX@2d4a6507c3ad41f9d7ea36de1d7fb257cc788585s and replacing edge padding by reflection padding
³ tested on MIGraphX 2.10.0 ROCm/AMDMIGraphX@ecd5adc, requires MIGraphX 2.9.0 ROCm/AMDMIGraphX@2d4a6507c3ad41f9d7ea36de1d7fb257cc788585s
⁴ missing support for GridSample operation
