
v14.test3: latest TensorRT, MIGraphX backend

Pre-release
@github-actions github-actions released this 03 Dec 01:13
· 188 commits to master since this release

This is a preview release for TensorRT 9.2.0, following the v14.test and v14.test2 releases.

  • As with those releases, it requires Pascal GPUs or later (10 series+) and driver version >= 525. Support for Kepler 2.0 and Maxwell GPUs is dropped.

  • TensorRT 9.2.0 is officially documented as intended only for Large Language Models (LLMs) on NVIDIA A100, A10G, L4, L40, L40S, and H100 GPUs and the NVIDIA GH200 Grace Hopper™ Superchip. The Windows build is downloaded from here and can also be used on other GPU models.

  • Users should use the same version of TensorRT as provided (9.2.0) because runtime version checking is disabled in this release.

  • Added support for AnimeJaNai V3 models, contributed by @hooke007 in #82.

  • Added support for RIFE v4.13 ~ v4.16 (lite, ensemble) models, which are also available for previous vs-mlrt releases (simply download the new model file here and update vsmlrt.py).

    • The v4.13 ~ v4.15 models should have the same execution speed as the v4.10 - v4.12 models.
    • The v4.13 lite model, the v4.15 lite model and the v4.16 lite model should all have the same execution speed as the v4.12 lite model, while the v4.14 lite model may run slower.
  • Added support for fractional video frame interpolation in RIFE.

    • For playback in video players, also set video_player=True (#59 (comment)). This change is experimental.
  • Fixed an issue that caused the TRT backend to crash during script reloading (#65). It is also fixed in the latest iteration of the v14.test2 release.

  • RIFE v4.7+ models with v2 representation are not working with dynamic shapes (#72). This has been reported to TensorRT developers.

  • Initial MIGraphX support (experimental) for AMD GPUs.

    • fp16 I/O contributed by @abihf in #86.
    • Multi-stream execution, device selection, HIP graphs and dynamic shapes are not explicitly supported for now.

    Preliminary benchmark on Radeon RX 7900 XTX [1]:

    • resolution: 1920x1080
    • measurements: fps / device memory (MB)

    | model | fp32 | fp16 |
    | --- | --- | --- |
    | dpir gray | 2.33 / 2829 | 7.29 / 1702 |
    | dpir color | 2.27 / 2861 | 7.03 / 1734 |
    | waifu2x upconv7 | 6.31 / 4540 | 12.90 / 2503 |
    | waifu2x upresnet10 | 6.65 / 3077 | 13.63 / 1775 |
    | waifu2x cunet / cugan | 3.69 / 6711 | 8.36 / 3591 |
    | waifu2x swin_unet [2] | 2.19 / 9791 | 4.53 / 5236 |
    | realesrgan [3] | 5.75 / 1961 | 11.57 / 959 |
    | rife [4] | N/A | N/A |
  • Also check the release notes of the v14.test and v14.test2 releases.
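
To illustrate what fractional frame interpolation means in practice, here is a minimal sketch of how output frames could be mapped onto the source timeline for a non-integer multiplier. It is not taken from vsmlrt.py: the function name `interpolation_plan` and the rule that the last source frame ends the output are assumptions made for illustration only.

```python
import math
from fractions import Fraction


def interpolation_plan(num_src_frames, multi):
    """Map each output frame to (left source frame index, timestep in [0, 1)).

    Hypothetical helper for illustration; not part of vs-mlrt.
    `multi` is the frame-rate multiplier and may be fractional, e.g. Fraction(5, 2).
    """
    multi = Fraction(multi)
    if num_src_frames < 1 or multi <= 0:
        raise ValueError("need at least one source frame and a positive multiplier")

    # Assumption: output stops at the last source frame, which has no right neighbour.
    num_out = math.floor((num_src_frames - 1) * multi) + 1

    plan = []
    for n in range(num_out):
        pos = Fraction(n) / multi      # position on the source timeline, in source frames
        left = math.floor(pos)         # source frame at or before that position
        plan.append((left, pos - left))  # timestep 0 means "reuse the source frame"
    return plan


if __name__ == "__main__":
    for left, t in interpolation_plan(3, Fraction(5, 2)):
        print(f"source frame {left}, timestep {t}")
```

With an integer multiplier the timesteps repeat for every source frame pair; with a fractional one they drift from pair to pair, which is why exact rational arithmetic is used here instead of floats.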


benchmark

  • RTX 4090
    • processor clock @ 2520 MHz
  • Intel Ice Lake server @ 2100 MHz
  • Driver 551.86
  • Windows 10 21H2 (19044.1415)
  • TensorRT 9.2.0
  • VapourSynth-Classic R57.A8, vapoursynth-plugin v0.96g3

1920x1080 rgbs, CUDA graphs enabled, fp16

Measurements: FPS / Device Memory (MB)

general

| model | 1 stream | 2 streams | 3 streams |
| --- | --- | --- | --- |
| dpir gray | 21.93/1757.352 | 25.48/3049.696 | 25.31/4342.044 |
| dpir color | 18.24/1790.184 | 25.11/3115.360 | 25.22/4440.540 |
| waifu2x upconv_7_{anime_style_art_rgb, photo} | 19.58/2148.716 | 39.87/3867.240 | 59.94/5585.768 |
| waifu2x upresnet10 | 17.40/1655.144 | 34.22/2880.096 | 42.78/4105.048 |
| waifu2x cunet / cugan | 13.64/4391.292 | 25.09/8346.248 | 25.19/12301.208 |
| waifu2x swin_unet | 4.62/14989.772 | OOM | OOM |
| real-esrgan (v2/v3, xsx2) | 16.77/1136.996 | 33.99/1876.568 | 41.44/2616.140 |

rife

v2, fp16 i/o

| version | 1 stream | 2 streams | 3 streams | 4 streams | 5 streams |
| --- | --- | --- | --- | --- | --- |
| v4.4-v4.5 | 150.20/622.784 | 301.05/835.860 | 448.90/1053.024 | 615.84/1268.152 | 787.57/1481.224 |
| v4.6 | 147.63/624.832 | 294.53/837.904 | 452.26/1055.072 | 603.63/1270.200 | 764.31/1485.320 |
| v4.7-v4.9 | 132.06/747.712 | 268.63/1075.476 | 403.54/1405.284 | 494.98/1737.152 | 496.41/2064.908 |
| v4.10-v4.15 | 119.09/862.400 | 238.68/1304.852 | 346.98/1749.352 | 349.48/2195.904 | 349.80/2638.356 |
| {v4.12, v4.13, v4.15, v4.16}_lite | 123.72/782.528 | 250.81/1151.252 | 377.27/1522.020 | 403.14/1894.844 | 403.79/2263.568 |
| v4.14 lite | 117.97/839.872 | 234.67/1265.940 | 320.23/1696.104 | 321.88/2124.224 | 321.18/2552.340 |
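
When picking a stream count, it can help to look at how throughput and memory scale rather than at raw FPS alone. The sketch below does that arithmetic for the v4.10-v4.15 row above; the numbers are copied from the table, while the helper name `scaling_report` and the three derived metrics are illustrative assumptions, not something vs-mlrt reports.

```python
# (fps, device memory in MB) per stream count, copied from the v4.10-v4.15 row.
V4_10_TO_V4_15 = {
    1: (119.09, 862.400),
    2: (238.68, 1304.852),
    3: (346.98, 1749.352),
    4: (349.48, 2195.904),
    5: (349.80, 2638.356),
}


def scaling_report(rows):
    """Return {streams: (speedup, efficiency, extra MB per added stream)}.

    Hypothetical helper for illustration only.
    speedup    = fps relative to the single-stream measurement
    efficiency = speedup divided by the number of streams (1.0 is perfect scaling)
    """
    base_fps, base_mem = rows[1]
    report = {}
    for streams, (fps, mem) in sorted(rows.items()):
        speedup = fps / base_fps
        efficiency = speedup / streams
        extra_mem = (mem - base_mem) / (streams - 1) if streams > 1 else 0.0
        report[streams] = (speedup, efficiency, extra_mem)
    return report


if __name__ == "__main__":
    for streams, (speedup, eff, extra) in scaling_report(V4_10_TO_V4_15).items():
        print(f"{streams} stream(s): x{speedup:.2f}, "
              f"efficiency {eff:.0%}, +{extra:.0f} MB per added stream")
```

For this row, throughput roughly doubles from one to two streams and then plateaus near 350 fps from three streams on, while device memory keeps growing with every additional stream.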

  • This pre-release uses TensorRT 9.2.0 + CUDA 12.3.1 + cuDNN 8.9.6, which requires driver version 525 or later and is compatible with 10 series and newer GPUs. No significant performance improvement was measured.
  • vsmlrt.py in all branches can be used interchangeably.
  1. RDNA3, Navi 31, 12288 shaders, processor clock @ 2399 MHz, memory clock @ 1249 MHz, driver 6.0.32831, PCIe 4.0 x16, MIGraphX 2.8.0, ROCm 6.0.2, Linux 6.7.0-060700-generic, VapourSynth-Classic R57.A8

  2. tested on MIGraphX 2.10.0 ROCm/AMDMIGraphX@ecd5adc, requires MIGraphX 2.9.0 ROCm/AMDMIGraphX@2d4a6507c3ad41f9d7ea36de1d7fb257cc788585 and replacing edge padding by reflection padding

  3. tested on MIGraphX 2.10.0 ROCm/AMDMIGraphX@ecd5adc, requires MIGraphX 2.9.0 ROCm/AMDMIGraphX@2d4a6507c3ad41f9d7ea36de1d7fb257cc788585

  4. missing support for GridSample operation