Release v9.1 · AmusementClub/vs-mlrt

Bugfix release for v9. Recommended update for v9 users.
See the v9 release notes for the major new features.

Fix ort_cuda fp16 inference for CUGAN(version=2) model.

A new parameter, fp16_blacklist_ops, is introduced in the ort and ov backends to work around other issues possibly related to reduced precision.

Please still carefully review the output of fp16-accelerated CUGAN(version=2).

Conform to CUGAN(version=2)'s dynamic range compression. This feature is enabled by setting conformance=True (which is the default) in the CUGAN wrapper in vsmlrt.py, and it is implemented as:
```
clip = clip.std.Expr("x 0.7 * 0.15 +")
clip = CUGAN(clip, version=2)
clip = clip.std.Expr("x 0.15 - 0.7 /")
```
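The two Expr strings above are in reverse Polish notation: the first computes x * 0.7 + 0.15 before inference, and the second computes (x - 0.15) / 0.7 afterwards. The following is a minimal pure-Python sketch of that arithmetic on scalar sample values, without VapourSynth. The function names compress and expand are hypothetical and are not part of vsmlrt.py.

```python
# Hypothetical helpers mirroring the two Expr strings; not part of vsmlrt.py.

def compress(x: float) -> float:
    """Equivalent of Expr "x 0.7 * 0.15 +": applied before the model."""
    return x * 0.7 + 0.15

def expand(y: float) -> float:
    """Equivalent of Expr "x 0.15 - 0.7 /": applied after the model."""
    return (y - 0.15) / 0.7

# Sample values in the nominal [0, 1] range of a float clip.
samples = [0.0, 0.25, 0.5, 1.0]

compressed = [compress(x) for x in samples]
restored = [expand(y) for y in compressed]

for x, y, r in zip(samples, compressed, restored):
    print(f"{x:.2f} -> {y:.4f} -> {r:.4f}")
```

Under this mapping the range [0, 1] is squeezed into [0.15, 0.85] before inference, and the second expression is its exact algebraic inverse, so the pair leaves the values unchanged (up to floating-point rounding) if nothing happens in between.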

Known issues

These two issues are fixed in the v9.2 release.
- The ORT_CUDA backend allocates memory during inference. This degrades performance and may result in out-of-memory errors.
- Parameter use_cuda_graph of the ORT_CUDA backend is broken on Windows.

Full Changelog: v9...v9.1