Releases · chengzeyi/stable-fast
v0.0.12
What's Changed
- add _jit_pass_eliminate_simple_arith by @chengzeyi in #45
- Dev by @chengzeyi in #46
- Dev by @chengzeyi in #47
- Dev by @chengzeyi in #48
Full Changelog: v0.0.11...v0.0.12
v0.0.11
What's Changed
- Bump version to 0.0.11 and use fixed CUDNN major version for CI by @chengzeyi in #35
- remove triton.autotune to make compilation faster by @chengzeyi in #36
- Feature/quantization by @chengzeyi in #43
- Dev by @chengzeyi in #44
Full Changelog: v0.0.10...v0.0.11
v0.0.10
What's Changed
- Dev by @chengzeyi in #28
- Preserve parameters by @skirsten in #27
- Dev by @chengzeyi in #30
- Dev by @chengzeyi in #33
- Close #32: add env var WITHOUT_CUDA to setup.py and fail if CUDA is not available and WITHOUT_CUDA is not set, by @chengzeyi in #34 (a sketch of this check follows the list below)
- Fix missing linking of CUDNN, CUBLAS, and CUDA in previous CI wheels 😭
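The WITHOUT_CUDA behavior from #34 can be illustrated with the minimal sketch below. It is a hypothetical example assuming the check keys off `torch.utils.cpp_extension.CUDA_HOME`; the actual setup.py in stable-fast may implement the guard differently.

```python
# Hypothetical sketch of the WITHOUT_CUDA guard described in #34.
import os

from torch.utils.cpp_extension import CUDA_HOME  # None when no CUDA toolkit is found


def cuda_extensions_enabled() -> bool:
    # Setting WITHOUT_CUDA=1 lets users explicitly opt out of building CUDA extensions.
    if os.environ.get("WITHOUT_CUDA"):
        return False
    if CUDA_HOME is None:
        # Fail loudly instead of silently producing a wheel without CUDA support.
        raise RuntimeError(
            "CUDA toolkit not found. Install CUDA or set WITHOUT_CUDA=1 "
            "to build without CUDA extensions."
        )
    return True
```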
New Contributors
- @skirsten made their first contribution in #27
Full Changelog: v0.0.9...v0.0.10
v0.0.9
What's Changed
- Build automated CI to publish binary wheels on Linux and Windows
- fix use_count of mempool becoming zero by @chengzeyi in #23
- Dev by @chengzeyi in #24
Full Changelog: v0.0.8...v0.0.9
v0.0.8 release
What's Changed
- Efficient mem cuda graph by @chengzeyi in #18
- Bug Fixes by @chengzeyi in #20
Full Changelog: v0.0.7...v0.0.8
v0.0.7 release
Various improvements:
- Implement flat_tensors: a complete solution for converting arbitrary Python objects to a list of PyTorch tensors, making JIT tracing more flexible (a sketch of the idea follows this list).
- Add more fuse passes to improve performance on ComfyUI.
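The flatten/unflatten idea behind flat_tensors can be sketched as below. The names and structure here are illustrative assumptions, not the actual stable-fast API: an arbitrary nested object is split into a flat tensor list (which a JIT trace can consume) plus a tensor-free spec that lets the original structure be rebuilt.

```python
# Illustrative sketch of a flat_tensors-style flatten/unflatten; hypothetical API.
import torch


def flatten(obj):
    """Split a (possibly nested) Python object into a flat list of tensors
    plus a tensor-free spec recording where each tensor belongs."""
    if isinstance(obj, torch.Tensor):
        return [obj], ("tensor",)
    if isinstance(obj, (list, tuple)):
        tensors, specs = [], []
        for item in obj:
            t, s = flatten(item)
            tensors.extend(t)
            specs.append(s)
        return tensors, (type(obj).__name__, specs)
    if isinstance(obj, dict):
        tensors, specs = [], {}
        for key, value in obj.items():
            t, s = flatten(value)
            tensors.extend(t)
            specs[key] = s
        return tensors, ("dict", specs)
    return [], ("const", obj)  # non-tensor leaves are captured in the spec


def unflatten(tensors, spec):
    """Rebuild the original object from the flat tensor list and the spec."""
    it = iter(tensors)

    def rebuild(s):
        kind = s[0]
        if kind == "tensor":
            return next(it)
        if kind in ("list", "tuple"):
            seq = [rebuild(child) for child in s[1]]
            return seq if kind == "list" else tuple(seq)
        if kind == "dict":
            return {k: rebuild(v) for k, v in s[1].items()}
        return s[1]  # constant leaf

    return rebuild(spec)
```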
v0.0.6 release
Fix acquiring an unreachable GIL when the process exits
v0.0.5 release
Disable CUDA Graph for SDXL
v0.0.4 release
Many bug fixes and improvements:
- Support SDXL
- Support CUDA Graph with dynamic shape
- Support development version of Triton
- Fix crash at process exit caused by a missing GIL
v0.0.3 release
Bug fixes:
- Fix compilation failure when Triton is not enabled.
- Fix wrong output in Triton NCHW GroupNorm kernel.