How to quantize a custom Flux model? For example, lightweight models like Flux Lite that have removed some double blocks. #13

kelisiya · 2024-11-11T03:38:32Z

Will there be more general scripts provided in the future to support various custom Flux models?
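A pruned model like Flux Lite has fewer double blocks than the stock model. Any script that hard-codes the standard block layout may therefore need adjusting, so a sensible first step is to confirm the block counts in the checkpoint.

The sketch below is a minimal example. It assumes the weights use one of two naming schemes:

- diffusers-style key names: `transformer_blocks.<i>.*` for double-stream blocks and `single_transformer_blocks.<i>.*` for single-stream blocks
- the original BFL names: `double_blocks.<i>.*` and `single_blocks.<i>.*`

`count_flux_blocks` and `load_keys` are hypothetical helpers for illustration. They are not part of nunchaku or deepcompressor.

```python
# Hypothetical helpers: count double-/single-stream blocks in a Flux checkpoint.
# Assumption: keys follow diffusers or original BFL naming (optionally prefixed).
DOUBLE_PREFIXES = ("transformer_blocks", "double_blocks")
SINGLE_PREFIXES = ("single_transformer_blocks", "single_blocks")


def count_flux_blocks(keys):
    """Return (num_double_blocks, num_single_blocks) inferred from state-dict keys."""
    double_ids, single_ids = set(), set()
    for key in keys:
        parts = key.split(".")
        # Look for the first "<block-list-name>.<index>" pair in the key.
        for name, idx in zip(parts, parts[1:]):
            if not idx.isdigit():
                continue
            if name in DOUBLE_PREFIXES:
                double_ids.add(int(idx))
                break
            if name in SINGLE_PREFIXES:
                single_ids.add(int(idx))
                break
    # Counting distinct indices also works if pruned blocks were not renumbered.
    return len(double_ids), len(single_ids)


def load_keys(path):
    """Read tensor names from a .safetensors file (requires the `safetensors` package)."""
    from safetensors import safe_open  # imported lazily so the counter works without it

    with safe_open(path, framework="pt") as f:
        return list(f.keys())


if __name__ == "__main__":
    demo = [
        "transformer_blocks.0.attn.to_q.weight",
        "transformer_blocks.1.attn.to_q.weight",
        "single_transformer_blocks.0.attn.to_q.weight",
    ]
    print(count_flux_blocks(demo))  # (2, 1)
```

For reference, the stock FLUX.1-dev transformer in diffusers has 19 double-stream and 38 single-stream blocks. On a real checkpoint, call `count_flux_blocks(load_keys(path))` and compare the result against those numbers.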

lmxyy · 2024-11-12T05:15:06Z

Yeah, the quantization library is at mit-han-lab/deepcompressor. We are also cleaning up our LoRA conversion scripts and will soon release instructions on how to support custom LoRAs.

kelisiya · 2024-11-12T09:22:24Z

> Yeah, the quantization library is at mit-han-lab/deepcompressor. We are also cleaning our LoRA conversion scripts and will release the instructions soon on how to support customized LoRA.

When I run `example.py`, I get the following error:
```
[2024-11-12 09:20:39.140] [info] Initializing QuantizedFluxModel
[2024-11-12 09:20:39.384] [info] Loading weights from /data3/home/research/FLUX_train/nunchaku/model/svdq-int4-flux.1-dev.safetensors
[2024-11-12 09:20:40.235] [info] Done.
  0%| | 0/28 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data3/home/research/FLUX_train/nunchaku/example.py", line 10, in <module>
    image = pipeline("A cat holding a sign that says hello world", num_inference_steps=28, guidance_scale=0).images[0]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/flux/pipeline_flux.py", line 730, in __call__
    noise_pred = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_flux.py", line 500, in forward
    encoder_hidden_states, hidden_states = block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data3/home/research/FLUX_train/nunchaku/nunchaku/models/flux.py", line 51, in forward
    hidden_states = self.m.forward(
RuntimeError: CUDA error: no kernel image is available for execution on the device (at /data3/home/research/FLUX_train/nunchaku/src/kernels/awq/gemv_awq.cu:311)
```

My CUDA version:

```
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
```
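The CUDA error `no kernel image is available for execution on the device` generally means the compiled extension contains no GPU code that the installed GPU can run. In other words, it was not built for that GPU's compute capability.

The CUDA compatibility rules can be sketched as a small checker:

- A cubin built for `sm_XY` runs on GPUs with the same major version and an equal or higher minor version.
- PTX built for `compute_XY` can be JIT-compiled for any equal or newer compute capability.
- Architecture-specific targets such as `sm_90a` run only on that exact architecture.

`can_run` is a hypothetical helper for illustration. It does not inspect nunchaku's actual build.

```python
import re
from typing import Iterable, Tuple

# Matches targets like "sm_86", "compute_80", "sm_100", "sm_90a".
_ARCH = re.compile(r"(sm|compute)_(\d+)(\d)(a?)")


def can_run(device_cc: Tuple[int, int], archs: Iterable[str]) -> bool:
    """Hypothetical check: can any compiled target run on a GPU with compute capability device_cc?"""
    dev_major, dev_minor = device_cc
    for arch in archs:
        m = _ARCH.fullmatch(arch)
        if m is None:
            continue  # skip entries this sketch does not recognise
        kind, suffix = m.group(1), m.group(4)
        major, minor = int(m.group(2)), int(m.group(3))
        if suffix:
            # Architecture-specific targets (e.g. sm_90a) only run on that exact architecture.
            if (major, minor) == (dev_major, dev_minor):
                return True
        elif kind == "sm":
            # SASS (cubin): same major version, equal or higher minor version on the device.
            if major == dev_major and minor <= dev_minor:
                return True
        elif (major, minor) <= (dev_major, dev_minor):
            # PTX: JIT-compiled by the driver for any equal or newer compute capability.
            return True
    return False


if __name__ == "__main__":
    # An Ampere-only build on a Turing GPU (compute capability 7.5).
    print(can_run((7, 5), ["sm_80", "sm_86"]))  # False
    # The same build on an RTX 30-series GPU (compute capability 8.6).
    print(can_run((8, 6), ["sm_80", "sm_86"]))  # True
```

To fill in the arguments on the affected machine, `torch.cuda.get_device_capability()` returns the GPU's `(major, minor)` pair. The target list is whatever the extension was compiled with. For many PyTorch C++/CUDA extensions it is set through the `TORCH_CUDA_ARCH_LIST` environment variable at build time. It can differ from `torch.cuda.get_arch_list()`, which only describes PyTorch's own kernels.

The `nvcc` output above identifies the toolkit version, not the GPU architectures that were targeted.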
