Flux FP8 with optimum.quanto TypeError: WeightQBytesTensor.__new__() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype' #10526

Open
nitinmukesh opened this issue Jan 10, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug

Flux FP8 model quantized with optimum.quanto (weights=qfloat8):

pipe.enable_model_cpu_offload() - works
pipe.enable_sequential_cpu_offload() - fails with the TypeError shown under Logs

Reproduction

import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel, CLIPTextModel
from optimum.quanto import freeze, qfloat8, quantize

bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_single_file("https://huggingface.co/Kijai/flux-fp8/blob/main/flux1-dev-fp8.safetensors", torch_dtype=dtype)
quantize(transformer, weights=qfloat8)
freeze(transformer)

text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
quantize(text_encoder_2, weights=qfloat8)
freeze(text_encoder_2)

pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=None, text_encoder_2=None, torch_dtype=dtype)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2

# pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]

image.save("flux-fp8-dev.png")

Logs

(venv) C:\ai1\diffuser_t2i>python FLUX_FP8_optimum-quanto.py
Downloading shards: 100%|███████████████████████████████████████████████████| 2/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 2/2 [00:01<00:00,  1.25it/s]
Loading pipeline components...:  60%|██████████████████▌            | 3/5 [00:00<00:00,  4.05it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading pipeline components...: 100%|███████████████████████████████| 5/5 [00:01<00:00,  3.27it/s]
Traceback (most recent call last):
  File "C:\ai1\diffuser_t2i\FLUX_FP8_optimum-quanto.py", line 22, in <module>
    pipe.enable_sequential_cpu_offload()
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1180, in enable_sequential_cpu_offload
    cpu_offload(model, device, offload_buffers=offload_buffers)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\big_modeling.py", line 204, in cpu_offload
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 512, in attach_align_device_hook
    attach_align_device_hook(
  [Previous line repeated 4 more times]
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 503, in attach_align_device_hook
    add_hook_to_module(module, hook, append=True)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 161, in add_hook_to_module
    module = hook.init_hook(module)
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\hooks.py", line 308, in init_hook
    set_module_tensor_to_device(module, name, "meta")
  File "C:\ai1\diffuser_t2i\venv\lib\site-packages\accelerate\utils\modeling.py", line 368, in set_module_tensor_to_device
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad).to(device)
TypeError: WeightQBytesTensor.__new__() missing 6 required positional arguments: 'axis', 'size', 'stride', 'data', 'scale', and 'activation_qtype'
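
The last two traceback frames suggest the failure mode: the offload hook rebuilds each parameter by calling its class as `param_cls(new_value, requires_grad=...)`, which only works for classes that can be constructed from a single value. The pure-Python sketch below imitates that pattern without torch, accelerate, or quanto. `FakeQuantizedWeight` and `rebuild_like` are hypothetical stand-ins, and the constructor signature is an assumption inferred from the argument names in the error message, not the actual optimum-quanto definition.

```python
class FakeQuantizedWeight:
    """Hypothetical stand-in for a quantized weight class (not the quanto class).

    Assumption: the real class needs several positional arguments in __new__,
    as the names listed in the error message suggest.
    """

    def __new__(cls, qtype, axis, size, stride, data, scale,
                activation_qtype, requires_grad=False):
        obj = super().__new__(cls)
        obj.qtype = qtype
        obj.axis = axis
        obj.size = size
        obj.stride = stride
        obj.data = data
        obj.scale = scale
        obj.activation_qtype = activation_qtype
        obj.requires_grad = requires_grad
        return obj


def rebuild_like(old_value, new_value):
    """Imitates the generic rebuild seen in the traceback:
    param_cls(new_value, requires_grad=old_value.requires_grad)."""
    param_cls = type(old_value)
    return param_cls(new_value, requires_grad=old_value.requires_grad)


def try_rebuild(old_value, new_value):
    """Returns the error text if the generic rebuild fails, else None."""
    try:
        rebuild_like(old_value, new_value)
    except TypeError as exc:
        return str(exc)
    return None


# Constructing the object with every argument works.
weight = FakeQuantizedWeight("qfloat8", 0, (2, 2), (2, 1), [1, 2, 3, 4], 0.5, None)

# The single-value rebuild only fills the first positional slot, so the
# remaining six required arguments are reported as missing.
message = try_rebuild(weight, [0, 0, 0, 0])
print(message)
```

If this reading is right, sequential offload would need a rebuild path that understands the quantized tensor subclass instead of calling its constructor with one value. That would also be consistent with enable_model_cpu_offload() working, since the traceback shows this rebuild happening only in the sequential path.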

System Info

To reproduce, first merge 365/head and the changes from https://github.com/huggingface/optimum-quanto/pull/366/files locally.

Windows 11

(venv) C:\ai1\diffuser_t2i>python --version
Python 3.10.11

(venv) C:\ai1\diffuser_t2i>echo %CUDA_PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6

(venv) C:\ai1\diffuser_t2i>pip list
Package            Version
------------------ ------------
accelerate         1.1.0.dev0
bitsandbytes       0.45.0
diffusers          0.33.0.dev0
gguf               0.13.0
numpy              2.2.1
optimum-quanto     0.2.6.dev0
torch              2.5.1+cu124
torchao            0.7.0
torchvision        0.20.1+cu124
transformers       4.47.1

Who can help?

No response

nitinmukesh added the bug label on Jan 10, 2025