
Quantized Flux not working #2511

Open
donkey-donkey opened this issue Sep 28, 2024 · 9 comments
@donkey-donkey

Hi, I'm getting an error on my ADA RTX 4000 machine, which supports BF16 and runs Stable Diffusion just fine. The error shows up with the quantized FLUX update.

This happens when running with no model specified, or with dev or schnell:

cargo run --features cuda,cudnn --example flux -r -- --height 1024 --width 1024 --prompt "a rusty robot walking on a beach holding a small torch, the robot has the word "rust" written on it, high quality, 4k" --model dev

The error:

Tensor[[1, 256], u32, cuda:0]
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16

Any ideas?

@LaurentMazare
Collaborator

This error is most likely not due to the model itself but rather to the CUDA setup. The bf16 kernels are guarded by the following directive:

#if __CUDA_ARCH__ >= 800
...
#endif

This makes the kernels available only when the CUDA arch targeted by the nvcc compiler is at least 8.0, which is likely not the case in your setup. It would be interesting to see which value __CUDA_ARCH__ has in your case, as well as the output of the nvidia-smi --query-gpu=compute_cap --format=csv command.
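
If it helps, here is a minimal standalone sketch (not candle's kernel code, just the same guard pattern) that prints the __CUDA_ARCH__ the device code was compiled for; building it with different -arch flags shows when the bf16 path gets compiled out:

// check_arch.cu — minimal sketch, not part of candle.
// Build e.g. with: nvcc -arch=sm_89 check_arch.cu -o check_arch
#include <cstdio>

__global__ void report_arch() {
#ifdef __CUDA_ARCH__
    // __CUDA_ARCH__ is only defined during the device compilation pass.
    printf("__CUDA_ARCH__ = %d\n", __CUDA_ARCH__);
#if __CUDA_ARCH__ >= 800
    printf("bf16 kernels (e.g. is_u32_bf16) would be compiled in\n");
#else
    printf("bf16 kernels (e.g. is_u32_bf16) would be compiled out\n");
#endif
#endif
}

int main() {
    report_arch<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}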

@donkey-donkey
Author

This machine has 2 GPUs. When I run the Stable Diffusion examples it uses the ADA 4000 with an 8.9 compute cap.

$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
6.1
8.9

How do I see the value of __CUDA_ARCH__?

@LaurentMazare
Collaborator

That first GPU is most likely causing the issue. Did you try using CUDA_VISIBLE_DEVICES so that candle can only see the second GPU? (If you're not familiar with it, it's not a candle-specific thing, so you can just google how to use it.)
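For example, assuming the compute cap 8.9 card is enumerated as device 1, something along these lines:

CUDA_VISIBLE_DEVICES=1 cargo run --features cuda,cudnn --example flux -r -- --height 1024 --width 1024 --prompt "..." --model dev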

@super-fun-surf

super-fun-surf commented Sep 30, 2024

When CUDA_VISIBLE_DEVICES is set to the correct device, and I can see in nvidia-smi -l 1 real-time monitoring that the correct GPU is used, it gets up to about 9GB of memory used and then the same error happens in the middle of the image process:

    Running `target/release/examples/flux --height 1024 --width 1024 --prompt 'a rusty robot walking on a beach holding a small torch, the robot has the word rust written on it, high quality, 4k' --quantized`
[[    3,     9,     3,  9277,    63,  7567,  3214,    30,     3,     9,  2608,
   3609,     3,     9,   422, 26037,     6,     8,  7567,    65,     8,  1448,
      3,  9277,  1545,    30,    34,     6,   306,   463,     6,   314,   157,
      1,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
      0,     0,     0]]
Tensor[[1, 256], u32, cuda:0]
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16

@LaurentMazare
Collaborator

It's probably worth cleaning your target directory in case some cached PTX files didn't get rebuilt after setting CUDA_VISIBLE_DEVICES.
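For example, from the workspace root:

cargo clean

A full clean is the simplest; assuming the PTX comes from the candle-kernels crate, cargo clean -p candle-kernels should also be enough to force it to rebuild.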

@super-fun-surf

I did a clean and it's the same error.
I can monitor in nvidia-smi which card is being used; the older card simply runs out of memory right away.

Not sure where to look next.

@LaurentMazare
Collaborator

Hmm, it seems weird that candle can use the older card if CUDA_VISIBLE_DEVICES points only at the new one; that's supposed to be handled by the CUDA framework, so it's not something candle could bypass. Maybe you're pointing at the wrong device somehow?
Another option would be to point at CUDA device 1 rather than CUDA device 0 in the code.
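For reference, a minimal sketch of that second option (hand-written here, not the flux example's exact code), using candle_core's Device::new_cuda:

use candle_core::Device;

fn main() -> candle_core::Result<()> {
    // Ordinal 1 selects the second GPU as enumerated by the CUDA driver
    // (the compute cap 8.9 card in this setup) instead of the default device 0.
    let device = Device::new_cuda(1)?;
    println!("cuda device created: {}", device.is_cuda());
    Ok(())
}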

@super-fun-surf

Actually it is pointing at the right card; CUDA_VISIBLE_DEVICES works as it should, there is no problem there.
As I said above, I can monitor it in nvidia-smi and it is using the right card.
On the correct card it still crashes with the error:

Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16

@super-fun-surf

Just checking back in; no idea how to troubleshoot this.
It works on the A100, but I still get the same error on my ADA RTX 4000 with 20GB and compute cap 8.9:

DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading is_u32_bf16

and on a Mac M1 we get the error:

  Error while loading function: "Function 'cast_f32_bf16' does not exist"))
