
[FIXED] Flash-Attn breaks with flash_attn_gpu #1437

Open
jmparejaz opened this issue Dec 17, 2024 · 9 comments

Labels
fixed - pending confirmation Fixed, waiting for confirmation from poster

Comments

@jmparejaz

I have:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

Name: torch
Version: 2.4.1+cu124
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, bitsandbytes, cut-cross-entropy, flash-attn, peft, torchaudio, torchvision, unsloth_zoo, xformers

Name: flash-attn
Version: 2.7.2.post1
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: [email protected]
License: 
Location: /usr/local/lib/python3.11/dist-packages
Requires: einops, torch
Required-by: 

I installed unsloth with the wget command and I still got this warning message:

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Your Flash Attention 2 installation seems to be broken?
A possible explanation is you have a new CUDA version which isn't
yet compatible with FA2? Please file a ticket to Unsloth or FA2.
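
To narrow this down, a quick diagnostic sketch (it only assumes flash-attn itself is importable) is to check which symbol flash_attn.flash_attn_interface actually exposes, since a failed import of flash_attn_cuda appears to be what triggers the warning:

    import importlib

    # Older flash-attn releases exposed flash_attn_cuda; newer ones expose flash_attn_gpu.
    fai = importlib.import_module("flash_attn.flash_attn_interface")
    print("flash_attn_cuda:", hasattr(fai, "flash_attn_cuda"))
    print("flash_attn_gpu: ", hasattr(fai, "flash_attn_gpu"))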
@shimmyshimmer
Collaborator

Try uninstalling and reinstalling Unsloth. If the error still persists, let us know:
https://docs.unsloth.ai/get-started/install-update/updating

@shimmyshimmer shimmyshimmer added the unsure bug? I'm unsure label Dec 18, 2024
@jmparejaz
Author

The error message is still happening.

@R3xpook

R3xpook commented Dec 28, 2024

I have the same problem with the latest flash-attn version; with 2.7.0.post2 it works fine (WSL).

@hengck23

hengck23 commented Dec 31, 2024

Maybe edit this file:
lib/python3.11/site-packages/unsloth/models/_utils.py

I think flash_attn_cuda no longer exists; instead we now have flash_attn_gpu (an alias for the flash_attn_2_cuda extension).
Please verify that the change is correct (e.g., run on some data and compute some metric), as I have not tried it.

    # from flash_attn.flash_attn_interface import flash_attn_cuda
    from flash_attn.flash_attn_interface import flash_attn_gpu as flash_attn_cuda
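
A more defensive variant of that edit (just a sketch, untested, assuming only that one of the two names exists in the installed release) keeps working across both old and new flash-attn versions:

    # Try the new name first, then fall back to the old one.
    try:
        from flash_attn.flash_attn_interface import flash_attn_gpu as flash_attn_cuda
    except ImportError:
        from flash_attn.flash_attn_interface import flash_attn_cuda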

@n9Mtq4

n9Mtq4 commented Jan 4, 2025

I'm getting the same thing.

In Dao-AILab/flash-attention#1203 (specific location), flash_attn_cuda was renamed to flash_attn_gpu, which causes unsloth to think FA is broken.

I went with a different workaround: re-adding flash_attn_cuda to flash-attn, since I wanted to make sure flash_attn_cuda stays available in case it is imported elsewhere.

I just edited venv/lib/python3.12/site-packages/flash_attn/flash_attn_interface.py:

USE_TRITON_ROCM = os.getenv("FLASH_ATTENTION_TRITON_AMD_ENABLE", "FALSE") == "TRUE"
if USE_TRITON_ROCM:
    from .flash_attn_triton_amd import interface_fa as flash_attn_gpu
else:
    import flash_attn_2_cuda as flash_attn_gpu
+   flash_attn_cuda = flash_attn_gpu  # added line: keep the old name as an alias
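
If you would rather not edit files inside site-packages, setting a runtime alias before importing unsloth should have the same effect (a sketch under the assumption that everything importing flash_attn_cuda does so after this runs):

    # Restore the old attribute name at runtime instead of patching the installed package.
    import flash_attn.flash_attn_interface as fai

    if not hasattr(fai, "flash_attn_cuda") and hasattr(fai, "flash_attn_gpu"):
        fai.flash_attn_cuda = fai.flash_attn_gpu  # alias the renamed backend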

@Taimin

Taimin commented Jan 15, 2025

Same here. I got a warning about flash-attn not being installed. I used pip install flash-attn --no-build-isolation to install the package, and solved the issue with the workaround mentioned above.

@danielhanchen
Contributor

Apologies a lot for this, and sorry I missed it entirely. I added a fix to the nightly branch. So sorry about the issue!

@danielhanchen danielhanchen added currently fixing Am fixing now! and removed unsure bug? I'm unsure labels Jan 16, 2025
@danielhanchen danielhanchen changed the title how to fix flash attention broken installation Flash-Attn breaks with flash_attn_gpu Jan 16, 2025
@weiminw

weiminw commented Jan 17, 2025

+1

@danielhanchen
Contributor

I just fixed it! Apologies for the delay!

For local machines, please update Unsloth via:

pip install --upgrade --no-cache-dir --force-reinstall --no-deps unsloth unsloth_zoo
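
After upgrading, a quick check (a sketch; it assumes a CUDA environment with a recent flash-attn installed) is to import everything again and confirm the warning above no longer appears:

    # With the fix applied, importing unsloth should no longer print the
    # "Your Flash Attention 2 installation seems to be broken?" warning.
    import unsloth
    from flash_attn.flash_attn_interface import flash_attn_gpu
    print("flash-attn backend resolved:", flash_attn_gpu is not None)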

@danielhanchen danielhanchen added fixed - pending confirmation Fixed, waiting for confirmation from poster and removed currently fixing Am fixing now! labels Jan 20, 2025
@danielhanchen danielhanchen changed the title Flash-Attn breaks with flash_attn_gpu [FIXED] Flash-Attn breaks with flash_attn_gpu Jan 20, 2025
@danielhanchen danielhanchen pinned this issue Jan 20, 2025