```
ERROR: CUDA RT call "cudaFuncSetAttribute(&monarch_conv_cuda_32_32_32_kernel<32, 8, 32768, 2, 16, false, 2, 8, 8>, cudaFuncAttributeMaxDynamicSharedMemorySize, 135168)" in line 969 of file /root/flash-fft-conv/csrc/flashfftconv/monarch_cuda/monarch_cuda_interface_fwd_bf16.cu failed with invalid argument (1). CUDA Runtime Error at: /root/flash-fft-conv/csrc/flashfftconv/monarch_cuda/monarch_cuda_interface_fwd_bf16.cu:1041 invalid argument
```
I tried the example code with `my_flashfftconv(x, k)` and `tests/test_flashfftconv.py` using the NVIDIA PyTorch Docker container (23.05). Previously, I used conda with different CUDA versions (12.1, 12.2, and 12.3).
I'm using two NVIDIA RTX 3090s with driver version 535.129.03 and CUDA version 12.2.
Is there any fix for this problem? (Changing tensor types didn't fix it.)
Thanks for this bug report! This happens because the RTX series has less SRAM than the A100/H100 (99 KB vs. 163/227 KB), which I didn't check for during development. For now, you should be good for sequence lengths up to 16K and between 64K and 524K.
We'll try to fill in the rest of the sequence lengths for the 3090 and 4090 in the next week or so, up to 2M (it requires some code changes and special-casing for different GPUs).
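The arithmetic behind the error can be checked directly: the failing `cudaFuncSetAttribute` call requests 135168 bytes (132 KiB) of dynamic shared memory, which exceeds the per-block opt-in limit on the RTX 3090 but fits on A100/H100. A minimal sketch (plain Python; the per-architecture limits are taken from the reply above and CUDA's documented opt-in maximums, not from this repo's code):

```python
# Dynamic shared memory requested by the failing cudaFuncSetAttribute call
# (taken verbatim from the error message above): 135168 bytes = 132 KiB.
REQUESTED_SMEM = 135168

# Maximum opt-in dynamic shared memory per thread block, in bytes.
# These correspond to the 99 / 163 / 227 KB figures cited in the reply.
SMEM_LIMITS = {
    "RTX 3090 (sm_86)": 99 * 1024,   # 101376 bytes
    "A100 (sm_80)":     163 * 1024,  # 166912 bytes
    "H100 (sm_90)":     227 * 1024,  # 232448 bytes
}

for gpu, limit in SMEM_LIMITS.items():
    verdict = "OK" if REQUESTED_SMEM <= limit else "invalid argument"
    print(f"{gpu}: request {REQUESTED_SMEM} B vs. limit {limit} B -> {verdict}")
```

On sm_86 the request exceeds the limit, so the runtime rejects the attribute with `invalid argument`, exactly as reported; the same request fits comfortably within the A100 and H100 budgets.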