RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm #5064
Comments
I should also note that I've seen this when running the sample on a GTX TITAN X (so it doesn't seem to be isolated to that particular GPU, or to that GPU having gone bad). I could not reproduce the issue when running with a Titan RTX.
This error doesn't seem to occur if I use HF Transformers.
@AkshitaB I saw you self-assigned this issue. Did you get a chance to see if you can repro on a Titan XP with the above Docker image?
I assume this also doesn't happen when you run on CPU? Usually you get better error messages on CPU.
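For reference, one way to do a CPU run without editing the config is to pass an overrides JSON to `allennlp train`; a minimal sketch, assuming the config from the repro steps has been saved locally as `roberta_base_mnli.jsonnet` and that the GPU is selected via the trainer's `cuda_device` field:

```bash
# Force a CPU run so cuBLAS is out of the picture and error messages are clearer.
# The overrides JSON is merged into the config; cuda_device -1 means "no GPU" in AllenNLP.
allennlp train roberta_base_mnli.jsonnet -s output_cpu -o '{"trainer": {"cuda_device": -1}}'
```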
We don't have any Titans anymore outside the vision team. I'm trying to reproduce it on allennlp-server4, which has a Quadro RTX.
@nelson-liu, if I don't reproduce it, can you set the CUDA_LAUNCH_BLOCKING environment variable and see what error you get then? If this is a general problem with AllenNLP, it becomes my top priority, but I'm not keen on debugging a specific bug in CUDA that only occurs with a specific combination of CUDA and GPU. Can you run an older version of CUDA, or on a different GPU, to get around this problem?
I think that should work... my suspicion is that it'll fail if the compute capability is 6.1 or below (https://developer.nvidia.com/cuda-gpus). It works on a Titan RTX, and Quadro RTXs have the same compute capability of 7.5.
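A quick way to see what compute capability PyTorch reports for each visible GPU (a standalone snippet, not part of the repro config):

```python
# Print the compute capability of every visible GPU, to see whether the
# failures line up with cards at or below capability 6.1.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"{torch.cuda.get_device_name(i)}: compute capability {major}.{minor}")
```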
I managed to reproduce this issue on Titan XP, K40, GTX Titan X. I haven't tried any other GPUs, but I'll gather some more data and also try running with CUDA_LAUNCH_BLOCKING.
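For completeness, this is roughly what the CUDA_LAUNCH_BLOCKING run looks like; with kernel launches made synchronous, the stack trace should point at the op that actually failed rather than a later one (assuming the config has been saved locally as `roberta_base_mnli.jsonnet`):

```bash
# Synchronous kernel launches: slower, but the reported error is attributed
# to the call that triggered it.
CUDA_LAUNCH_BLOCKING=1 allennlp train roberta_base_mnli.jsonnet -s output_debug
```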
So the context here is that I'm in a heterogeneous cluster environment, where the nodes have anything between a Titan XP (the majority of nodes), K40, GTX Titan X, Titan RTX, RTX 3080, Titan V, or 2080 Ti. It'd be great if I could run my jobs on any node (it would certainly speed things up). More generally, a lot of users are still on these older GPUs, and I just wanted to see if it was reproducible outside of my organization.
It's independent of the CUDA version then? Just depends on the compute capability?
Ah, the issue is that CUDA 11+ is the only CUDA version (with a PyTorch release) that supports all of these GPUs, which is why I'm using it in particular. I haven't tried CUDA 10.2; I'll give it a shot.
Another possibly-relevant hint: sometimes the error I get is a different one, and the CUDA 11.1 release notes mention a fixed issue in cuBLAS. Given that others are seeing this as well (pytorch/pytorch#53957), maybe this is just an issue with PyTorch. It'd be nice to hear whether you're able to reproduce it on a Titan XP or older GPU, though, just so we can perhaps verify that that's the case.
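If it is a PyTorch/cuBLAS problem rather than an AllenNLP one, a plain fp32 matmul on the affected GPU should be enough to tell, since that path goes through cublasSgemm; a minimal sketch (the sizes are arbitrary):

```python
# Hypothetical minimal check: if this fails with CUBLAS_STATUS_EXECUTION_FAILED
# on the affected card, the bug is below AllenNLP in the stack.
import torch

device = torch.device("cuda:0")
a = torch.randn(768, 3072, device=device)
b = torch.randn(3072, 768, device=device)
c = a @ b                      # fp32 GPU matmul, handled by cuBLAS sgemm
torch.cuda.synchronize()       # surface any asynchronous CUDA error here
print(c.norm().item())
```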
I'm trying this now. |
Turns out I don't have a server that has a card this old but an Nvidia driver recent enough to run CUDA 11. |
Ah, no worries then. Thanks for looking into it regardless, hopefully there's more info on the PyTorch upstream side. I'll close this for now, maybe it'll be useful for wayward google wanderers. |
I've asked about getting the drivers updated. If I hear more, I'll let you know. Also, if this problem pops up in other contexts, let us know. If it's something we can fix, or at least work around, we should.
FWIW, this seems to be fixed with the 1.9.0 nightly. |
Took a while, but I'm always super happy when PyTorch fixes a bug for us :-) |
I had the same issue with PyTorch 1.8.0 and CUDA 11.1. To reassure anyone else in the same situation: updating to the 1.9.0 nightly fixed it for me as well.
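For anyone else on 1.8.0 + CUDA 11.1, installing the nightly looked roughly like this; the exact index URL and wheel names may have changed since, so treat it as a sketch:

```bash
# Replace the stable 1.8.0+cu111 wheel with a 1.9.0 nightly built against CUDA 11.1.
pip uninstall -y torch
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html
```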
Checklist

- I have verified that the issue exists against the master branch of AllenNLP.
- I have included in the "Environment" section the output of pip freeze.

Description
When I train RoBERTa (or BERT, but let's just stick with RoBERTa in this issue in the interest of simplicity) on MNLI, I get an odd CUDA error.
Environment
I made a Docker image that reproduces this issue: https://hub.docker.com/r/nfliu/torch1.8.0-sgemm-execution-debugging . The associated Dockerfile is https://gist.github.com/nelson-liu/f80d76f5557d48f2a52b2082b1bf86da . In short, it is based on the NVIDIA CUDA 11.1 container, installs allennlp and allennlp-models from their most recent commits, and installs PyTorch 1.8.0+cu111. The Python version is 3.7.
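The gist has the exact Dockerfile; the sketch below only captures its shape, and the base image tag, package versions, and install commands are assumptions rather than a copy of that file:

```dockerfile
# Rough shape of the repro image: CUDA 11.1 base, Python 3.7,
# torch 1.8.0+cu111, and allennlp / allennlp-models from their latest commits.
FROM nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04

RUN apt-get update && \
    apt-get install -y python3.7 python3.7-dev python3-pip git && \
    rm -rf /var/lib/apt/lists/*

RUN python3.7 -m pip install --upgrade pip && \
    python3.7 -m pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html && \
    python3.7 -m pip install git+https://github.com/allenai/allennlp.git \
                             git+https://github.com/allenai/allennlp-models.git
```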
Here's the output of nvidia-smi (for things like driver version, etc.):
Steps to reproduce
nvidia-docker run --rm -it nfliu/torch1.8.0-sgemm-execution-debugging
allennlp train https://gist.githubusercontent.com/nelson-liu/2164bb51097c5a8f9f9e8d7784f8473e/raw/ce93da75558489177556355c8d54ca4949417b8b/roberta_base_mnli.jsonnet -s output
The config is at https://gist.github.com/nelson-liu/2164bb51097c5a8f9f9e8d7784f8473e ; it's exactly the same as the RoBERTa MNLI config except that I'm using RoBERTa base and a batch size of 8, since the Titan XP has a bit less memory.