(colab notebook) Train DALLE-pytorch on C@H #291
Comments
Hi, I messaged you on Discord but you seemed to be busy. Anyway, I have a problem where it gets stuck at 'Time to load sparse_attn op:' no matter what params I use. It used to work; now it takes 10+ minutes. Is this a bug or a simple mistake on my part? |
And btw I'm valteralfred. @afiaka87 |
Hey! I've seen this bug before I think. You need to delete a folder containing the precompiled pytorch extensions. I want to say it's in the /root/.cache/torch_extensions directory but am on mobile and can't check currently. |
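For anyone who wants to script the cleanup suggested above, here is a minimal sketch. It assumes the cache lives at `~/.cache/torch_extensions` (which resolves to `/root/.cache/torch_extensions` on a Colab instance) unless the `TORCH_EXTENSIONS_DIR` environment variable points somewhere else; that location was not confirmed in this thread, so check the path on your own machine first. The function name `clear_torch_extensions_cache` is made up for illustration.

```python
import os
import shutil
from pathlib import Path


def clear_torch_extensions_cache(cache_dir=None):
    """Delete the directory of precompiled PyTorch extensions, if it exists.

    Returns True if a directory was removed, False if there was nothing to delete.
    """
    if cache_dir is None:
        # Assumption: TORCH_EXTENSIONS_DIR overrides the default location when set;
        # otherwise fall back to ~/.cache/torch_extensions.
        cache_dir = os.environ.get(
            "TORCH_EXTENSIONS_DIR",
            str(Path.home() / ".cache" / "torch_extensions"),
        )
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    # Remove the whole tree so the extensions are rebuilt from scratch on next use.
    shutil.rmtree(path)
    return True
```

In a notebook cell you would call `clear_torch_extensions_cache()` with no arguments and then re-run the cell that got stuck. The next run has to recompile the extension, so expect it to be slow once.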
Thanks, I'll try that. If it doesn't work, I'll try something else. |
It seems to have fixed itself! Thanks for the help. |
I think the cache got cleaned |
Anyone coming here from the notebook - I'm not really on the Discord as often as I should be. File issues with the notebook here if you can, or I'm not as likely to see them. I believe the issue here is that PyTorch or DeepSpeed or something gets stuck trying to compile an extension. When in doubt, restart the kernel on your notebook. You won't lose your instance - it'll just clear any local state you have currently. Then you can re-run the cell you were on before; no need to re-run the setup cells. |
Hi there, I'm trying the Colab notebook for the first time. It gets stuck at the installation of NVIDIA apex. It seems that the
|
I updated the colab notebook recently to train with the crawling @ home dataset. Hopefully fixed some of these issues. |
@afiaka87 Hi, thanks for sharing. I am using the afiaka dalle generation colab: https://colab.research.google.com/drive/11V2xw1eLPfZvzW8UQyTUhqCEU71w6Pr4?usp=sharing#scrollTo=682c5804-5f97-469f-8cf1-1cc8356591b8 I got several bugs I don't know how to fix: I also found a related issue here: robvanvolt/DALLE-models#13 but no one has fixed it yet. |
This has to do with DeepSpeed dropping support for a lot of GPUs in its sparse attention CUDA code. Regrettably, I don't believe they are likely to work soon, as I can no longer run them locally either. |
https://gist.github.com/afiaka87/b29213684a1dd633df20cab49d05209d
If there are any bugs - please make a comment below. When in doubt; restart your kernel. Tends to fix things a lot.