When running, the error: CUDA error out of memory #17
Comments
Sorry to trouble you again. It means its embeddings are non-trainable. Thank you.
It would affect the results. Can you tell me the batch size and other related hyperparameters? Also, can you run it with pointwise set to False?
Hi, batch size = 4000, epochs = 100; the other related hyperparameters are the same as in the source. Specifically, for [lcquad] I run the command line:
I think this is happening because the file is trying to load another slot-pointer instance while one slot-pointer instance is already in memory. This will not affect the final result much, as the best-performing model (the one with the highest validation accuracy) gets stored on disk. I have highlighted the best accuracy result in the image. You can run onefile.py with the appropriate params to load the model and re-run the eval. I would also recommend running it for a few more epochs, as it looks like the model has not converged.
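The "best-performing model gets stored" behavior described above can be sketched as a simple tracker that keeps the epoch with the highest validation accuracy; the function name and the commented save call are illustrative assumptions, not code from this repo:

```python
def track_best(val_accuracies):
    """Return (best_epoch, best_acc) over a list of per-epoch validation accuracies."""
    best_epoch, best_acc = -1, float("-inf")
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
            # In real training code, this is where the checkpoint would be
            # written, e.g. torch.save(model.state_dict(), "best_model.pt"),
            # so that onefile.py can later reload the best model for eval.
    return best_epoch, best_acc
```

With this pattern, a crash or an unconverged final epoch does not lose the best checkpoint, which is why re-running the eval on the stored model is safe.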
OK, I'll have a try. But I have a question: how many epochs should be set? Thanks.
I found 300 epochs in the paper. I'll try that.
Hi, sorry to trouble you again.
When I run : CUDA_VISIBLE_DEVICES=1 python corechain.py -model slotptr -device cuda -dataset lcquad -pointwise True
The error occurs at this line: loss.backward()
My GPU has 10 GB of memory.
Thank you for your help.
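An OOM inside loss.backward() usually means the batch (4000 here) plus its activations do not fit in 10 GB. A common workaround is gradient accumulation: split each batch into micro-batches and accumulate gradients before stepping. Below is a minimal sketch; the chunking helper is plain Python, and the PyTorch-style loop in the comments is the usual pattern, not code from this repo:

```python
def micro_batches(batch, micro_size):
    """Yield consecutive slices of `batch`, each at most `micro_size` long."""
    for start in range(0, len(batch), micro_size):
        yield batch[start:start + micro_size]

# Typical accumulation loop (PyTorch-style, shown as comments):
# optimizer.zero_grad()
# for mb in micro_batches(batch, 500):      # 8 micro-batches of 500
#     loss = compute_loss(mb) / 8           # scale so summed grads match
#     loss.backward()                       # accumulates into .grad
# optimizer.step()
```

This keeps the effective batch size at 4000 while only ever holding 500 examples' worth of activations on the GPU; lowering the batch-size hyperparameter directly is the simpler alternative if matching the paper's batch size is not essential.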