OOM in "def adapt" #21
Comments
Hi Sunshower76, I have the same problem and the same concern here. The GPU memory progressively increases when I increase `num_adaptation_steps`. I agree with you that changing this parameter should not increase the GPU memory usage. I hope the author has a better solution.
Hi! When you say that the GPU memory progressively increases, is that over the course of training (e.g. it works fine at first, but crashes after a number of iterations), or at the very first training iteration? If it is over the course of training, this might be the sign of a memory leak (although I don't know where it could come from), but if it crashes at the very beginning of training because you increased `num_adaptation_steps`, then this is expected: MAML backpropagates through all of the inner-loop updates, so the computation graph (and therefore the memory) grows with the number of adaptation steps. One option if you'd like to increase `num_adaptation_steps` is to use the first-order approximation (`first_order=True`), which does not backpropagate through the inner-loop updates.
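To make the memory behavior concrete, here is a minimal, self-contained sketch of a MAML-style inner loop in plain PyTorch on a toy linear model (an illustration, not the repository's code). With the second-order version (`create_graph=True`), every adaptation step keeps its computation graph alive so the outer loss can differentiate through it, which is why memory grows with the number of steps; the first-order approximation drops those graphs.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy regressor: the meta-parameters are one linear layer.
weight = torch.randn(1, 10, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)

def inner_loop(x, y, num_adaptation_steps, step_size=0.4, first_order=False):
    w, b = weight, bias  # fast weights start at the meta-parameters
    for _ in range(num_adaptation_steps):
        loss = F.mse_loss(F.linear(x, w, b), y)
        # create_graph=True keeps each step's graph alive so the outer loss
        # can differentiate through the whole adaptation trajectory; this is
        # what makes GPU memory grow with num_adaptation_steps.
        grad_w, grad_b = torch.autograd.grad(loss, (w, b),
                                             create_graph=not first_order)
        w = w - step_size * grad_w
        b = b - step_size * grad_b
    return w, b

x, y = torch.randn(25, 10), torch.randn(25, 1)
w_task, b_task = inner_loop(x, y, num_adaptation_steps=5)
outer_loss = F.mse_loss(F.linear(x, w_task, b_task), y)
outer_loss.backward()  # meta-gradients flow back into `weight` and `bias`
```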
Hi tristandeleu, thanks for your reply.
Previously, I thought the number of gradient updates had nothing to do with memory usage, as in a regular DNN model. However, this is meta-learning, which is a bilevel optimization, so that assumption may not hold here. I need to think about this more deeply based on your comments. Thanks.
@tristandeleu @Hugo101 I changed the code as below in the adapt function, because I found that the ids of the variable `params` and of the output of the `gradient_update_parameters` function are different.
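For context, the `adapt` function in this kind of MAML implementation typically rebinds `params` to the OrderedDict returned by `gradient_update_parameters` at every step. The sketch below only illustrates that pattern (assuming a torchmeta `MetaModule` whose forward accepts a `params` argument); it is not the exact change made in the comment above.

```python
import torch.nn.functional as F
from torchmeta.utils.gradient_based import gradient_update_parameters

def adapt(model, inputs, targets, num_adaptation_steps=1,
          step_size=0.4, first_order=False):
    # `params=None` means "use the model's own (meta) parameters"; after the
    # first step, `params` holds the task-specific fast weights returned by
    # `gradient_update_parameters`, and each step builds on the previous one.
    params = None
    for step in range(num_adaptation_steps):
        logits = model(inputs, params=params)
        inner_loss = F.cross_entropy(logits, targets)
        model.zero_grad()
        params = gradient_update_parameters(model, inner_loss,
                                            params=params,
                                            step_size=step_size,
                                            first_order=first_order)
    return params
```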
Hi @sunshower76, I think I checked the difference you mentioned between `params` and the output of `gradient_update_parameters`. As you mentioned, this is not the main issue. I think this issue also discussed the batch normalization thing: #19 (comment)
Note: I have mostly closed the issues you reference. For meta-learning, using batch statistics during evaluation is what should be done, so the model should stay in `.train()` mode and not track running stats. It's subtle and confusing; I recommend reading those issues and the extended discussions and links they point to in order to understand BN in this setting.
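As an illustration of that recommendation (a sketch, not necessarily how this repository builds its model), the BatchNorm layers can be created without running statistics and the model kept in train mode at meta-test time, so normalization always uses the statistics of the current batch:

```python
import torch.nn as nn

def conv_block(in_channels, out_channels):
    # track_running_stats=False: BatchNorm normalizes with the statistics of
    # the current batch at both meta-train and meta-test time, instead of
    # accumulating running estimates.
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels, track_running_stats=False),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

model = nn.Sequential(conv_block(3, 64), conv_block(64, 64))

# Keep the model in train mode during meta-evaluation: there are no running
# statistics to fall back on, and batch statistics are what we want anyway.
model.train()
```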
Hi @brando90, thanks for your comments. I share the same confusion about BN in the current MAML code. Thanks for the Stack Overflow reference.
Hi!
In the function named `adapt`, which implements the inner loop of MAML, the GPU memory progressively increases as `num_adaptation_steps` increases.
Eventually, this makes the code crash with an OOM error.
I think the `params` do not need to be accumulated; the model parameters only need to be copied for each task.
So I think this growth in GPU memory is not correct behavior.
Is this problem caused by my code, or does it come from the library?
Thanks :)
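One quick way to check this behavior (a hypothetical measurement snippet, not part of the repository) is to log the allocated GPU memory after each adaptation step:

```python
import torch

def report_memory(tag):
    # Allocated memory on the current CUDA device, in MiB.
    if torch.cuda.is_available():
        mib = torch.cuda.memory_allocated() / 1024 ** 2
        print(f"{tag}: {mib:.1f} MiB allocated")

# Inside the inner loop, right after updating the fast weights, e.g.
#   params = gradient_update_parameters(model, inner_loss, params=params, ...)
#   report_memory(f"after adaptation step {step}")
# With full (second-order) MAML the number grows roughly linearly with the
# step index; with the first-order approximation it stays roughly constant.
```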