Fisher Update causing errors #2
detach() won't work, because then PyTorch cannot track gradients for that tensor. I see that the problem could be that I am saving the probabilities for the entire dataset and then taking the mean. A simple solution would be to keep a running mean of the likelihoods instead of storing them in a list as I am doing. Let me know if this works for you.
Oh, will that be the correct implementation of the Fisher matrix? Can you please share a line of code showing what you mean, so that I can make sure I am doing the right thing.
Instead of this (Overcoming-Catastrophic-forgetting-in-Neural-Networks/elastic_weight_consolidation.py, line 30 in e056e6d), can you try:

```python
dl = DataLoader(current_ds, batch_size, shuffle=True)
log_likelihoods = 0
for i, (input, target) in enumerate(dl):
    if i >= num_batch:
        break
    output = F.log_softmax(self.model(input), dim=1)
    # incrementally update the running mean over the batches seen so far
    log_likelihoods = (i * log_likelihoods + output[:, target]) / (i + 1)
```

On further thought, I am not sure this is the actual issue, since your computation graph will still keep growing (you would then have to flush the likelihoods with detach() after every batch in a similar way). However, give it a try and let me know if it works.
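In case it helps later readers: a minimal sketch of batch-wise Fisher accumulation that flushes the graph after every batch (the estimate_fisher name, the torch.autograd.grad call, and the gather indexing are assumptions for illustration, not the repository's actual code; model, dl, and num_batch follow the snippet above):

```python
import torch
import torch.nn.functional as F

def estimate_fisher(model, dl, num_batch):
    """Diagonal Fisher estimate that frees the graph after every batch."""
    params = {n: p for n, p in model.named_parameters() if p.requires_grad}
    fisher = {n: torch.zeros_like(p) for n, p in params.items()}
    n_batches = 0
    for i, (input, target) in enumerate(dl):
        if i >= num_batch:
            break
        output = F.log_softmax(model(input), dim=1)
        # Per-sample log-likelihood of the target class; see the indexing
        # discussion further down in this thread.
        ll = output.gather(1, target.unsqueeze(1)).mean()
        # Compute gradients for this batch only, then discard the graph,
        # so memory does not grow with the number of batches.
        grads = torch.autograd.grad(ll, list(params.values()))
        with torch.no_grad():
            for (n, _), g in zip(params.items(), grads):
                fisher[n] += g.pow(2)  # squared gradients -> diagonal Fisher
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}
```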
```python
def save_fisher(fim: Dict[str, Tensor], name, scale=3):
def fim_diag(model: Module,
```
What do you think about this implementation?
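The quoted snippet is truncated, but for concreteness, a minimal sketch of what a save_fisher with the signature above could do (the body, the file naming, and the use of scale are assumptions, not the linked code):

```python
from typing import Dict
import torch
from torch import Tensor

def save_fisher(fim: Dict[str, Tensor], name: str, scale: int = 3):
    # Hypothetical body: move the diagonal FIM to CPU, apply the scale
    # factor, and persist it for use in a later EWC penalty term.
    torch.save({k: (v * scale).cpu() for k, v in fim.items()}, f"{name}.pt")
```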
Shouldn't we have a per-sample selection instead of output[:, target]? Assume the target size (batch size) is 64 and output is 64x4 (4 classes): output[:, target] gives me a 64x64 tensor, while the intention is to get a 64x1 tensor, right? The alternative line does that. Great work BTW.
Definitely right!
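For concreteness, a minimal standalone check of the shapes discussed in this exchange (random tensors for illustration; the gather and arange variants are assumptions about what the corrected line could look like):

```python
import torch

batch_size, num_classes = 64, 4
output = torch.randn(batch_size, num_classes).log_softmax(dim=1)
target = torch.randint(num_classes, (batch_size,))

# Broken: advanced indexing broadcasts target over the class axis -> (64, 64)
wrong = output[:, target]
assert wrong.shape == (64, 64)

# Per-sample log-likelihood of each sample's own target class -> (64, 1)
right = output.gather(1, target.unsqueeze(1))
assert right.shape == (64, 1)

# Equivalent fancy-indexing form -> (64,)
also_right = output[torch.arange(batch_size), target]
assert also_right.shape == (64,)
```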
I am trying to run EWC on my dataset with a ResNet-50 model. While updating the Fisher matrix using your function, my code runs out of CUDA memory at "log_liklihoods.append(output[:, target])". I read https://stackoverflow.com/questions/59805901/unable-to-allocate-gpu-memory-when-there-is-enough-of-cached-memory and worked around the memory problem using detach(). After detaching, however, I get an error:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

To get past this, I set allow_unused=True in autograd. As a result, all my gradients go to 0. Why is this happening?
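For reference, a minimal reproduction of why detach() leads to exactly this chain of symptoms (hypothetical tensors, not the repository's code): detach() severs a tensor from the autograd graph, so a loss built from detached values has no path back to the parameters; torch.autograd.grad then complains that the parameters were not used in the graph, and allow_unused=True merely suppresses the error by returning None gradients, which downstream code sees as zeros.

```python
import torch

w = torch.randn(3, requires_grad=True)   # parameter whose gradient we want
v = torch.randn(3, requires_grad=True)   # another parameter, kept in the graph

# Detaching w's contribution severs its path in the autograd graph,
# just as detaching the stored log-likelihoods severs the model parameters.
loss = ((w * 2).detach() + v).sum()

# torch.autograd.grad(loss, [w, v]) would raise:
#   RuntimeError: One of the differentiated Tensors appears to not have been
#   used in the graph. Set allow_unused=True if this is the desired behavior.
grads = torch.autograd.grad(loss, [w, v], allow_unused=True)
print(grads[0])  # None -- w gets no gradient, which shows up as zeros later
print(grads[1])  # tensor([1., 1., 1.]) -- v still gets a real gradient
```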