CPU memory keeps increasing (0.1 GiB/10 seconds); GPU memory is stable at 23 G out of 48 G #16218
johngrabner started this conversation in General · Replies: 0 comments
I set trainer = pl.Trainer(gpus=[0, 1], ..., limit_train_batches=0.2, ...) so I can complete a partial epoch.
This worked until yesterday. My database has grown, and now it does not.
Python version = 3.8.12 (default, Oct 12 2021, 13:49:34)
[GCC 7.5.0]
torch.__version__ = 1.11.0
pl.__version__ = 1.5.10
Any suggestions for how to debug this memory leak?
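One stdlib-only way to narrow down a host-RAM leak is Python's tracemalloc: snapshot CPU allocations at two points during training and diff them to see which lines are accumulating memory. A minimal sketch below uses a toy allocation as a stand-in for the suspected leak; in an actual Lightning run you would take the snapshots periodically (e.g. from a Callback every N batches), which is an assumption here, not something from the post:

```python
import tracemalloc

# Start tracking Python-level CPU allocations.
tracemalloc.start()

before = tracemalloc.take_snapshot()

# Stand-in for a suspected leak (e.g. batches or metrics retained in a list).
retained = [bytearray(64 * 1024) for _ in range(100)]

after = tracemalloc.take_snapshot()

# Largest growth first; each stat's traceback points at the allocating line.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```

If the growth does not show up in tracemalloc (it only sees Python-level allocations), watching the process RSS instead, e.g. with `resource.getrusage`, can confirm whether the leak is in native code such as a DataLoader worker.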