This repository has been archived by the owner on Jan 3, 2023. It is now read-only.
I am training neon on LibriSpeech data, but the process is always killed with an out-of-memory (OOM) error. My machine has 24 GB of RAM and a GeForce GTX 1070 card with 8 GB of memory.
I found this message with dmesg:
[3017506.733819] Out of memory: Kill process 25635 (python) score 974 or sacrifice child
[3017506.736861] Killed process 25635 (python) total-vm:55518724kB, anon-rss:23902876kB, file-rss:154436kB
Is neon leaking memory, or does it require more memory to train?
The command I ran is:
python train.py --manifest train:/bigdata/lili/deepspeech/librispeech/train-clean-100/train-manifest.csv --manifest val:/bigdata/lili/deepspeech/librispeech/train-clean-100/val-manifest.csv -e 20 -z 16 -s models -b gpu
I changed batch_size to 8, but the process is still killed:
[3256824.391743] Killed process 9666 (python) total-vm:53893188kB, anon-rss:23892380kB, file-rss:152808kB
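For anyone reading these kernel messages: the counters are in kB, and anon-rss is the process's resident anonymous memory in host RAM, not GPU memory. A minimal sketch that converts the counters from the first report to GiB follows; the helper names `parse_oom_line` and `kb_to_gib` are made up for illustration and are not part of neon.

```python
import re

# "Killed process" line copied verbatim from the dmesg output in the report.
LINE = ("[3017506.736861] Killed process 25635 (python) "
        "total-vm:55518724kB, anon-rss:23902876kB, file-rss:154436kB")


def parse_oom_line(line):
    """Return the kB counters of an OOM-killer line as a dict of ints."""
    # Matches pairs such as "anon-rss:23902876kB".
    return {name: int(value)
            for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}


def kb_to_gib(kb):
    """Convert kB (as printed by the kernel, 1024-byte units) to GiB."""
    return kb / (1024 ** 2)


counters = parse_oom_line(LINE)
for name, kb in counters.items():
    print(f"{name}: {kb_to_gib(kb):.1f} GiB")
```

The anon-rss value comes out just under the 24 GB of RAM on the machine, which suggests the host ran out of memory rather than the GPU.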
I suspect the source of the problem is unrelated to the model size. With the default parameters using the command you posted above, I get the following:
| batch size | GPU memory footprint |
|---|---|
| 32 | 6949 MB |
| 16 | 3915 MB |
| 8 | 2415 MB |
So your 8GB GPU has the capacity to handle a batch size of up to 32.
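As a rough cross-check, the three measurements can be fitted with a straight line to separate a fixed overhead from a per-sample cost. This is only a sketch: it assumes the footprints are in megabytes (the only reading that fits on an 8 GB card) and that memory grows linearly with batch size. The function name `fit_line` is made up for illustration.

```python
# Measured GPU footprints from the table above, read as MB (assumption).
measured = {8: 2415, 16: 3915, 32: 6949}


def fit_line(points):
    """Least-squares fit: footprint = overhead + per_sample * batch_size."""
    xs = list(points)            # batch sizes
    ys = list(points.values())   # footprints
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    per_sample = sxy / sxx
    overhead = mean_y - per_sample * mean_x
    return overhead, per_sample


overhead, per_sample = fit_line(measured)
print(f"fixed overhead ~ {overhead:.0f} MB, per sample ~ {per_sample:.0f} MB")

# Largest batch size whose predicted footprint stays under 8 GB (8192 MB).
gpu_capacity_mb = 8192
max_batch = int((gpu_capacity_mb - overhead) // per_sample)
print(f"predicted largest batch size: {max_batch}")
```

Under these assumptions the fit predicts a limit somewhat above 32, consistent with 32 being the largest power-of-two batch size that fits.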