
Training on the GPU does not work correctly? #22

Open
LEGoebel opened this issue Oct 29, 2020 · 0 comments

Comments

@LEGoebel

Hello,
I am trying to train the network myself on the GPU, just to test whether I can reproduce everything, but I have run into a problem. My machine has two GPUs: one with ID 0 and about 8 GB of VRAM, and a more powerful one (ID 1) for computing with about 64 GB of VRAM. The problem is that if I adjust the config file to

device:
  use_gpu: True
  gpu_ids: '1'
  num_workers: 2

I get a message that the VRAM of the corresponding device is full, and the training is aborted. Changing the ID to 0 works, but it takes ages (about 3 days for the object detection, another 4-5 days for the mesh generation, and the joint training is still running after 1.5 days at epoch 80/400).
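
Would restricting the visible devices before launching be the right approach instead? A minimal sketch of what I have in mind, assuming the training code is PyTorch-based (the CUDA_VISIBLE_DEVICES workaround is my own assumption, not something from this repository's docs):

    import os

    # Assumption: exposing only physical GPU 1 to the process makes the
    # framework see it as device 0, sidestepping how gpu_ids is parsed.
    # This must be set before torch initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import torch

    if torch.cuda.is_available():
        # Inside this process the single visible GPU is now cuda:0.
        device = torch.device("cuda:0")
        print(torch.cuda.get_device_name(device))

With this, the config would keep gpu_ids: '0', since only one device is visible to the process.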

Can someone tell me my mistake and what I can do to actually train on the correct GPU (as stated above, I already tried setting the ID to 1, but that doesn't work)?

Thank you very much in advance.

And just to clarify: the pretrained model works absolutely fine.
