
Training on the GPU does not work correctly? #22

Open
LEGoebel opened this issue Oct 29, 2020 · 0 comments

Comments

@LEGoebel

Hello,
I am trying to train the network myself on the GPU, just to test whether I can reproduce everything, but I have run into a problem. My machine has two GPUs: one with ID 0 and about 8 GB of VRAM, and a more powerful one (ID 1) for computing with about 64 GB of VRAM. The problem is that if I adjust the config file to

device:
  use_gpu: True
  gpu_ids: '1'
  num_workers: 2

I get a message that the VRAM of the corresponding device is full, and the training is aborted. Changing the ID to 0 works, but it takes ages (about 3 days for the object detection, another 4-5 days for the mesh generation, and the joint training is still running after 1.5 days at epoch 80/400).
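
Would restricting the visible devices before launching be the right approach instead? A minimal sketch of what I have in mind, assuming the training code is PyTorch-based (the CUDA_VISIBLE_DEVICES workaround is my own assumption, not something from this repository's docs):

    import os

    # Assumption: exposing only physical GPU 1 to the process makes the
    # framework see it as device 0, sidestepping how gpu_ids is parsed.
    # This must be set before torch initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import torch

    if torch.cuda.is_available():
        # Inside this process the single visible GPU is now cuda:0.
        device = torch.device("cuda:0")
        print(torch.cuda.get_device_name(device))

With this, the config would keep gpu_ids: '0', since only one device is visible to the process.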

Can someone tell me my mistake and what I can do to actually train on the correct GPU (as stated above, I already tried setting the ID to 1, but that doesn't work)?

Thank you very much in advance.

And just to clarify: the pretrained model works absolutely fine.
