merge_sort: failed to synchronize during training

Thank you for sharing your advanced work.
I met a CUDA merge_sort error when training on 3DMatch.

Environment:
Command:
Output:
INFO - 2022-09-06 10:57:42,406 - data_loaders - Resetting the data loader seed to 0
INFO - 2022-09-06 10:57:54,158 - trainer - Validation iter 101 / 400 : Data Loading Time: 0.054, Feature Extraction Time: 0.023, Matching Time: 0.035, Loss: 0.578, RTE: 1.279, RRE: 0.498, Hit Ratio: 0.071, Feat Match Ratio: 0.495
INFO - 2022-09-06 10:58:05,445 - trainer - Validation iter 201 / 400 : Data Loading Time: 0.052, Feature Extraction Time: 0.023, Matching Time: 0.034, Loss: 0.576, RTE: 1.186, RRE: 0.488, Hit Ratio: 0.067, Feat Match Ratio: 0.478
INFO - 2022-09-06 10:58:17,300 - trainer - Validation iter 301 / 400 : Data Loading Time: 0.056, Feature Extraction Time: 0.022, Matching Time: 0.035, Loss: 0.556, RTE: 1.133, RRE: 0.471, Hit Ratio: 0.073, Feat Match Ratio: 0.502
INFO - 2022-09-06 10:58:28,882 - trainer - Final Loss: 0.554, RTE: 1.140, RRE: 0.458, Hit Ratio: 0.072, Feat Match Ratio: 0.490
Traceback (most recent call last):
  File "train.py", line 81, in <module>
    main(config)
  File "train.py", line 57, in main
    trainer.train()
  File "/home/bit/CODE/Research/Point_Cloud_Reg/FCGF/lib/trainer.py", line 130, in train
    self._train_epoch(epoch)
  File "/home/bit/CODE/Research/Point_Cloud_Reg/FCGF/lib/trainer.py", line 495, in _train_epoch
    loss.backward()
  File "/home/bit/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/bit/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
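For reference, here is roughly how the anomaly-detection run below can be enabled (a minimal sketch, not the exact script: torch.autograd.set_detect_anomaly is the standard PyTorch switch, and CUDA_LAUNCH_BLOCKING=1 only forces the asynchronous CUDA error to surface at the op that actually faults):

import os
# Must be set before CUDA is initialized: kernel launches become synchronous,
# so the illegal access is reported at the offending op instead of at a later sync.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
# Record the forward stack of every op so autograd can print it when backward fails.
torch.autograd.set_detect_anomaly(True)

# ...then run training exactly as before (same train.py entry point and config).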
With torch.autograd.detect_anomaly():
[W python_anomaly_mode.cpp:104] Warning: Error detected in IndexBackward. Traceback of forward call that caused the error:
File "/home/bit/CODE/Research/Point_Cloud_Reg/FCGF/train.py", line 78, in <module>
main(config)
File "/home/bit/CODE/Research/Point_Cloud_Reg/FCGF/train.py", line 55, in main
trainer.train()
File "/home/bit/CODE/Research/Point_Cloud_Reg/FCGF/lib/trainer.py", line 130, in train
self._train_epoch(epoch)
File "/home/bit/CODE/Research/Point_Cloud_Reg/FCGF/lib/trainer.py", line 485, in _train_epoch
pos_loss, neg_loss = self.contrastive_hardest_negative_loss(
File "/home/bit/CODE/Research/Point_Cloud_Reg/FCGF/lib/trainer.py", line 447, in contrastive_hardest_negative_loss
neg_loss1 = F.relu(self.neg_thresh - D10min[mask1]).pow(2)
I have switched to the v0.5 branch, and this error really confuses me.
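In case it helps to localize the illegal access, here is a sketch of the kind of bounds check that could be dropped in right before the failing line in contrastive_hardest_negative_loss (D10min, mask1, and self.neg_thresh are the names from the traceback; whether mask1 holds boolean or integer indices is an assumption about lib/trainer.py):

# Sketch only: sanity-check the operands of D10min[mask1] (lib/trainer.py:447).
# An illegal memory access raised from IndexBackward is often an out-of-range
# index or a mask whose shape does not match the tensor being indexed.
assert mask1.device == D10min.device, "index tensor on a different device"
if mask1.dtype == torch.bool:
    # Boolean mask: its shape must match the tensor it indexes.
    assert mask1.shape == D10min.shape, (mask1.shape, D10min.shape)
elif mask1.numel() > 0:
    # Integer indices: every index must lie inside D10min.
    assert 0 <= int(mask1.min()) and int(mask1.max()) < D10min.shape[0], \
        (int(mask1.min()), int(mask1.max()), tuple(D10min.shape))
neg_loss1 = F.relu(self.neg_thresh - D10min[mask1]).pow(2)

If one of these assertions fires (with CUDA_LAUNCH_BLOCKING=1 set), the bad index is most likely produced earlier in the loss computation rather than in the backward pass itself.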