Error while I run training #1

Open
hnnam0906 opened this issue Oct 24, 2024 · 3 comments

@hnnam0906

hnnam0906 commented Oct 24, 2024

Hi @Leaveson, @hyeonahkimm,

I get the following error when running the training command from the README:
"python run.py --graph_size 50 --problem mtsp --run_name 'mtsp50' --agent_min 2 --agent_max 10"

assert not torch.isnan(log_p).any()
AssertionError
in _get_log_p method

As far as I can tell, all the log_p values are NaN, so the assertion fails. Could you please investigate and fix the issue?
Thanks

@hnnam0906
Author

Hi @Leaveson, @hyeonahkimm,

I hope you are doing well.
Your solution is really nice and I would like to use it. Could you please investigate and fix the issue above so that I can train the model myself?

Thanks

@hnnam0906
Author

Hi @Leaveson, @hyeonahkimm,

Could you help solve the issue above?
I look forward to hearing from you.

Thanks.

@hyeonahkimm
Contributor

Hi @hnnam0906,

When I trained the model with the same script, the error didn't occur until after one epoch had finished. However, I suspect the error is caused by numerical instability in the log softmax operation: when the values fed into it are too large, the result can be NaN. Please try clipping those values before the log softmax function, and let me know if you still see the same problem.

Best,
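
For anyone trying the clipping suggestion above, here is a minimal sketch of what it could look like. It is not the repository's code: the helper name `clipped_log_softmax`, the `clip=10.0` bound, and the toy `scores` tensor are assumptions made for illustration, and the real `_get_log_p` method may compute its scores differently.

```python
import torch
import torch.nn.functional as F


def clipped_log_softmax(scores, clip=10.0, dim=-1):
    """Hypothetical helper: clamp the scores, then apply log softmax."""
    # Clamp every score into [-clip, clip]; +/-inf becomes a finite value.
    # Note: a score that is already NaN stays NaN after clamping.
    clipped = torch.clamp(scores, min=-clip, max=clip)
    return F.log_softmax(clipped, dim=dim)


# Toy example (assumed values): one score has overflowed to +inf.
scores = torch.tensor([[float("inf"), 0.0, -3.0]])

raw = F.log_softmax(scores, dim=-1)   # inf - inf inside the op yields NaN
safe = clipped_log_softmax(scores)    # finite log-probabilities

print(torch.isnan(raw).any().item())   # whether the unclipped result has NaN
print(torch.isnan(safe).any().item())  # whether the clipped result has NaN
```

If the model masks infeasible actions by setting their scores to `-inf`, the clamp would need to come before the mask is applied, otherwise clamping would make the masked actions selectable again. If NaNs still appear after clipping, they are probably already present in the inputs to this step, so the source of the problem is further upstream.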
