display actionable error message when sync does not start #4042

Open
egasimus opened this issue Nov 18, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@egasimus
Contributor

egasimus commented Nov 18, 2024

Recently, we have frequently encountered cases where our local "resync" (pseudo-archival) nodes successfully credit the initial balances but then do not begin to sync from the configured persistent_peers.

INFO namada_node::shell::init_chain: Crediting X nam tokens to Y
INFO namada_node::shell::init_chain: Crediting A nam tokens to Z
# ... repeat for thousands of lines ...
# and then crickets 🦗🦗🦗
  • Sometimes this is due to misconfigured persistent peers, e.g. a wrong hostname, wrong port, or wrong Docker networking config.
  • Sometimes it may be due to the peers allowing only 1 connection per source IP (bad if you want to do e.g. a blue/green deploy from the same IP).
  • Sometimes, confusingly, the node does begin to sync after you wait for 2 minutes.
  • Sometimes, even more confusingly, everything is fine, except that the node has stopped retrying, and the sync only begins after a manual restart of the node.
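Several of these cases (wrong hostname, wrong port, broken Docker networking) could be surfaced before CometBFT even starts. The sketch below is hypothetical pre-flight tooling, not part of namadan: it assumes the CometBFT `persistent_peers` format of comma-separated `<node_id>@<host>:<port>` entries, and the names `parse_peers`, `probe`, and `Peer` are illustrative. A successful TCP connection only shows that the port is reachable; it cannot rule out a rejection during the CometBFT handshake, such as a per-source-IP connection limit or a wrong node ID.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// One entry of a CometBFT-style `persistent_peers` list: `<node_id>@<host>:<port>`.
/// (Hypothetical helper type for this sketch.)
#[derive(Debug, PartialEq)]
struct Peer {
    node_id: String,
    host: String,
    port: u16,
}

/// Parse a comma-separated `persistent_peers` string, keeping a readable
/// error message for every malformed entry instead of silently skipping it.
fn parse_peers(list: &str) -> Vec<Result<Peer, String>> {
    list.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|entry| -> Result<Peer, String> {
            let (id, addr) = entry
                .split_once('@')
                .ok_or_else(|| format!("{entry}: missing '<node_id>@' prefix"))?;
            let (host, port) = addr
                .rsplit_once(':')
                .ok_or_else(|| format!("{entry}: missing ':<port>' suffix"))?;
            let port = port
                .parse::<u16>()
                .map_err(|_| format!("{entry}: invalid port '{port}'"))?;
            Ok(Peer { node_id: id.to_string(), host: host.to_string(), port })
        })
        .collect()
}

/// Resolve the peer's host and try a plain TCP connection to each address.
/// This checks reachability only, not the CometBFT P2P handshake.
fn probe(peer: &Peer, timeout: Duration) -> Result<(), String> {
    let addrs = (peer.host.as_str(), peer.port)
        .to_socket_addrs()
        .map_err(|e| format!("{}: cannot resolve host '{}': {e}", peer.node_id, peer.host))?;
    let mut last_err = format!("{}: host '{}' resolved to no addresses", peer.node_id, peer.host);
    for addr in addrs {
        match TcpStream::connect_timeout(&addr, timeout) {
            Ok(_) => return Ok(()),
            Err(e) => last_err = format!("{}: cannot connect to {addr}: {e}", peer.node_id),
        }
    }
    Err(last_err)
}

fn main() {
    // Example value; the second entry is deliberately malformed.
    let peers = "abc123@127.0.0.1:26656, broken-entry";
    for parsed in parse_peers(peers) {
        match parsed.and_then(|p| probe(&p, Duration::from_secs(3))) {
            Ok(()) => println!("peer reachable"),
            Err(msg) => eprintln!("WARN persistent peer problem: {msg}"),
        }
    }
}
```

Each failure names the offending entry and the stage that failed (parse, DNS, or connect), which maps directly onto the first bullet's causes.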

These have different root causes, yet in all cases namadan gives zero feedback about what is wrong. This makes it difficult to identify and take the appropriate next step in a timely manner, which puts unreasonable strain on our DevOps resources.

It would be immensely helpful if a failure to begin syncing resulted in an explanatory error message being emitted at the INFO, WARNING, or ERROR level.

Given the way run_aux launches multiple sub-tasks asynchronously, I'd venture a guess that the message will also need to be repeated periodically, so that it doesn't get lost in the scrollback among the token-crediting messages.
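One way to realize a periodically repeated message is a watchdog that samples the latest committed block height on a timer and re-emits a warning on every check while the height stays flat. This is a minimal sketch under stated assumptions: it assumes a shared height counter that the node would update on each committed block, `SyncWatchdog` and `SyncStatus` are hypothetical names, the thresholds are arbitrary, and it prints with `eprintln!` rather than through the node's real logging setup.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Result of one watchdog check (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum SyncStatus {
    Healthy,
    Stalled { height: u64, stalled_for: Duration },
}

/// Tracks the last height seen and when it last advanced.
struct SyncWatchdog {
    last_height: u64,
    last_change: Instant,
    threshold: Duration,
}

impl SyncWatchdog {
    fn new(start: Instant, threshold: Duration) -> Self {
        Self { last_height: 0, last_change: start, threshold }
    }

    /// Record the current height; report a stall once it has not advanced
    /// for at least `threshold`.
    fn observe(&mut self, height: u64, now: Instant) -> SyncStatus {
        if height > self.last_height {
            self.last_height = height;
            self.last_change = now;
            return SyncStatus::Healthy;
        }
        let stalled_for = now.saturating_duration_since(self.last_change);
        if stalled_for >= self.threshold {
            SyncStatus::Stalled { height, stalled_for }
        } else {
            SyncStatus::Healthy
        }
    }
}

fn main() {
    // Hypothetical: a real node would bump this on every committed block.
    // In this demo nothing updates it, so the watchdog reports a stall.
    let committed_height = Arc::new(AtomicU64::new(0));
    let h = Arc::clone(&committed_height);
    let watcher = thread::spawn(move || {
        let mut dog = SyncWatchdog::new(Instant::now(), Duration::from_millis(200));
        for _ in 0..5 {
            thread::sleep(Duration::from_millis(100));
            if let SyncStatus::Stalled { height, stalled_for } =
                dog.observe(h.load(Ordering::Relaxed), Instant::now())
            {
                // Emitted on every check while stalled, so it keeps
                // reappearing instead of scrolling away once.
                eprintln!("WARN no new blocks for {stalled_for:?} (height {height}); check persistent_peers and CometBFT P2P logs");
            }
        }
    });
    watcher.join().unwrap();
}
```

Keeping the decision in a pure `observe` method separates the "is sync stuck?" logic from the timer and logging, so it can be unit-tested without sleeping.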

@egasimus added the enhancement (New feature or request) label Nov 18, 2024
@sug0
Collaborator

sug0 commented Nov 18, 2024

All network code is handled by CometBFT. Namada hides CometBFT's output by default, but you can export NAMADA_CMT_STDOUT=true together with CMT_LOG_LEVEL=info or CMT_LOG_LEVEL=debug to see what's going on at the P2P level. Be warned that setting CometBFT's log level to debug generates incredibly noisy output.
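For deployments that launch the node from a supervisor process, the two variables can be set on that one child process instead of being exported globally. A minimal sketch using Rust's `std::process::Command`; the `namadan ledger run` invocation is an assumption about how the node is started in a given setup, and `ledger_command` is a hypothetical helper.

```rust
use std::process::Command;

/// Build (but do not spawn) the node command with CometBFT output enabled.
/// Assumption: the node is started as `namadan ledger run`; adjust as needed.
fn ledger_command(cmt_log_level: &str) -> Command {
    let mut cmd = Command::new("namadan");
    cmd.args(["ledger", "run"])
        // Stop Namada from hiding CometBFT's output.
        .env("NAMADA_CMT_STDOUT", "true")
        // "info" is usually enough; "debug" is very noisy.
        .env("CMT_LOG_LEVEL", cmt_log_level);
    cmd
}

fn main() {
    let cmd = ledger_command("info");
    // Show the overrides that would be applied on top of the inherited environment.
    for (key, value) in cmd.get_envs() {
        println!("{key:?}={value:?}");
    }
    // To actually start the node: ledger_command("info").status()
}
```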
