Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gossipd logs causing slow htlcs? #7815

Open
JssDWt opened this issue Nov 14, 2024 · 5 comments
Open

gossipd logs causing slow htlcs? #7815

JssDWt opened this issue Nov 14, 2024 · 5 comments

Comments

@JssDWt
Copy link
Contributor

JssDWt commented Nov 14, 2024

I noticed during the handling of a htlc, there's lines like gossipd: Log pruned 71408 entries (mem 10485792 -> 3393591) with a 10 second gap between them. It appears the logs produced by gossipd are blocking the execution of 'normal operation' like handling htlcs. In below example 20 seconds were added to the processing time of a single htlc, clearly surrounding the moments the log pruning is shown in the logs. Not the --> x second gap here lines I added in below log to make clear where it's adding time.

2024-11-14T14:47:15.590Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: peer_in WIRE_UPDATE_ADD_HTLC
2024-11-14T14:47:15.591Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: NEW:: HTLC REMOTE 126 = RCVD_ADD_HTLC/SENT_ADD_HTLC 
2024-11-14T14:47:15.985Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: peer_in WIRE_COMMITMENT_SIGNED
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: handle_peer_commit_sig(splice: 0, remote_splice: 0)
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Received commit
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: htlc 126: RCVD_ADD_HTLC->RCVD_ADD_COMMIT
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: htlc added LOCAL: local 41232871 remote 13767129
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: -> local 41232871 remote 13753079
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: rcvd_commit: HTLC REMOTE 126 = RCVD_ADD_COMMIT/SENT_ADD_COMMIT 
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Derived key x from basepoint y, point z
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Received commit_sig with 0 htlc sigs
2024-11-14T14:47:15.986Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-hsmd: Got WIRE_HSMD_VALIDATE_COMMITMENT_TX
2024-11-14T14:47:15.987Z DEBUG   hsmd: Client: Received message 35 from client
2024-11-14T14:47:15.994Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Sending revoke_and_ack
2024-11-14T14:47:15.994Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: htlc 126: RCVD_ADD_COMMIT->SENT_ADD_REVOCATION
2024-11-14T14:47:15.995Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: sending_revoke_and_ack: HTLC REMOTE 126 = SENT_ADD_REVOCATION/RCVD_ADD_REVOCATION 
2024-11-14T14:47:15.995Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: revoke_and_ack made pending: commit timer
2024-11-14T14:47:15.995Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Sending master 1021
2024-11-14T14:47:16.006Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-chan#1509: got commitsig 320: feerate 1255, blockheight: 0, 1 added, 0 fulfilled, 0 failed, 0 changed. 0 splice commitments.
2024-11-14T14:47:16.007Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-chan#1509: Adding their HTLC 126
2024-11-14T14:47:16.008Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-chan#1509: HTLC in 126 RCVD_ADD_COMMIT->SENT_ADD_REVOCATION
2024-11-14T14:47:16.012Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: ... , awaiting 1121
2024-11-14T14:47:16.013Z DEBUG   hsmd: Client: Received message 1 from client
2024-11-14T14:47:16.032Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-hsmd: Got WIRE_HSMD_REVOKE_COMMITMENT_TX
2024-11-14T14:47:16.032Z DEBUG   hsmd: Client: Received message 40 from client
2024-11-14T14:47:16.032Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Got it!
2024-11-14T14:47:16.033Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: peer_out WIRE_REVOKE_AND_ACK
2024-11-14T14:47:16.033Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Trying commit
2024-11-14T14:47:16.033Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: htlc 126: SENT_ADD_REVOCATION->SENT_ADD_ACK_COMMIT
2024-11-14T14:47:16.033Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: htlc added REMOTE: local 41232871 remote 13767129
2024-11-14T14:47:16.033Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: -> local 41232871 remote 13753079
2024-11-14T14:47:16.033Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: sending_commit: HTLC REMOTE 126 = SENT_ADD_ACK_COMMIT/RCVD_ADD_ACK_COMMIT 
2024-11-14T14:47:16.306Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-hsmd: Got WIRE_HSMD_SIGN_REMOTE_COMMITMENT_TX
2024-11-14T14:47:16.306Z DEBUG   hsmd: Client: Received message 19 from client
2024-11-14T14:47:16.547Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Creating commit_sig signature 320 aaa for tx bbbbb wscript ccccc key ddddd
2024-11-14T14:47:16.548Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Telling master we're about to commit...
2024-11-14T14:47:16.548Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Sending master 1020
2024-11-14T14:47:16.548Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-chan#1509: HTLC in 126 SENT_ADD_REVOCATION->SENT_ADD_ACK_COMMIT
2024-11-14T14:47:16.554Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: ... , awaiting 1120
2024-11-14T14:47:18.459Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Got it!
2024-11-14T14:47:18.460Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Sending commit_sig with 0 htlc sigs
2024-11-14T14:47:18.462Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: peer_out WIRE_COMMITMENT_SIGNED
2024-11-14T14:47:18.463Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Can't send commit: waiting for revoke_and_ack

--> 5 second gap here

2024-11-14T14:47:23.451Z DEBUG   gossipd: Log pruned 71408 entries (mem 10485792 -> 3393591)

--> 5 second gap here

2024-11-14T14:47:28.042Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-hsmd: Got WIRE_HSMD_VALIDATE_REVOCATION
2024-11-14T14:47:28.045Z DEBUG   hsmd: Client: Received message 36 from client
2024-11-14T14:47:28.046Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-hsmd: Got WIRE_HSMD_SIGN_PENALTY_TO_US
2024-11-14T14:47:28.046Z DEBUG   hsmd: Client: Received message 14 from client
2024-11-14T14:47:28.521Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: peer_in WIRE_REVOKE_AND_ACK
2024-11-14T14:47:28.521Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Received revoke_and_ack
2024-11-14T14:47:28.521Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: htlc 126: SENT_ADD_ACK_COMMIT->RCVD_ADD_ACK_REVOCATION
2024-11-14T14:47:28.521Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: rcvd_revoke_and_ack: HTLC REMOTE 126 = RCVD_ADD_ACK_REVOCATION/SENT_ADD_ACK_REVOCATION 
2024-11-14T14:47:28.521Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: No commits outstanding after recv revoke_and_ack
2024-11-14T14:47:28.521Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: HTLC 126[REMOTE] => RCVD_ADD_ACK_REVOCATION
2024-11-14T14:47:28.522Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Sending master 1022
2024-11-14T14:47:28.522Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-chan#1509: got revoke 319: 1 changed
2024-11-14T14:47:28.522Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-chan#1509: HTLC in 126 SENT_ADD_ACK_COMMIT->RCVD_ADD_ACK_REVOCATION
2024-11-14T14:47:28.568Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: ... , awaiting 1122

--> 3 second gap here

2024-11-14T14:47:31.705Z DEBUG   gossipd: Log pruned 71920 entries (mem 10485784 -> 3405421)

--> 6 second gap here

2024-11-14T14:47:37.719Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Got it!
2024-11-14T14:47:37.720Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: revoke_and_ack LOCAL: remote_per_commit = a, old_remote_per_commit = b
2024-11-14T14:47:37.720Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Trying commit
2024-11-14T14:47:37.720Z DEBUG   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-channeld-chan#1: Can't send commit: nothing to send, feechange not wanted ({ SENT_ADD_ACK_REVOCATION:1255 }) blockheight not wanted ({ SENT_ADD_ACK_REVOCATION:0 })
2024-11-14T14:47:37.739Z INFO    plugin-lspd_cln_plugin: 2024/11/14 14:47:37 Received HtlcResolution correlationid:\"cln:htlc_accepted#299833\" continue:{}
@cdecker
Copy link
Member

cdecker commented Nov 14, 2024

This is quite interesting indeed. Correct me if I'm wrong @rustyrussell but I think we do.prioritize channel related traffic already, or was that only in connectd for outgoing traffic?

Notice that this could also be the old lnd head-of-line blocking, where they queued all the gossip at once and where not prioritizing channel related traffic. Or you have some incredibly noisy peers and we should stop processing gossip from extremely chatty nodes.

@JssDWt
Copy link
Contributor Author

JssDWt commented Nov 14, 2024

Noting the peer in question is a cln greenlight node, and these logs start when the htlc comes in and end when the htlc is passed over to a plugin in the htlc_accepted hook. The gaps are unrelated to messages coming in from the peer. The revoke_and_ack received at 2024-11-14T14:47:28.521Z was sent by the peer milliseconds after receiving the commitment_signed 2024-11-14T14:47:18.462Z. So the 10 seconds delay are actually caused by this node itself.

@JssDWt
Copy link
Contributor Author

JssDWt commented Nov 14, 2024

Also noting this node did not have #7767 deployed. Could that fix the issue?

@cdecker
Copy link
Member

cdecker commented Nov 15, 2024

Definitely a possibility since we're talking about message queues filling and head-of-line-blocking. I wonder if we should add simple in-memory counters for the lightningd connections, so we can see when we run into buffer bloat, or if we are just very busy with small messages, and then we can cut down on those messages.

@JssDWt
Copy link
Contributor Author

JssDWt commented Nov 15, 2024

We've upgraded to include #7767 yesterday. Looking at the gossipd: Log pruned logs and the timestamps before and after it seems like this may have been the culprit. Where before the timestamps would be seconds apart, they are now milliseconds apart. I will keep monitoring this, as it could have been the node restart that relaxed things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants