-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify info client usage #68
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌 This is a great improvement to make. The changes all LGTM, but do want to note that when trying to use the public API in the relayer configuration now with these changes, I hit the follow error when it is attempting to relay a message:
{"level":"error","timestamp":"2023-10-24T14:46:09.656-0400","logger":"awm-relayer","caller":"relayer/message_relayer.go:411","msg":"Failed to get the canonical subnet validator set","subnetID":"1M439meCk9SDQYJSrDBJJeRntk1FtmSaaZakSeLFvcJonxGUM","error":"failed to fetch validator set (P-Chain Height: 123702, SubnetID: 1M439meCk9SDQYJSrDBJJeRntk1FtmSaaZakSeLFvcJonxGUM): failed to decode client response: the method platform.getValidatorsAt is not available"}
I think these changes are self-standing and still worth moving forward with as is though.
) | ||
return nil, nil, err | ||
} | ||
if len(beaconIPs) < numInitialTestPeers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to limit the number of peers to 5 again? IIUC when we do the signature aggregation process it could take longer in terms of us sending out more signature requests, but would increase likelihood of having enough stake threshold in the peers. Are 5 peers generally enough for exceeding stake threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just the list of initial peers to manually connect to in order to initialize the app request network. When requesting signatures, we connect to all peers from which we are requesting signatures.
separate of PR changes, but was looking at the relayer's network manual track. Supposedly the network will keep attempting the connection with exponential backoffs, should we look at timing out this operation? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, left some questions for clarification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me. Left a few questions.
"Failed to find a full set of peers to connect to on startup", | ||
zap.Int("connectedPeers", len(beaconIPs)), | ||
zap.Int("expectedConnectedPeers", numInitialTestPeers), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we abort if the number of connectedPeers is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we return an error a few lines above here.
"Failed to find a full set of peers to connect to on startup", | ||
zap.Int("connectedPeers", len(beaconIPs)), | ||
zap.Int("expectedConnectedPeers", numInitialTestPeers), | ||
) | ||
} | ||
|
||
for i, beaconIDStr := range beaconIDs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we are unable to connect to any of the initial test peers? Will we select another sample of peers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. As Matt mentioned above, ManuallyTrack
will continuously attempt to connect. We should look into what the failure modes for that function call are, and consider how to handle them. Right now if we are unable to connect to any peers on initialization, we will attempt to connect to the subnet validators when relaying a message. So there is some form of redundancy, but an explicit resampling on startup would be more robust.
As discussed here, I think we should first understand the failure modes of |
Why this should be merged
Currently the P-Chain API URL passed in the config is used for two purposes:
The latter case requires that the info API expose the
GetNodeIP
method on the node. For the public RPC (api.avax.network), this method is disabled.How this works
Removes the calls to the methods
GetNodeIP
andGetNodeID
on the configured P-Chain API node. Instead, we get the list of peer IPs and IDs fromGetPeers
, which is enabled on the public RPC server.Also fixes an edge case in which it's possible to connect to fewer than the intended number of peers on startup.
How this was tested
CI
How is this documented
N/A