Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify info client usage #68

Merged
merged 2 commits into from
Oct 25, 2023
Merged

Simplify info client usage #68

merged 2 commits into from
Oct 25, 2023

Conversation

cam-schultz
Copy link
Collaborator

@cam-schultz cam-schultz commented Oct 24, 2023

Why this should be merged

Currently the P-Chain API URL passed in the config is used for two purposes:

  • To get the canonical validator set for a subnet
  • To establish peers in the app request network via the info API
    The latter case requires that the info API expose the GetNodeIP method on the node. For the public RPC (api.avax.network), this method is disabled.

How this works

Removes the calls to the methods GetNodeIP and GetNodeID on the configured P-Chain API node. Instead, we get the list of peer IPs and IDs from GetPeers, which is enabled on the public RPC server.

Also fixes an edge case in which it's possible to connect to fewer than the intended number of peers on startup.

How this was tested

CI

How is this documented

N/A

Copy link
Collaborator

@michaelkaplan13 michaelkaplan13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌 This is a great improvement to make. The changes all LGTM, but do want to note that when trying to use the public API in the relayer configuration now with these changes, I hit the follow error when it is attempting to relay a message:

{"level":"error","timestamp":"2023-10-24T14:46:09.656-0400","logger":"awm-relayer","caller":"relayer/message_relayer.go:411","msg":"Failed to get the canonical subnet validator set","subnetID":"1M439meCk9SDQYJSrDBJJeRntk1FtmSaaZakSeLFvcJonxGUM","error":"failed to fetch validator set (P-Chain Height: 123702, SubnetID: 1M439meCk9SDQYJSrDBJJeRntk1FtmSaaZakSeLFvcJonxGUM): failed to decode client response: the method platform.getValidatorsAt is not available"}

I think these changes are self-standing and still worth moving forward with as is though.

)
return nil, nil, err
}
if len(beaconIPs) < numInitialTestPeers {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to limit the number of peers to 5 again? IIUC when we do the signature aggregation process it could take longer in terms of us sending out more signature requests, but would increase likelihood of having enough stake threshold in the peers. Are 5 peers generally enough for exceeding stake threshold?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the list of initial peers to manually connect to in order to initialize the app request network. When requesting signatures, we connect to all peers from which we are requesting signatures.

@minghinmatthewlam
Copy link

separate of PR changes, but was looking at the relayer's network manual track. Supposedly the network will keep attempting the connection with exponential backoffs, should we look at timing out this operation?

Copy link

@minghinmatthewlam minghinmatthewlam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some questions for clarification

Copy link
Contributor

@bernard-avalabs bernard-avalabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me. Left a few questions.

"Failed to find a full set of peers to connect to on startup",
zap.Int("connectedPeers", len(beaconIPs)),
zap.Int("expectedConnectedPeers", numInitialTestPeers),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we abort if the number of connectedPeers is 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we return an error a few lines above here.

"Failed to find a full set of peers to connect to on startup",
zap.Int("connectedPeers", len(beaconIPs)),
zap.Int("expectedConnectedPeers", numInitialTestPeers),
)
}

for i, beaconIDStr := range beaconIDs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we are unable to connect to any of the initial test peers? Will we select another sample of peers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. As Matt mentioned above, ManuallyTrack will continuously attempt to connect. We should look into what the failure modes for that function call are, and consider how to handle them. Right now if we are unable to connect to any peers on initialization, we will attempt to connect to the subnet validators when relaying a message. So there is some form of redundancy, but an explicit resampling on startup would be more robust.

@cam-schultz
Copy link
Collaborator Author

separate of PR changes, but was looking at the relayer's network manual track. Supposedly the network will keep attempting the connection with exponential backoffs, should we look at timing out this operation?

As discussed here, I think we should first understand the failure modes of ManuallyTrack, then consider how to handle them.

@cam-schultz cam-schultz merged commit e178b46 into main Oct 25, 2023
7 checks passed
@cam-schultz cam-schultz deleted the simplify-info-client-usage branch October 25, 2023 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants