Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[consensus] Exclude low scoring ancestors in proposals with time based inclusion/exclusion #19605

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

arun-koshy
Copy link
Contributor

@arun-koshy arun-koshy commented Sep 28, 2024

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

Copy link

vercel bot commented Sep 28, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 5, 2024 7:16am
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Nov 5, 2024 7:16am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Nov 5, 2024 7:16am
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Nov 5, 2024 7:16am

@arun-koshy arun-koshy marked this pull request as ready for review October 3, 2024 16:53
arun-koshy added a commit that referenced this pull request Oct 4, 2024
## Description 

To be used in smart ancestor selection
[PR#19605](#19605)

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API:
consensus/core/src/leader_scoring.rs Outdated Show resolved Hide resolved
consensus/core/src/core.rs Outdated Show resolved Hide resolved
consensus/core/src/core.rs Show resolved Hide resolved
consensus/core/src/core.rs Outdated Show resolved Hide resolved
consensus/core/src/round_prober.rs Outdated Show resolved Hide resolved
consensus/core/src/ancestor.rs Outdated Show resolved Hide resolved
consensus/core/src/ancestor.rs Outdated Show resolved Hide resolved
consensus/core/src/ancestor.rs Outdated Show resolved Hide resolved
match ancestor_info.state {
// Check conditions to switch to EXCLUDE state
AncestorState::Include => {
if propagation_score <= low_score_threshold {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered / tried exclusion based on the gap between a peer's low and high quorum rounds? Maybe we can leave a TODO here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider this but thought excluding based on the quality (votes) of blocks distributed was also important but I can give this a try. Do you have a sense for a good threshold between low and high quorum rounds. I believe it should not be more than 1 round in the healthy cases but wanted to confirm. Maybe something like 5-10 rounds is a good threshold for exclusion?

consensus/core/src/core.rs Show resolved Hide resolved
consensus/core/src/core.rs Show resolved Hide resolved
consensus/core/src/dag_state.rs Outdated Show resolved Hide resolved
consensus/core/src/dag_state.rs Outdated Show resolved Hide resolved
consensus/core/src/dag_state.rs Outdated Show resolved Hide resolved
).unwrap(),
included_excluded_proposal_ancestors_count_by_authority: register_int_counter_vec_with_registry!(
"included_excluded_proposal_ancestors_count_by_authority",
"Total number of included excluded ancestors per authority during proposal. Either weak or strong type",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"included excluded" , this could be a bit confusing (at least to me). Maybe something like Total number of ancestors per authority with 'excluded' status that got included in proposal. Either weak or strong type..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah its a little confusing, changed the description but left the metric name as is. If you have a batter name idea for the metric let me know.

consensus/core/src/core.rs Show resolved Hide resolved
Copy link
Member

@mwtian mwtian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing.

consensus/core/src/dag_state.rs Outdated Show resolved Hide resolved
@@ -919,8 +968,8 @@ impl DagState {
self.scoring_subdag.is_empty()
}

pub(crate) fn calculate_scoring_subdag_scores(&self) -> ReputationScores {
self.scoring_subdag.calculate_scores()
pub(crate) fn calculate_scoring_subdag_distributed_vote_scores(&self) -> ReputationScores {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have done the protocol upgrade, is it still worth renaming?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it back to calculate_scoring_subdag_scores but will go in and cleanup more of the leader scoring code now that we have done the protocol upgrade in a separate PR

consensus/core/src/core.rs Outdated Show resolved Hide resolved
consensus/core/src/core.rs Outdated Show resolved Hide resolved
consensus/core/src/core.rs Outdated Show resolved Hide resolved
self.context.metrics.node_metrics.smart_selection_wait.inc();
debug!("Only found {} stake of good ancestors to include for round {clock_round}, will wait for more.", parent_round_quorum.stake());
return vec![];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to continue when not smart_select?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when we are not in smart_select mode which means we are in the force=true try_propose mode we will just fall back to essentially the old behavior where we will include enough "EXCLUDE" ancestors until we have reached a quorum for the parent round.

consensus/core/src/core.rs Outdated Show resolved Hide resolved
consensus/core/src/core.rs Outdated Show resolved Hide resolved
consensus/core/src/core.rs Outdated Show resolved Hide resolved
Add time based inclusion logic
// Inclusion threshold is based on network quorum round
const INCLUSION_THRESHOLD_PERCENTAGE: u64 = 90;

pub(crate) fn new(context: Arc<Context>, propagation_scores: ReputationScores) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to pass in propagation_scores, instead of calling set_propagation_scores()?

// If round prober has not run yet and we don't have network quorum round,
// it is okay because network_low_quorum_round will be zero and we will
// include all ancestors until we get more information.
let network_low_quorum_round = self.calculate_network_low_quorum_round();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, it may be safer to use a high quorum round instead for network round.

@@ -370,6 +384,43 @@ impl Core {
}
}

// TODO: produce the block for the clock_round. As the threshold clock can advance many rounds at once (ex
// because we synchronized a bulk of blocks) we can decide here whether we want to produce blocks per round
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are reasonably sure we only want to build the latest blocks. Earlier blocks do not contribute to voting, and they can be close to GC or voting limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants