go/worker/compute/executor/committee: Support backup proposers #5354
Conversation
Force-pushed from 4231f4b to f65c980.
Codecov Report
@@ Coverage Diff @@
## master #5354 +/- ##
==========================================
+ Coverage 66.23% 66.57% +0.33%
==========================================
Files 528 534 +6
Lines 56237 56288 +51
==========================================
+ Hits 37251 37471 +220
+ Misses 14526 14380 -146
+ Partials 4460 4437 -23
Force-pushed from 35438a8 to f9112a0.
I'm mostly reading through the design at this point.
"priority" is a little tricky to understand just by the name of it. the lowest priority number is least preferred schedule when a node considers multiple schedules, and the highest priority number is the most preferred. meanwhile, the lowest priority number is meant to be the one that gets produced first. thus the node with the highest priority number is also the one that's least urgently doing the scheduling.
the system of waiting to publish a schedule based on the priority number is noted not to be enforced. it feels like following this schedule is not rational though. I'm guessing rational would be either always schedule right away to have control over tx ordering and/or get rewards; or never schedule to save compute+bandwidth.
what's the amount of time per 1 priority? I'm seeing a 2 second constant in the code? can a committee routinely finish in that amount of time? the design says the pool resets when it sees a higher priority commitment. we wouldn't want this repeatedly to have a few fast nodes commit, then slower nodes not finish before the next time interval and a new schedule comes out.
do nodes get these commitments through mechanisms other than reading them from the roothash app in the consensus layer? they communicate over p2p right? that feels like it would lead to the possibility of nodes coming to different conclusions as messages propagated throughout the committee. e.g. if the last needed commit for priority i and the proposed schedule for priority i+1 are being propagated at the same time, some nodes can see priority i finalize while some nodes can see it reset and start on priority i+1
I can rename it to token or something similar. Will think about that.
That is not true.
Yes, currently hard-coded (could be changed to a parameter).
The committee doesn't need to finish in that time. The proposer whose turn it is should finish in that time and publish a proposal via P2P. Once others receive the proposal, they have until the consensus timeout fires to finish and accept the proposal.
Fast node commits don't bother other committee nodes, as they will start executing proposals only when the time is right. So we can have 100 fast commits in the first second, but the committee nodes will do nothing if the primary's proposal hasn't been committed yet.
Commitments go over the consensus and P2P layers (the P2P ones just for discrepancy detection); proposals go over P2P.
Yes, nodes could come to different conclusions, depending on which proposals they have received. But they always build on proposals with higher priority until the round is finalized or a discrepancy is detected. Those events come from consensus, so they are the absolute truth and every node will see them.
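To make the timing above concrete, here is a minimal Go sketch (not the actual oasis-core code) of the per-priority delay, assuming the hard-coded 2-second step and that priority 0 is the primary proposer; priorityStep and proposeDelay are made-up names for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// priorityStep is the assumed hard-coded delay per priority step (the
// 2-second constant mentioned above); in practice this could become a
// consensus parameter.
const priorityStep = 2 * time.Second

// proposeDelay returns how long a scheduler with the given priority waits
// after the round starts before publishing its own proposal via P2P.
// Priority 0 is the primary proposer and may propose immediately.
func proposeDelay(priority uint64) time.Duration {
	return time.Duration(priority) * priorityStep
}

func main() {
	for p := uint64(0); p < 4; p++ {
		fmt.Printf("scheduler with priority %d may propose after %s\n", p, proposeDelay(p))
	}
}
```

With a 2-second step, the first backup would step in after 2 seconds, the second after 4 seconds, and so on.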
Could you describe the protocol in more detail? It sounds like I'm missing some functionality that discourages dishonesty in the priority-based waiting scheme.
Can the primary transaction scheduler supersede it? It sounded like "primary" refers to the node with priority 1, is that the case? A node that doesn't follow the prescribed timing scheme could win by sending its schedule right away, as any node other than the primary has a higher priority. And if other nodes are honest, it'll be a while before they get around to scheduling, so our dishonest node's proposal would reasonably get finalized.
What does it have to do to avoid this penalty? Does it have to get its schedule out in the right order? I'm guessing no, as there's no proof of who received what in what order. Maybe that's fine, as stealing a higher-priority node's work is complicated by the signature being done in a TEE.
Ah, and honest nodes are supposed to cancel their scheduling when they see a proposal come out?
Could a proposal cause a node to reset a lower-priority pool, and then consensus tell the node that that lower-priority schedule was actually finalized?
Original Protocol:
- Committee:
- Commitment Collection:
- Block Finalization:
- Consensus Round Timeout:
- Commitment Collection Timeout:
- Pool:

New Protocol:
- Terminology:
- Committee:
- Commitment Collection:
- Pool:
- Nodes:
Primary refers to the node with priority 0, where 0 is the best priority.
A commitment belonging to a proposal from a worker with better priority can supersede worse commitments, but only in discrepancy detection mode. So the primary proposer can publish a commitment that will supersede anyone's.
Anyone can ignore the timing scheme and share its proposal via the P2P network. But the problem arises at honest workers: they will figure out that they received the proposal too soon and will wait before publishing a commitment for that proposal to consensus. Meanwhile, they will probably receive the proposal from the primary proposer, submit a commitment for that one, and supersede the dishonest commitment (if there is one in the pool waiting for the majority).
Honest workers don't propose if they see a proposal with better priority.
It could happen that a node receives a proposal with better priority, creates a commitment, and publishes it to consensus, only for it to be rejected because a block was already finalized from a proposal with worse priority. That is OK, as it would mean that the node with better priority took too long to publish a proposal, so the backup proposers stepped in.
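A minimal sketch of the honest-worker behavior described above, assuming priority 0 is the best and each priority step adds the same fixed 2-second delay. Proposal, shouldProcess and priorityStep are hypothetical names, not the actual worker code: a worker ignores proposals worse than the best one it has seen and does not act on a proposal before that proposer's turn has come.

```go
package main

import (
	"fmt"
	"time"
)

// Proposal is a simplified stand-in for a batch proposal received over P2P.
type Proposal struct {
	Priority uint64 // 0 is the primary proposer; higher numbers are backups.
	Round    uint64
}

// priorityStep is the assumed 2-second delay per priority step.
const priorityStep = 2 * time.Second

// shouldProcess decides whether an honest worker acts on a proposal now:
// proposals worse than the best one already seen are ignored, and a proposal
// from priority p is only processed once p*priorityStep has elapsed since the
// round started, since a better proposer may still show up.
func shouldProcess(p Proposal, roundStart time.Time, bestSeen uint64, now time.Time) bool {
	if p.Priority > bestSeen {
		return false
	}
	return now.Sub(roundStart) >= time.Duration(p.Priority)*priorityStep
}

func main() {
	start := time.Now()
	p := Proposal{Priority: 2, Round: 42}
	fmt.Println(shouldProcess(p, start, 3, start.Add(3*time.Second))) // false: not this proposer's turn yet
	fmt.Println(shouldProcess(p, start, 3, start.Add(5*time.Second))) // true: 2*2s have elapsed
}
```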
Ah, that clears things up a lot, thanks.
I'm fixing backup proposers, and I've come across the following questions:
Any thoughts?
Which runtime are you looking at? Both should be 2 for Sapphire.
This roughly seems to make sense, the downside being that one could DoS the straggler so it is slow to prove the discrepancy. But having a timeout just gives the straggler slightly more time. The other implication is that liveness evaluation could cause such stragglers to be penalized more.
The IO root includes both inputs (transactions) and outputs (results and events), so this should already be the case?
What are the tradeoffs of these different options?
Was looking at the genesis file. Not sure which runtime that was.
Let's go with queuing commitments until the scheduler submits its own commitment. The round cannot be finalized until we have the proposer commitment anyway.
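A rough sketch of this queuing idea, under the assumption that worker commitments are simply held back per scheduler rank until that scheduler's own commitment shows up; the types and methods below are made up for illustration and are not the actual pool code.

```go
package main

import "fmt"

// Commitment is a simplified executor commitment: NodeID identifies the
// committee member, Rank the scheduler/proposal it votes for.
type Commitment struct {
	NodeID string
	Rank   uint64
}

// pendingPool holds worker commitments back until the scheduler with the
// matching rank has submitted its own commitment, since the round cannot be
// finalized without the proposer commitment anyway.
type pendingPool struct {
	proposerCommitted map[uint64]bool         // rank -> proposer commitment seen
	queued            map[uint64][]Commitment // rank -> queued worker commitments
}

func newPendingPool() *pendingPool {
	return &pendingPool{
		proposerCommitted: make(map[uint64]bool),
		queued:            make(map[uint64][]Commitment),
	}
}

// add either queues a worker commitment (proposer not yet seen) or reports
// that it can be processed immediately (proposer already committed).
func (p *pendingPool) add(c Commitment) (processNow bool) {
	if p.proposerCommitted[c.Rank] {
		return true
	}
	p.queued[c.Rank] = append(p.queued[c.Rank], c)
	return false
}

// addProposer records the scheduler's own commitment for the given rank and
// releases everything that was queued waiting for it.
func (p *pendingPool) addProposer(rank uint64) []Commitment {
	p.proposerCommitted[rank] = true
	released := p.queued[rank]
	delete(p.queued, rank)
	return released
}

func main() {
	pool := newPendingPool()
	pool.add(Commitment{NodeID: "worker-1", Rank: 0}) // queued
	pool.add(Commitment{NodeID: "worker-2", Rank: 0}) // queued
	fmt.Println(pool.addProposer(0))                  // both released once the proposer commits
	fmt.Println(pool.add(Commitment{NodeID: "worker-3", Rank: 0})) // true: processable right away
}
```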
Force-pushed from 4d18a37 to 67ab7d6.
Still have to fix some byzantine tests, but the latest changes are:
@@ -0,0 +1,7 @@
go/worker/compute/executor/committee: Support backup proposers

Starting now, all executor committee workers are permitted to schedule
Would be nice to have a short ADR with a formal description and pros & cons. But since it's already implemented, no need to add it now retrospectively IMO.
An informal description of the pros and cons (compared to the previous solution and settings):
Settings:
- allowed_stragglers: 1
- round_timeout: 2 blocks
- propose_batch_timeout: 2 blocks (before), 2 seconds (now)
Pros:
- If the primary scheduler is offline, the round will finalize with a 2-second delay. The penalty will be 0 or 1 consensus block (0 or 6 seconds). Currently, the penalty is 3 consensus blocks (18 seconds), i.e. 2 blocks for the proposer timeout and 1 block for the proposer timeout transaction.
- If the primary scheduler and the first/one backup scheduler are offline, the penalty will be 2 or 3 consensus blocks (12 or 18 seconds), as the second backup scheduler will fire after 4 seconds and discrepancy resolution will start after 2 consensus blocks.
- No need for the proposer timeout transaction.
Cons:
- A scheduler with a better rank could intentionally propose only after schedulers with a worse rank have proposed, forcing them to do extra work and delaying rounds. Not sure if there are any gains here, as the rounds will be delayed and fewer fees collected.
- If this becomes a problem, we can fight back with statistics and slashing, by lowering the timeout for overriding the current scheduler, with limits on how many overrides can be done, etc.
- If a worker's clock is not synchronized, it could do extra work by submitting commitments for proposals from schedulers whose turn hasn't come yet.
- Workers need to exchange and store more proposals via the P2P network. However, some can be dropped when a scheduler with a better rank commits.
- The pool needs to store more commitments/votes. However, some can be dropped when a scheduler with a better rank commits.
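To illustrate the pruning mentioned in the last two cons, here is a hedged sketch that discards queued entries for worse-ranked schedulers once a better-ranked scheduler has committed; the helper and its data layout are assumptions rather than the real pool implementation.

```go
package main

import "fmt"

// prune drops queued entries for schedulers whose rank is worse than the best
// rank for which a scheduler commitment has already been observed: once a
// better-ranked scheduler commits, the worse-ranked proposals and votes no
// longer matter. Hypothetical helper, not the actual pool code; rank 0 is
// assumed to be the best.
func prune(byRank map[uint64][]string, bestCommittedRank uint64) {
	for rank := range byRank {
		if rank > bestCommittedRank {
			delete(byRank, rank)
		}
	}
}

func main() {
	pool := map[uint64][]string{
		0: {"commitment-a"},
		1: {"commitment-b", "commitment-c"},
		3: {"commitment-d"},
	}
	prune(pool, 1)    // the scheduler with rank 1 has committed
	fmt.Println(pool) // map[0:[commitment-a] 1:[commitment-b commitment-c]]
}
```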
Cons:
- A worker can compute multiple commitments for the same round if there are multiple overrides. If schedulers collide, the inputs and transactions can be the same, forcing the worker to run its enclave multiple times on the same data (which sounds like a replay attack with the possibility of side-channel attacks). But I guess this is possible even now, as failed rounds have the same roots. I don't see any danger here.
forcing the worker to run its enclave multiple times on the same data (sounds like a replay attack with possibility of side channel attacks)
Yeah, use cases that care about this should perform a two-step process: e.g. in the first step only commit something to storage, and then, only after it is finalized, perform the second step based on the data in storage.
Also, to limit this, we could allow only one run per round/scheduler (until restart).
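A small sketch of the "one run per round/scheduler (until restart)" idea, assuming an in-memory guard keyed by round and scheduler identity; all names below are hypothetical.

```go
package main

import "fmt"

// runKey identifies a single enclave execution: one per round and scheduler.
type runKey struct {
	round     uint64
	scheduler string // scheduler node identifier
}

// runGuard tracks which (round, scheduler) pairs have already been executed,
// so a worker refuses to re-run its enclave on the same data, limiting the
// replay surface discussed above. State is in memory only, so it resets on restart.
type runGuard struct {
	seen map[runKey]bool
}

func newRunGuard() *runGuard {
	return &runGuard{seen: make(map[runKey]bool)}
}

// tryRun returns true exactly once per (round, scheduler) pair.
func (g *runGuard) tryRun(round uint64, scheduler string) bool {
	k := runKey{round: round, scheduler: scheduler}
	if g.seen[k] {
		return false
	}
	g.seen[k] = true
	return true
}

func main() {
	g := newRunGuard()
	fmt.Println(g.tryRun(42, "scheduler-a")) // true: first run allowed
	fmt.Println(g.tryRun(42, "scheduler-a")) // false: repeat refused
	fmt.Println(g.tryRun(42, "scheduler-b")) // true: different scheduler
}
```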
return nil, ErrDiscrepancyDetected
if len(votes) > 1 || failures > int(allowedStragglers) {
	// Backup schedulers need to wait for a round timeout.
	return nil, ErrStillWaiting
This is because the "early discrepancy detection" would already catch the discrepancy if p.HighestRank == 0, right? So we know that this means that the primary scheduler has not yet committed?
Maybe include this in a comment (all the facts already assumed here due to the early discrepancy checks and not rechecked), as this was not immediately clear to me.
This is because the "early discrepancy detection" would already catch the discrepancy if p.HighestRank == 0, right.
Yes. If the highest rank is 0, this means that the primary scheduler already committed and the early discrepancy detection handled this case. Therefore, this if branch cannot happen.
so we know that this means that the primary scheduler has not yet committed
Yes. If this if branch is taken, we know that the primary scheduler hasn't committed but some other backup scheduler has. And backup schedulers need to wait for the round timeout, as mentioned above.
Will update the comment.
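A simplified, hypothetical rendering of this reasoning (not the actual pool code): if the highest rank were 0, early discrepancy detection would already have handled the case, so reaching the second branch implies that only backup schedulers have committed and the pool must keep waiting for the round timeout.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrDiscrepancyDetected = errors.New("discrepancy detected")
	ErrStillWaiting        = errors.New("still waiting")
)

// checkBackupVotes is a simplified stand-in for the branch discussed above.
// If the highest-ranked commitment were from the primary scheduler (rank 0),
// early discrepancy detection would already have fired, so the second branch
// is only reached when solely backup schedulers have committed; in that case
// the pool keeps waiting for the round timeout instead of declaring a
// discrepancy.
func checkBackupVotes(highestRank uint64, distinctVotes, failures, allowedStragglers int) error {
	if highestRank == 0 {
		// In the real pool this case is handled earlier by early discrepancy
		// detection; it is spelled out here only to make the assumption explicit.
		return ErrDiscrepancyDetected
	}
	if distinctVotes > 1 || failures > allowedStragglers {
		// The primary scheduler has not committed yet, but some backup has:
		// backup schedulers need to wait for a round timeout.
		return ErrStillWaiting
	}
	return nil
}

func main() {
	fmt.Println(checkBackupVotes(2, 2, 0, 1)) // still waiting
	fmt.Println(checkBackupVotes(0, 2, 0, 1)) // discrepancy detected
}
```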
Force-pushed from a7eee89 to 88ff0dc.
Reviewed, focusing on changes to the central part of the pool system. Looks to be implemented as described.
Starting now, all executor committee workers are permitted to schedule transactions, each with a distinct per-round rank. The rank dictates the time after which a worker can propose a new batch. The consensus layer tracks all published executor commitments and tries to build a new runtime block on a proposal from the highest-ranked scheduler.
This is needed because in tests the clients also need to perform the chain context transition and without local storage this never succeeds.
Force-pushed from 48623ef to 8352d10.
Force-pushed from 8352d10 to 2a53cf8.
All committee members can now propose batches, but each has its own priority, which dictates the time after which a worker can propose a new batch. The consensus layer and every (fair) node take this priority into account when processing a proposal or scheduling a batch.
Consensus:
- … resolution or discrepancy resolution is initiated.
Workers:
Changes:
- … violation from the transaction scheduler.
TODO: