Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

context: add simple rtt estimator #140

Merged
merged 1 commit into from
Jan 10, 2025
Merged

context: add simple rtt estimator #140

merged 1 commit into from
Jan 10, 2025

Conversation

roman-khimov
Copy link
Member

After b3c1c3b the only source of block time inaccuracy left is PrepareRequest propagation. In general it's undetermined and can vary from node to node and from round to round. But in practice it's relatively stable and can be averaged out over some number of samples. While backup node can't know when request was sent to it, it can measure what happens to its request when it's primary since healthy nodes are expected to respond immediately making request/response pretty similar to ping/pong roundtrip. This data also can't be perfect, messages can travel different ways from A to B and B to A on a P2P network, but in practice it's good enough and it's much better than having no data at all.

Notice that this timer adjustment scheme uses local timer only, we trust our local clock inevitably and we don't need to mess with any view of time that other nodes have (including the one from the block header).

It can be attacked by malicious peer or node delaying messages, but as we expect 2F+1 nodes to be correct one this is enough to collect proper number of responses and move on, seriously delayed response will be outdated by the time it comes then. And if any intermediary can delay messages to all nodes for arbitrary time on our P2P network we can't have any reliable timing anyway.

To me it fixes as much of neo-project/neo#2918 as only possible.

@roman-khimov roman-khimov added this to the v0.3.2 milestone Jan 10, 2025
Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.

Project coverage is 58.49%. Comparing base (27db04c) to head (47f0422).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
rtt.go 70.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #140      +/-   ##
==========================================
+ Coverage   58.31%   58.49%   +0.17%     
==========================================
  Files          32       33       +1     
  Lines        2262     2279      +17     
==========================================
+ Hits         1319     1333      +14     
- Misses        859      861       +2     
- Partials       84       85       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

After b3c1c3b the only source of block time
inaccuracy left is PrepareRequest propagation. In general it's undetermined and
can vary from node to node and from round to round. But in practice it's
relatively stable and can be averaged out over some number of samples. While
backup node can't know when request was sent to it, it can measure what happens
to its request when it's primary since healthy nodes are expected to respond
immediately making request/response pretty similar to ping/pong roundtrip. This
data also can't be perfect, messages can travel different ways from A to B and
B to A on a P2P network, but in practice it's good enough and it's much better
than having no data at all.

Notice that this timer adjustment scheme uses local timer only, we trust our
local clock inevitably and we don't need to mess with any view of time that
other nodes have (including the one from the block header).

It can be attacked by malicious peer or node delaying messages, but as we
expect 2F+1 nodes to be correct one this is enough to collect proper number of
responses and move on, seriously delayed response will be outdated by the time
it comes then. And if any intermediary can delay messages to all nodes for
arbitrary time on our P2P network we can't have any reliable timing anyway.

To me it fixes as much of neo-project/neo#2918 as
only possible.

Signed-off-by: Roman Khimov <[email protected]>
@roman-khimov
Copy link
Member Author

roman-khimov commented Jan 10, 2025

Synthetic tests relative to #56 (which is a part of the master branch now). Single node is not affected by this change, so it's omitted.

4 nodes, 5s time, no delays, nothing to talk about:
ms_per_block_4_nodes_5s_no_delay
tps_4_nodes_5s_no_delay

The most interesting case, 5s, 200ms delay with 1000 RPS load (much lower than max). Adjustments are clearly seen there and they don't come to the full effect yet (70 blocks are needed), but TPS is the same (as expected):
ms_per_block_4_nodes_5s_delay_200ms_RPS_1000
tps_4_nodes_5s_delay_200ms_RPS_1000

Now pushing this network to its limits, 5s, 200ms delay and no RPS boundaries. Suddenly we're regaining some of TPS metrics lost in #56 (with better block time):
ms_per_block_4_nodes_5s_delay_200ms
tps_4_nodes_5s_delay_200ms

Let's make it faster with 1s block time, without delays, it's slightly better:
ms_per_block_4_nodes_1s_no_delay
tps_4_nodes_1s_no_delay

Now 1s and 200ms delay:
ms_per_block_4_nodes_1s_delay_200ms
tps_4_nodes_1s_delay_200ms

Overall it's just better or about the same in every synthetic test.

@roman-khimov
Copy link
Member Author

roman-khimov commented Jan 10, 2025

And testnet. Real network scattered all over the globe, 6 countries for 7 CNs, 1s target time with 200-300ms rtt. This also includes data for #56 isolated (but only two nodes out of seven!), since original here is pre-lastBlock change. Almost no transactions, 1000 blocks sample:
testnet_block_time

Zoom in for 100 blocks:
testnet_block_time

We're deviating by ~5% average with 20-30%% block time rtt, good enough for me.

@AnnaShaleva
Copy link
Member

We need to publish an article.

@AnnaShaleva AnnaShaleva merged commit d718d92 into master Jan 10, 2025
12 checks passed
@AnnaShaleva AnnaShaleva deleted the rtt branch January 10, 2025 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants