new mined block should be broadcasted if a node was synced #2324
Is there really a difference between the first sync state and subsequent sync states? I played with doing something similar a while back but came to the conclusion that this just complicates the state handling. Scenario 1 -
Scenario 2 -
These scenarios should both be handled the same way I think, particularly the logic in |
@antiochp this PR is for #1139 (comment) [Updated]
I just realized that you want to cover the case of a known restart, for example a manual restart command? I'd prefer to ignore this case to keep things simple; otherwise, we would need to record the last time and status in the database. |
I suspect this may be a very common case, at least initially.
In the adapter code for accepting a new block we already have -
My gut feeling here is adding another state risks adding significant complexity here in ways we do not yet have a good handle on.
Higher level question - Why does a mining node care if it is syncing or not? Why are we not just building blocks against whatever chain state we currently know about? Is the question then about if/when to broadcast these potentially stale blocks? A miner can see a peer advertise more work and transition to sync mode in two different scenarios -
In either case I think we should handle it the same way.
I'm arguing a node cannot be both |
I suppose there's no problem with this; people just wait for the initial sync on restart, same as now.
Coming back to this PR: if we can simply avoid broadcasting stale blocks from a freshly started node that is not synced yet, why not? As you can see, it is just one line of code in adapter.rs.
Again, a reminder that this PR handles the case in #1139 (comment). Please let's use one PR per fix, which is much simpler and easier to discuss :) The idea is that fraudulent peers should have no real impact on grin nodes: mining nodes can continue mining and broadcasting even when they have been forced (by fraudulent peers) into the sync state. Without this PR, an attacker can easily attack the Grin network by advertising fake work. |
Let me restate the problem to make sure I understand it correctly and to make sure we are on the same page. Please correct me if I am misunderstanding the problem.
Now let me summarize your proposed fix to make sure I understand it correctly.
The proposed fix changes the behavior such that a node will broadcast blocks during a subsequent sync state, but it does not differentiate between blocks mined by the node and blocks it received via broadcast. We only want to do this for mining nodes and only for blocks mined by that node. Any node finding itself lagging behind the network for any reason will transition to "syncing" to catch up. But will now broadcast out all blocks during sync. For example, a node on a laptop waking after being closed overnight will now broadcast out to all its peers all blocks that it missed as it re-syncs. So to summarize -
The concern around complexity is not in terms of LoC. It is the growing number of possible states the system can be in at any given time and the multiple state permutations we need to be aware of. We are not proposing to add an additional SyncState here, but adding a boolean flag on top of the existing set of sync states. This adds significant complexity across the system. For example, the stratum server itself is not aware of this newly introduced state flag. We also now have two separate fields within SyncState (current state and synced_once) wrapped in individual RwLock instances. We can no longer guarantee we are updating the sync state atomically and we potentially have new and exciting edge cases here to handle.
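To make the atomicity concern concrete, here is a minimal sketch of the alternative: keeping the current status and the synced_once flag behind a single RwLock so they can only change together. The `SyncStatus` variants, the `Inner` struct, and the `update`/`snapshot` methods are hypothetical simplifications for illustration, not the actual grin types.

```rust
use std::sync::RwLock;

/// Hypothetical, simplified stand-in for the node's sync status.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SyncStatus {
    Initial,
    Syncing,
    NoSync,
}

/// Both fields live behind one lock, so a reader never sees a
/// half-updated combination of status and flag.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Inner {
    status: SyncStatus,
    synced_once: bool,
}

pub struct SyncState {
    inner: RwLock<Inner>,
}

impl SyncState {
    pub fn new() -> Self {
        SyncState {
            inner: RwLock::new(Inner {
                status: SyncStatus::Initial,
                synced_once: false,
            }),
        }
    }

    /// Update the status. Reaching NoSync sets the flag under the
    /// same write lock, so the two fields change atomically.
    pub fn update(&self, new_status: SyncStatus) {
        let mut inner = self.inner.write().unwrap();
        inner.status = new_status;
        if new_status == SyncStatus::NoSync {
            inner.synced_once = true;
        }
    }

    /// Consistent snapshot of (status, synced_once).
    pub fn snapshot(&self) -> (SyncStatus, bool) {
        let inner = self.inner.read().unwrap();
        (inner.status, inner.synced_once)
    }
}
```

With two independent locks, a reader could observe the new status together with the old flag (or the reverse) between the two writes; the single lock rules that out, at the cost of still carrying the extra flag.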
Strongly disagree here. We need to approach this in terms of fixing the underlying issue. We need to minimize any impact on unrelated parts of the system. We need to do this while minimizing complexity across the system as a whole. We cannot do this if we try to handle each fix individually, without considering the bigger picture. |
Agreed. And we can't tell whether a node is mining or not. So it means we can't do this. In my original suggestion to differentiate initial syncing state from subsequent syncs, I mean initial syncing from zero. AKA IBD. Not any start.
Also agreed. But I don't think Gary meant ignoring the bigger picture either. The more I think about this, the more I think we shouldn't do anything special while a node switches to sync mode, other than trying to sync. Perhaps this check made some sense on testnet(s) but a lot less so now. So I think the simpler, minimal thing that removes the possible exploit is just to keep mining and keep relaying no matter what. |
Updated to the simplest logic: "keep mining and keep relaying no matter what" :) But keeping the
for the case of a node restart and a new node syncing. Otherwise, too many old headers will be broadcast during syncing. Does this make sense? @antiochp @ignopeverell |
@@ -647,8 +643,10 @@ impl ChainAdapter for ChainToPoolAndNetAdapter {
let cb: CompactBlock = b.clone().into();
self.peers().broadcast_compact_block(&cb);
} else {
// "header first" propagation if we are not the originator of this block
self.peers().broadcast_header(&b.header);
if self.sync_state.is_synced_once() { |
Still not convinced the added complexity of is_synced_once is worth it. There is no difference between the 1st and 2nd sync for a node.
Can we not just use the existing is_syncing() here?
If we use the existing is_syncing, a fraudulent most-work peer will pause gossip block header propagation (because the node is in the syncing state) until the node leaves syncing mode through fraud detection and banning.
🤦♂️ Of course, forgot about that.
We should just always broadcast out "header first" here, regardless of syncing or not, regardless of synced_first or not.
Keep it simple and let peers handle these redundant headers as necessary.
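For reference, the three gating options discussed in this thread for the "header first" branch can be compared side by side. This is only a sketch: `RelayPolicy`, `NodeView`, and `should_relay_header` are hypothetical names introduced here, and the two booleans stand in for what is_syncing() and is_synced_once() would return.

```rust
/// The three gating options debated for "header first" relay.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RelayPolicy {
    /// Gate on the existing is_syncing() check.
    SkipWhileSyncing,
    /// Gate on the proposed is_synced_once() flag.
    AfterFirstSync,
    /// Always relay, regardless of sync state.
    Always,
}

/// Hypothetical view of the two facts the policies depend on.
#[derive(Clone, Copy, Debug)]
pub struct NodeView {
    pub is_syncing: bool,
    pub synced_once: bool,
}

/// Decide whether an accepted block's header should be relayed.
pub fn should_relay_header(policy: RelayPolicy, node: NodeView) -> bool {
    match policy {
        RelayPolicy::SkipWhileSyncing => !node.is_syncing,
        RelayPolicy::AfterFirstSync => node.synced_once,
        RelayPolicy::Always => true,
    }
}
```

For a previously synced node that a fraudulent peer has pushed back into sync (`is_syncing: true, synced_once: true`), only `SkipWhileSyncing` stops relaying, which is the exploit. For a fresh node in its first sync (`is_syncing: true, synced_once: false`), only `Always` relays, which is the redundant-header concern.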
I worry there will be too many redundant headers then, and I'm not sure whether the current grin version can handle such a flood of redundant headers. We don't have much time to test it, since there are fewer than 4 days before launch.
Also, this is a solution that kills the fraudulent-work case; I don't understand why a simple design is more important than defending against an attack.
BTW, does is_synced_once really make the design that complex?
Looks like we can't agree between us on whether to always broadcast out "header first", including during the initial syncing stage. Perhaps it's good to invite more pairs of eyes to this review 😄 @ignopeverell what's your opinion on this?
👍 always broadcast new blocks whenever we don't know yet if it is valid, for now. Later and if needed we could avoid
I can create 1. and 2. as issues if wanted. |
@sesam - neither of these are applicable here. We only ever broadcast accepted blocks, i.e. we added them successfully to our chain locally. So nothing "far future" or "near future" will ever make it this far. |
Reviewing the code once more, I don't think we ever want to stop block propagation based on whether we're syncing or not. So all these checks should be removed. What we do want to block is the propagation of a header or block we received through sync. And that's information we have in |
That's already done in the current PR, and all of us agree on it.
👍 Good idea to use chain::Options, will update it. |
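A rough sketch of that idea, deciding relay from how a block reached us rather than from the node's sync state. The `Options` struct below is a hypothetical stand-in with two illustrative flags; the real chain::Options layout is not shown in this thread, so the field names, `Relay`, and `relay_for` are all assumptions.

```rust
/// Hypothetical stand-in for per-block processing options.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Options {
    /// Assumed flag: the block was received through sync.
    pub sync: bool,
    /// Assumed flag: the block was mined by this node.
    pub mine: bool,
}

/// What to send to peers after a block is accepted locally.
#[derive(Debug, PartialEq)]
pub enum Relay {
    /// Do not propagate (block came in through sync).
    Skip,
    /// We are the originator: send the compact block.
    CompactBlock,
    /// We are not the originator: "header first" propagation.
    HeaderFirst,
}

/// Pick the relay action from the block's origin only.
/// The sync flag takes priority if both flags are set.
pub fn relay_for(opts: Options) -> Relay {
    if opts.sync {
        Relay::Skip
    } else if opts.mine {
        Relay::CompactBlock
    } else {
        Relay::HeaderFirst
    }
}
```

Under this scheme a node forced into the sync state by a fraudulent peer still relays its own mined blocks and blocks received via broadcast, while blocks pulled in by the sync process itself are never re-broadcast.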
Hold on. After looking at it I realized the options were set just based on |
Using the new #2349 to implement this improvement; this PR is now obsolete. |
Picks up #2240 again.
When a well-synced node is mining, newly mined blocks need to be broadcast no matter what pulled the node into the sync state again, for example a connected fraudulent peer that advertises a fake difficulty and height and has not been banned yet.
This will protect the grin network from this kind of attack.