Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new mined block should be broadcasted if a node was synced #2324

Closed
wants to merge 4 commits into from
Closed

new mined block should be broadcasted if a node was synced #2324

wants to merge 4 commits into from

Conversation

garyyu
Copy link
Contributor

@garyyu garyyu commented Jan 10, 2019

Re-pickup #2240

When a well synced node is mining, the new mined block(s) need broadcast, no matter what pulled this node into a sync state again, possibly because of a connected fraud peer with a fake difficulty and height, and not banned yet.

This will protect the grin network from such kind of attacking.

@garyyu garyyu requested a review from ignopeverell January 10, 2019 03:31
@antiochp
Copy link
Member

Is there really a difference between the first sync state and subsequent sync states?

I played with doing something similar a while back but came to the conclusion that this just complicates the state handling.

Scenario 1 -

  • A mining node is restarted after having been shutdown overnight.
  • Needs to sync to catch up with the network.

Scenario 2 -

  • A mining node suddenly sees a peer advertise more work.
  • Needs to sync to catch up with the network.

These scenarios should both be handled the same way I think, particularly the logic in adapters.rs, particularly the txpool reconciliation etc.
With these changes a mining node will effectively not know it is syncing against the network durig block processing unless it has just been restarted.

@garyyu
Copy link
Contributor Author

garyyu commented Jan 10, 2019

These scenarios should both be handled the same way I think

@antiochp this PR is for #1139 (comment)
should be handled in different way I think. Yes?

[Updated]
regarding:

A mining node is restarted after having been shutdown overnight.

Just realize that you want to cover the case of known restart for example a manual command to restart?

prefer to ignore this case to make the thing simple. otherwise, we need record the last time and status into the database.

@antiochp
Copy link
Member

prefer to ignore this case to make the thing simple. otherwise, we need record the last time and status into the database.

I suspect this may be a very common case, at least initially.
There are going to be lots of people trying Grin out early on and I would expect many of them to only run it intermittently (some hobby miners included).

we need record the last time and status into the database.
I don't follow - I'm pushing for less state, not more state.

In the adapter code for accepting a new block we already have -

  • syncing vs. not_syncing
  • mining vs. regular node
  • next vs. fork vs. reorg (block status)

My gut feeling here is adding another state risks adding significant complexity here in ways we do not yet have a good handle on.

  • first time sync vs. regular sync vs. not_syncing

Higher level question - Why does a mining node care if it is syncing or not? Why are we not just building blocks against whatever chain state we currently know about?

Is the question then about if/when to broadcast these potentially stale blocks?
And if so - why not just broadcast the compact block regardless and leave it up to our peers to handle the block or reject it.

A miner can see a peer advertise more work and transition to sync mode in two different scenarios -

  • on startup
  • or while already running

In either case I think we should handle it the same way.

When a well synced node is mining, the new mined block(s) need broadcast, no matter what pulled this node into a sync state again, possibly because of a connected fraud peer with a fake difficulty and height, and not banned yet.

I'm arguing a node cannot be both well synced and in sync state simultaneously - its either syncing or not syncing. We don't know (or care) why its syncing other than a peer advertises more work. And for a period of time (during header sync) we don't know if this is legitimate or fraudulent work.

@garyyu
Copy link
Contributor Author

garyyu commented Jan 10, 2019

I suspect this may be a very common case, at least initially.
There are going to be lots of people trying Grin out early on and I would expect many of them to only run it intermittently (some hobby miners included).

I suppose there's no problem for this, people just wait the initial sync when restart. Same as now.

Higher level question - Why does a mining node care if it is syncing or not? Why are we not just building blocks against whatever chain state we currently know about?
Is the question then about if/when to broadcast these potentially stale blocks?
And if so - why not just broadcast the compact block regardless and leave it up to our peers to handle the block or reject it.

Come back to this PR, if we can simply avoid the case to broadcast stale blocks for an initial running node which is not synced yet, why not? as you see, just one line of code in adapter.rs.

I'm arguing a node cannot be both well synced and in sync state simultaneously - its either syncing or not syncing. We don't know (or care) why its syncing other than a peer advertises more work. And for a period of time (during header sync) we don't know if this is legitimate or fraudulent work.

Again, I remind that this PR is used to handle the case of #1139 (comment), please let's use one PR to handle one fix, much simple and easy to discuss :)

The idea is to make the fraud peers have no real impact on the grin nodes, the mining nodes can continue mining and broadcasting, even it was forced (by fraud peers) into sync state.

Without this PR, an attacker can easily find a way to attack Grin network by fake work.

@antiochp
Copy link
Member

Let me restate the problem to make sure I understand it correctly and to make sure we are on the same page. Please correct me if I am misunderstanding the problem.

  1. A mining node will currently suspend mining and will not broadcast blocks if the node is in "syncing" state.
  2. A node is in "syncing" state if it sees a peer advertise more work, without verifying this to be correct.
  3. A node can trivially put all connected peers into "syncing" state by sending out a hand/shake or ping/pong msg with a large total_difficulty value in it.

Now let me summarize your proposed fix to make sure I understand it correctly.

  1. Prior to a node syncing the first time, do not broadcast any blocks out, regardless of mining or not.
  2. After a node has sync'd once, always broadcast all blocks, regardless of mining or syncing state.

The proposed fix changes the behavior such that a node will broadcast blocks during a subsequent sync state, but it does not differentiate between blocks mined by the node and blocks it received via broadcast.
The proposed fix also changes this behavior for all nodes, not limited to just mining nodes.

We only want to do this for mining nodes and only for blocks mined by that node.

Any node finding itself lagging behind the network for any reason will transition to "syncing" to catch up. But will now broadcast out all blocks during sync.

For example, a node on a laptop waking after being closed overnight will now broadcast out to all its peers all blocks that it missed as it re-syncs.

So to summarize -

  • The proposal fixes the "issue".
  • But it introduces undesirable behavior unrelated to the issue itself.
    • Non-mining nodes are affected (this should only affect mining nodes).
    • Blocks received via broadcast are affected (this should only affect blocks mined by the miner).
  • And does this at the cost of introducing additional complexity.

Come back to this PR, if we can simply avoid the case to broadcast stale blocks for an initial running node which is not synced yet, why not? as you see, just one line of code in adapter.rs.

The concern around complexity is not in terms of LoC. It is the growing number of possible states the system can be in at any given time and the multiple state permutations we need to be aware of.

We are not proposing to add an additional SyncState here, but adding a boolean flag on top of the existing set of sync states. This adds significant complexity across the system. For example the stratum server itself is not aware of this new introduced state flag.

We also now have two separate fields within SyncState (current state and synced_once) wrapped in individual RwLock instances. We can no longer guarantee we are updating the sync state atomically and we potentially have new and exciting edge cases here to handle.


Again, I remind that this PR is used to handle the case of #1139 (comment), please let's use one PR to handle one fix, much simple and easy to discuss :)

Strongly disagree here. We need to approach this in terms of fixing the underlying issue. We need to minimize any impact on unrelated parts of the system. We need to do this while minimizing complexity across the system as a whole. We cannot do this if we try to handle each fix individually, without considering the bigger picture.

@ignopeverell
Copy link
Contributor

We only want to do this for mining nodes and only for blocks mined by that node.

Agreed. And we can't tell whether a node is mining or not. So it means we can't do this. In my original suggestion to differentiate initial syncing state from subsequent syncs, I mean initial syncing from zero. AKA IBD. Not any start.

We cannot do this if we try to handle each fix individually, without considering the bigger picture.

Also agreed. But I don't think Gary meant ignoring the bigger picture either.

The more I think about this, the more I think we shouldn't do anything special while a node switches to sync node, other than trying to sync. Perhaps this check made some sense on testnet(s) but a lot less so now. So I think the simpler, minimal thing that removes the possible exploit is just to keep mining and keep relaying no matter what.

@garyyu
Copy link
Contributor Author

garyyu commented Jan 10, 2019

Updated for most simple logic - "keep mining and keep relaying no matter what" :)

But keeping the is_synced_once state to avoid :

"header first" propagation if we are not the originator of this block

for the case of node restart and new node syncing. Otherwise, there will be too much old headers broadcasting from the syncing.

Does this make sense? @antiochp @ignopeverell

@@ -647,8 +643,10 @@ impl ChainAdapter for ChainToPoolAndNetAdapter {
let cb: CompactBlock = b.clone().into();
self.peers().broadcast_compact_block(&cb);
} else {
// "header first" propagation if we are not the originator of this block
self.peers().broadcast_header(&b.header);
if self.sync_state.is_synced_once() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not convinced the added complexity of is_synced_once is worth it. There is no difference between the 1st and 2nd sync for a node.
Can we not just use the existing is_syncing() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using existing is_syncing, the fraudulent most work peer will make the gossip block headers propagation pause (because node is in syncing state), until the node leave syncing mode by a fraud detection and banning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂️ Of course, forgot about that.

We should just always broadcast out "header first" here, regardless of syncing or not, regardless of synced_first or not.
Keep it simple and let peers handle these redundant headers as necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry there will be too much redundant headers then, and not sure whether the current grin version can handle these flood redundant headers, we don't have much time to test it since only less 4 days before launching.

Also here's a solution to kill that fraudulent work case, I don't understand why simple design is more important than defending from attack.

BTW, does this is_synced_once indeed make design become a very complex one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like we couldn't agree each other between us about whether always broadcast out "header first" including initial syncing stage, perhaps it's good to invite more pair of eyes on this review 😄 @ignopeverell what's your opinion on this?

@sesam
Copy link
Contributor

sesam commented Jan 11, 2019

👍 always broadcast new blocks whenever we don't know yet if it is valid, for now.

Later and if needed we could avoid

  1. broadcasting far-future blocks ignore any block that looks like it is increasing hashing power by a magnitude.
  2. wait with broadcasting near-future blocks, instead first syncing up and validating, and only then broadcast

I can create 1. and 2. as issues if wanted.

@antiochp
Copy link
Member

antiochp commented Jan 11, 2019

@sesam - neither of these are applicable here. We only ever broadcast accepted blocks, i.e. we added them successfully to our chain locally. So nothing "far future" or "near future" will ever make it this far.

@ignopeverell
Copy link
Contributor

Reviewing the code once more, I don't think we ever want to stop block propagation based on whether we're syncing or not. So all these checked should be removed.

What we do want to block is the propagation of a header or block we received through sync. And that's an information we have in block_accepted with the chain::Options argument.

@garyyu
Copy link
Contributor Author

garyyu commented Jan 12, 2019

Reviewing the code once more, I don't think we ever want to stop block propagation based on whether we're syncing or not. So all these checked should be removed.

that's already done in current PR, and all of us agree it.

What we do want to block is the propagation of a header or block we received through sync. And that's an information we have in block_accepted with the chain::Options argument.

👍 good idea for using chain::Options, will update it.

@ignopeverell
Copy link
Contributor

Hold on. After looking at it I realized the options were set just based on is_syncing anyway. I'm working on another PR that adds tracking of hashes we requested to the TrackingAdapter so we can know for sure it's not a hash that was sent through gossip, but one we asked for.

@garyyu
Copy link
Contributor Author

garyyu commented Jan 12, 2019

Use new #2349 to implement this improvement, and obsolete this one.

@garyyu garyyu closed this Jan 12, 2019
@garyyu garyyu deleted the broadcast branch January 13, 2019 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants