Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to bypass validation checks for publishBlockV2 #451

Conversation

StefanBratanov
Copy link
Contributor

@StefanBratanov StefanBratanov commented May 31, 2024

When creating blocks locally, most of the gossip checks would have already been covered by the process of creating the block itself (at least in Teku), so it makes sense to add an option to bypass validation checks altogether and publish the block immediately. This way there would be no additional delays (even if small) of broadcasting the block to the network when using v2 for publishing.

@nflaig
Copy link
Member

nflaig commented May 31, 2024

When creating blocks locally, most of the gossip checks would have already been covered by the process of creating the block itself (at least in Teku)

This is only true if the block is published via the same beacon node that created it, in which case running gossip validation again does not make much sense. In a multi node setup, does this mean the validator client would have to keep track of which node produces the block and to which it is published, and set broadcast_validation accordingly?

In Lodestar, we already skip gossip validation if the block is published to the same beacon node but it's not decided by the validator client but rather the beacon node which has a cache of produced block roots, and if it was produced locally, we skip gossip validation.

@StefanBratanov
Copy link
Contributor Author

When creating blocks locally, most of the gossip checks would have already been covered by the process of creating the block itself (at least in Teku)

This is only true if the block is published via the same beacon node that created it, in which case running gossip validation again does not make much sense. In a multi node setup, does this mean the validator client would have to keep track of which node produces the block and to which it is published, and set broadcast_validation accordingly?

In Lodestar, we already skip gossip validation if the block is published to the same beacon node but it's not decided by the validator client but rather the beacon node which has a cache of produced block roots, and if it was produced locally, we skip gossip validation.

I agree skipping validation in the beacon node which created it is the valid scenario in this case. But I can argue that it also makes sense in a multi-node setup. The beacon node which created the block would have already published the block, so if for some reason, other beacon nodes fail the gossip validation, the result would be that the block would just be propagated bit slower and they can get downscored a little bit in those rare cases. Ultimately, I think it's nice to have the option to not do any validation on the VC level. Since it's not a default option, clients can choose to use it however they want.

@tersec
Copy link
Contributor

tersec commented May 31, 2024

What is the magnitude of these validation delays?

@michaelsproul
Copy link
Contributor

In Lighthouse I'm pretty sure the delay from gossip verification is trivial (around ~5ms). For us it was the code complexity of adding gossip verification and dealing with duplicate caches that was hard. But we can't remove this complexity while gossip verification is an option, so IMO we may as well keep it by default

Also open to adding the no_verification param if it is beneficial in clients. I think we just need to know if we're hunting performance or simplicity

@arnetheduck
Copy link
Contributor

Ditto nimbus, it's (very) fast and nowadays just part of the "incoming external data" flow that applies to anything going out on gossip (and into the block database) whether that be re-broadcasting or rest-based data - even if a flag was added, it is likely we would do the checks anyway simply because it's more simple this way and provides a cheap bug safety net.

@StefanBratanov
Copy link
Contributor Author

StefanBratanov commented Jun 1, 2024

What is the magnitude of these validation delays?

In Teku, gossip validation seem to average around ~11ms (on a highish spec node). But occasionally, there are some outliers which take longer. For us specifically, for local blocks, we would like to skip it altogether and would be cleaner to do on the VC side. But if for other clients, there is no value in having an option to do no validation for this endpoint, I am happy to implement similar functionality as Lodestar that @nflaig mentioned.

@nflaig
Copy link
Member

nflaig commented Jun 1, 2024

For us specifically, for local blocks, we would like to skip it altogether and would be cleaner to do on the VC side. But if for other clients, there is no value in having an option to do no validation for this endpoint, I am happy to implement similar functionality as Lodestar that @nflaig mentioned.

Imo doesn't hurt to have this option, even though we will not be using this in production, we had a use case for it during testing and Lodestar already supports a none value, it can only be configured internally right now and not via CLI. If we keep gossip as default, I don't see an issue other than clients not supporting it right now, if we wanna be strict about it would require to bump the api, but that seems a bit overkill to me.

@StefanBratanov
Copy link
Contributor Author

For us specifically, for local blocks, we would like to skip it altogether and would be cleaner to do on the VC side. But if for other clients, there is no value in having an option to do no validation for this endpoint, I am happy to implement similar functionality as Lodestar that @nflaig mentioned.

Imo doesn't hurt to have this option, even though we will not be using this in production, we had a use case for it during testing and Lodestar already supports a none value, it can only be configured internally right now and not via CLI. If we keep gossip as default, I don't see an issue other than clients not supporting it right now, if we wanna be strict about it would require to bump the api, but that seems a bit overkill to me.

I like none better than no_validation. Modified the PR.

apis/beacon/blocks/blinded_blocks.v2.yaml Outdated Show resolved Hide resolved
apis/beacon/blocks/blocks.v2.yaml Outdated Show resolved Hide resolved
@tersec
Copy link
Contributor

tersec commented Jun 1, 2024

Imo doesn't hurt to have this option, even though we will not be using this in production, we had a use case for it during testing and Lodestar already supports a none value, it can only be configured internally right now and not via CLI. If we keep gossip as default, I don't see an issue other than clients not supporting it right now, if we wanna be strict about it would require to bump the api, but that seems a bit overkill to me.

What's the utility of specifying an option only useful for internal testing as part of the beacon API?

Also, to Jacek's point, a bit more concretely, Nimbus routes all these functions through (for those who might want to see details) routeSignedBeaconBlock, which does this validation as an integral part of block sending, separately from the specifics of any given beacon API function. While it's possible to adds a bypass of this, it would indeed complicate the funnel-everything-into-uniform-processing design for little apparent gain.

So far this PR seems surprisingly unmotivated by specific scenarios or use cases to warrant this, and handwaving about, well it's harmless or not that complicated ("I don't see an issue other than clients not supporting it right now") still creates a situation where there's an eventual expectation to implement this.

I'll be a bit more pointed: even if it's 20ms or 30ms, say, why are people waiting to send blocks late enough any of this matters?

(If it's because that's what MEV timing game players want, well.)

@nflaig
Copy link
Member

nflaig commented Jun 1, 2024

So far this PR seems surprisingly unmotivated by specific scenarios or use cases to warrant this, and handwaving about, well it's harmless or not that complicated ("I don't see an issue other than clients not supporting it right now") still creates a situation where there's an eventual expectation to implement this.

I didn't wanna imply "not that complicated" to implement, from this PR, it seems for Teku it would be lower complexity to adopt it via query param instead of implementing something like Lodestar. For Nimbus, it seems higher complexity which is totally fair. That's why we have this discussion, to get a better picture / different perspectives and trade offs.

I'll be a bit more pointed: even if it's 20ms or 30ms, say, why are people waiting to send blocks late enough any of this matters?

I am not running Teku in my setup right now, but if I were I would like my locally built blocks to be gossiped faster if there is no clear downside to it. We shouldn't avoid optimizations to local block building just because most of the network is using MEV-boost, there are still a lot of people (especially solo stakers) which do not use it or have really high min-bid, and for those, reducing the internal latency is even more important as they might not have the best internet connection.

@StefanBratanov
Copy link
Contributor Author

One of the main inspirations for us was that v1 would eventually be deprecated and we want to keep the same functionality as of today for locally created blocks whilst still using v2 (Teku VC still uses v1).

@tersec
Copy link
Contributor

tersec commented Jun 1, 2024

One of the main inspirations for us was that v1 would eventually be deprecated and we want to keep the same functionality as of today for locally created blocks whilst still using v2 (Teku VC still uses v1).

Can't speak to specifics of non-Nimbus implementations on how exactly the validation is handled here, but https://ethereum.github.io/beacon-APIs/?urls.primaryName=v2.5.0#/Beacon/publishBlock states that:

Instructs the beacon node to broadcast a newly signed beacon block to the beacon network, to be included in the beacon chain. A success response (20x) indicates that the block passed gossip validation and was successfully broadcast onto the network. The beacon node is also expected to integrate the block into state, but may broadcast it before doing so, so as to aid timely delivery of the block. Should the block fail full validation, a separate success response code (202) is used to indicate that the block was successfully broadcast but failed integration. After Deneb, this additionally instructs the beacon node to broadcast all given blobs.

Emphases added.

As specified, publishBlock (V1, retroactively) already involves validation. Some BNs may broadcast it before doing so, but that's framed as a non-default, optional behavior.

@tbenr
Copy link
Contributor

tbenr commented Jun 3, 2024

Thanks all for the good feedback! We internally started the conversation on this topic several times, without reaching an actual decision. But reading throughout the thread I'm know lean to not having this option but instead leave the BN to responsible for eventual optimizations (like caching ala Lodestar implementation @nflaig).

Question: how you deal with [IGNORE] The block is not from a future slot rule? Are you considering this as a gossip validation failure? It won't be a problem if we add the "local built blocks cache", but in a multi-BN scenario, if VC deals only with lagging BNs it might lose the proposal.

My current feeling is to leave it as a validation failure (current teku behaviour). In general all IGNORE are considered failure with the exception of "already seen block".

@nflaig
Copy link
Member

nflaig commented Jun 3, 2024

Question: how you deal with [IGNORE] The block is not from a future slot rule? Are you considering this as a gossip validation failure?

part of gossip validation in Lodestar and considered a failure

In general all IGNORE are considered failure with the exception of "already seen block".

that's our current logic as well

@StefanBratanov
Copy link
Contributor Author

StefanBratanov commented Jun 3, 2024

Thanks for the discussion all. :) We will implement similar functionality as Lodestar, so none validation option wouldn't be necessary. If at some point, clients think it could be useful option, we can always refer to this PR. Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants