Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(op-batcher): altda->ethda failover #13

Open
wants to merge 2 commits into
base: feat--multiframe-altda-channel
Choose a base branch
from

Conversation

samlaf
Copy link
Collaborator

@samlaf samlaf commented Nov 7, 2024

Plan is to upstream this to op's repo. Created here to get review from the team first.

This PR builds on top of the feat--multiframe-altda-channel changes (already upstreamed to op's repo, waiting for review there)

It contains 2 commits:

  1. failover test: 61d7578
  2. failover logic: 5712618

Right now failover is done to calldata txs because that was trivial whereas failing over to blobs or their auto mode that switches between blobs and calldata would need a nontrivial refactor and some thinking. Not sure its worth putting effort into this atm given that the whole point of failover is that it should happen very rarely and also not last very long, but let me know if you guys think otherwise.

@samlaf samlaf marked this pull request as draft November 7, 2024 21:19
@samlaf samlaf force-pushed the samlaf/feat--op-batcher-altda-failover-to-ethda branch from 50e492c to 80728ec Compare November 8, 2024 14:50
@samlaf samlaf force-pushed the samlaf/feat--op-batcher-altda-failover-to-ethda branch from adfa7ce to 61d7578 Compare November 11, 2024 06:16
@samlaf samlaf changed the base branch from develop to feat--multiframe-altda-channel November 11, 2024 07:07
@samlaf samlaf marked this pull request as ready for review November 11, 2024 07:07
@bxue-l2
Copy link
Collaborator

bxue-l2 commented Nov 12, 2024

At the high level, it usually takes us about 1 hour to fix the problem

@@ -130,6 +134,10 @@ func (s *FakeDAServer) HandleGet(w http.ResponseWriter, r *http.Request) {

func (s *FakeDAServer) HandlePut(w http.ResponseWriter, r *http.Request) {
time.Sleep(s.putRequestLatency)
if s.failoverCount > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the point to decrement failoverCount, then actually handle the put

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just to simplify testing?

)

// TestBatcher_FailoverToEthDA_FallbackToAltDA tests that the batcher will failover to ethDA
// if the da-server returns 503, and then fallback to altDA once altDA is available again
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we always try altda first for every dispersal, then retry for non-altDA after sufficient retry. Wording seems odd.


countEthDACommitment := uint64(0)

// Most likely, sequence of blocks will be: altDA, ethDA, ethDA, altDA, altDA, altDA.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why most likely? And why two ethDA in a row, because we set failoverCount=2, but why altDA in the beginning?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants