Headers fetching via NeoFS BlockFetcher service #3789

Draft · AliceInHunterland wants to merge 3 commits into master from headers

Conversation

AliceInHunterland (Contributor):

Close #3574

Signed-off-by: Ekaterina Pavlova <[email protected]>
@AliceInHunterland force-pushed the headers branch 2 times, most recently from ee083c5 to 6b9d29e on January 20, 2025 at 15:52
@AnnaShaleva (Member) left a comment:

HeadersFetcher is started in the wrong place. Ensure that nothing prevents the StateSync module from requesting blocks/headers during node start.

queue: make([]*block.Block, cacheSize),
checkBlocks: make(chan struct{}, 1),
headersQueue: make([]*block.Header, cacheSize),
checkHeaders: make(chan struct{}, 1),
Member:

I think we don't need this headers/blocks split inside the queue. What we need is a type-specific queue; it may be implemented via generics, because the overall logic is the same for blocks and headers.

On the server side we may use two instances of the queue: one for blocks and another one for headers.
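
A minimal sketch of such a generics-based queue, assuming a hypothetical GetIndex accessor shared by *block.Block and *block.Header; names are illustrative, and the real bqueue keeps more state (blocking mode, cache window, consumer loop):

```go
package bqueue

import "sync"

// Indexable is the constraint both *block.Block and *block.Header could
// satisfy via an index accessor (hypothetical method name).
type Indexable interface {
	GetIndex() uint32
}

// Queue is instantiated per payload type; the server would keep two
// instances, one for blocks and one for headers.
type Queue[T Indexable] struct {
	mtx   sync.Mutex
	items []T
	check chan struct{} // signals the consumer loop that new items arrived
}

func New[T Indexable](cacheSize int) *Queue[T] {
	return &Queue[T]{
		items: make([]T, 0, cacheSize),
		check: make(chan struct{}, 1),
	}
}

// Put enqueues an item and wakes the consumer without blocking.
func (q *Queue[T]) Put(item T) {
	q.mtx.Lock()
	q.items = append(q.items, item)
	q.mtx.Unlock()
	select {
	case q.check <- struct{}{}:
	default:
	}
}
```

On the server side this would give bqueue.New[*block.Block](cacheSize) and bqueue.New[*block.Header](cacheSize) instead of the parallel queue/headersQueue fields above.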

Member:

Let's keep the current implementation for now and finalize the overall process first; afterwards we'll refactor blockqueue.

pkg/network/server.go — three outdated review threads, resolved
var err error
s.blockFetcher, err = blockfetcher.New(chain, s.NeoFSBlockFetcherCfg, log, s.bFetcherQueue.PutBlock,
sync.OnceFunc(func() { close(s.blockFetcherFin) }))
s.headerFetcher, err = blockfetcher.New(chain, s.NeoFSBlockFetcherCfg, log, s.bFetcherQueue.PutBlock, s.bFetcherQueue.PutHeader,
Member:

Provide s.stateSync instead of chain as the first argument to blockfetcher.New.

Contributor Author:

The height is different for s.stateSync: it starts from 3 mln, so the block/header can't be persisted and the queue becomes locked. That's why I rolled it back to chain. Why can't it be the chain? Blockfetcher uses it only for the config and BlockHeight(). I will recheck why s.stateSync starts from the middle of the sequence.

Contributor Author:

fixed

Member:

We need BlockHeight and HeaderHeight to be aligned with the statesync module's vision; that's the main idea.
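
In other words, blockfetcher's dependency could be narrowed to just the height accessors and satisfied by the statesync module rather than the chain. A minimal sketch, with an illustrative interface name:

```go
// HeightReporter is what blockfetcher actually needs from its backend:
// when statesync is enabled, the statesync module should provide these so
// that both height views stay aligned (interface name is illustrative).
type HeightReporter interface {
	BlockHeight() uint32
	HeaderHeight() uint32
}
```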


billet *mpt.Billet

jumpCallback func(p uint32) error
}

// NewModule returns new instance of statesync module.
func NewModule(bc Ledger, stateMod *stateroot.Module, log *zap.Logger, s *dao.Simple, jumpCallback func(p uint32) error) *Module {
func NewModule(bc Ledger, log *zap.Logger, s *dao.Simple, jumpCallback func(p uint32) error) *Module {
Member:

Revert this change.

@AnnaShaleva (Member) left a comment:

Does the service startup work as expected?

@@ -285,9 +285,6 @@ func NewBlockchain(s storage.Store, cfg config.Blockchain, log *zap.Logger) (*Bl
zap.Uint32("MaxValidUntilBlockIncrement", cfg.MaxValidUntilBlockIncrement))
}
if cfg.P2PStateExchangeExtensions {
if !cfg.StateRootInHeader {
return nil, errors.New("P2PStatesExchangeExtensions are enabled, but StateRootInHeader is off")
}
Member:

We need a more sophisticated check. If BlockFetcher is on, then it's OK, but if BlockFetcher is off and StateRootInHeader is also off, then it's an error.
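
A minimal sketch of such a check, assuming the NeoFS BlockFetcher enablement flag is reachable from this config (the field path is illustrative, not the actual config layout):

```go
if cfg.P2PStateExchangeExtensions {
	// Headers can carry state roots, or blocks/headers can come from NeoFS;
	// only when both sources are disabled is the configuration invalid.
	if !cfg.StateRootInHeader && !cfg.NeoFSBlockFetcher.Enabled {
		return nil, errors.New("P2PStateExchangeExtensions are enabled, " +
			"but both StateRootInHeader and NeoFSBlockFetcher are off")
	}
}
```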

Comment on lines 822 to 894
if s.chain.HeaderHeight() < p.LastBlockIndex() {
return s.requestHeaders(p)
}
Member:

This if check must be adjusted. Right now it's related to peers only, but if blockfetcher is used, then we don't care about the peer height, because another source of data is used (NeoFS storage).
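
One way the adjusted condition could look, with illustrative names (the exact fetcher accessors depend on how the nil/IsActive question below is resolved):

```go
if s.stateSync.NeedHeaders() {
	// Headers are being delivered from NeoFS; peer height is irrelevant here.
	if s.headerFetcher != nil && s.headerFetcher.IsActive() {
		return nil
	}
	// Fall back to P2P header requests.
	if s.chain.HeaderHeight() < p.LastBlockIndex() {
		return s.requestHeaders(p)
	}
	return nil
}
```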

if s.stateSync.NeedHeaders() {
if s.chain.HeaderHeight() < p.LastBlockIndex() {
return s.requestHeaders(p)
}
return nil
}
s.headerFetcher.Shutdown()
if s.ServerConfig.NeoFSBlockFetcherCfg.Enabled {
err := s.blockFetcher.Start()
Member:

Blockfetcher should fetch exactly the range of blocks that is required by the statesync module if the statesync module is enabled. If the statesync module is disabled, then blockfetcher should fetch all blocks starting from 0. Check that this logic is present in the blockfetcher.
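
A rough sketch of the intended range selection, with hypothetical helper names (NeededHeight in particular is illustrative, not an existing accessor):

```go
// Decide which blocks the fetcher should request from NeoFS.
var startIndex, stopIndex uint32
if s.chain.GetConfig().P2PStateExchangeExtensions && s.stateSync.IsActive() {
	// Statesync enabled: fetch exactly the range statesync needs.
	startIndex = s.stateSync.BlockHeight() + 1
	stopIndex = s.stateSync.NeededHeight() // hypothetical accessor
} else {
	// Statesync disabled: fetch every block the chain is missing,
	// up to whatever NeoFS currently has.
	startIndex = s.chain.BlockHeight() + 1
	stopIndex = 0 // illustrative convention for "no upper bound"
}
```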

Comment on lines 858 to 922
if s.ServerConfig.NeoFSBlockFetcherCfg.Enabled {
err := s.headerFetcher.Start()
if err != nil {
s.log.Error("skipping NeoFS BlockFetcher", zap.Error(err))
}
}
if s.headerFetcher.IsActive() {
return nil
}
Member:

This condition may lead to a deadlock. Consider the situation: NeoFS storage is missing the latest headers. Once started, headerfetcher will fetch all available headers and shut down. However, if some of the latest headers are still required by statesync, then we need to switch to P2P. Right now this code will restart headerfetcher anyway. Setting s.headerFetcher to nil after the first headerfetcher shutdown (and checking for nil in dependent places) may do the trick.
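
A rough sketch of the suggested approach, reusing the constructor shape from this PR; the wiring is illustrative, and in practice the nil assignment would need a mutex or atomic to avoid racing with readers:

```go
// Construct the fetcher with a callback that drops the reference once it has
// shut down, so it is never restarted and "finished" is distinguishable from
// "not started".
s.headerFetcher, err = blockfetcher.New(s.stateSync, s.NeoFSBlockFetcherCfg, log,
	s.bFetcherQueue.PutBlock, s.bFetcherQueue.PutHeader,
	sync.OnceFunc(func() { s.headerFetcher = nil }))

// Later, at the restart site shown above, a nil check prevents the deadlock:
if s.headerFetcher != nil && s.ServerConfig.NeoFSBlockFetcherCfg.Enabled {
	if err := s.headerFetcher.Start(); err != nil {
		s.log.Error("skipping NeoFS BlockFetcher", zap.Error(err))
	}
}
if s.headerFetcher != nil && s.headerFetcher.IsActive() {
	return nil
}
// headerFetcher is nil: NeoFS headers are exhausted, fall back to P2P.
```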

Contributor Author:

With nil we can't use IsActive.

Member:

Without nil we can't distinguish a finished service from one that was never started, so the additional nil checks are a trade-off.

Contributor Author:

I have added isShutdown to the blockfetcher. If you don't like it, I will look more closely at the nil approach.
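
A rough sketch of what such a flag could look like on the blockfetcher Service, assuming an atomic.Bool alongside the existing active state (field and method names are illustrative):

```go
package blockfetcher

import "sync/atomic"

// Service keeps isShutdown set once it has been started and then stopped,
// so callers can distinguish "finished" from "never started" without nil checks.
type Service struct {
	isActive   atomic.Bool
	isShutdown atomic.Bool
	// other fields omitted
}

// IsShutdown reports whether the service has completed its work and stopped.
func (bfs *Service) IsShutdown() bool {
	return bfs.isShutdown.Load()
}

// Shutdown stops an active service and records the fact.
func (bfs *Service) Shutdown() {
	if bfs.isActive.CompareAndSwap(true, false) {
		bfs.isShutdown.Store(true)
		// stop workers, drain queues, run the shutdown callback...
	}
}
```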

Comment on lines 265 to 271
err = bfs.enqueueBlock(b)
if !bfs.cfg.BlocksOnly {
err = bfs.enqueueHeader(&b.Header)
Member:

We need code that will read the header from the stream without reading the transactions.
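
A minimal sketch of header-only decoding, assuming the NeoFS object payload uses the standard block wire format (where the header is a prefix of the block) and the current neo-go io/block packages:

```go
package blockfetcher

import (
	"io"

	"github.com/nspcc-dev/neo-go/pkg/core/block"
	gio "github.com/nspcc-dev/neo-go/pkg/io"
)

// headerFromStream decodes only the header fields from the object stream and
// leaves the transactions unread.
func headerFromStream(r io.Reader, stateRootInHeader bool) (*block.Header, error) {
	br := gio.NewBinReaderFromIO(r)
	h := &block.Header{StateRootEnabled: stateRootInHeader}
	h.DecodeBinary(br)
	if br.Err != nil {
		return nil, br.Err
	}
	return h, nil
}
```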

Comment on lines 230 to 231
s.bFetcherQueue = bqueue.New(chain, log, nil, s.NeoFSBlockFetcherCfg.BQueueSize, updateBlockQueueLenMetric, bqueue.Blocking)
s.bFetcherQueue = bqueue.New(s.stateSync, log, nil, s.NeoFSBlockFetcherCfg.BQueueSize, updateBlockQueueLenMetric, bqueue.Blocking)
Member:

We may need a switch here. The main idea is that blockfetcher should work both with the statesync module enabled and disabled. If the statesync module is disabled, then we need the usual blockfetcher operation flow that fetches blocks, puts them directly into the chain-backed queue and shuts down. If the statesync module is enabled, then blockfetcher should use the statesync-backed queue and headersfetcher should also be started.
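
A rough sketch of such a switch in the server constructor, reusing the bqueue.New call from the lines above; the exact enablement condition is illustrative:

```go
if chain.GetConfig().P2PStateExchangeExtensions && s.NeoFSBlockFetcherCfg.Enabled {
	// Statesync enabled: statesync-backed queue, headersfetcher is started too.
	s.bFetcherQueue = bqueue.New(s.stateSync, log, nil, s.NeoFSBlockFetcherCfg.BQueueSize,
		updateBlockQueueLenMetric, bqueue.Blocking)
} else {
	// Statesync disabled: usual flow, blocks go straight into the chain-backed queue.
	s.bFetcherQueue = bqueue.New(chain, log, nil, s.NeoFSBlockFetcherCfg.BQueueSize,
		updateBlockQueueLenMetric, bqueue.Blocking)
}
```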

codecov bot commented on Jan 21, 2025:

Codecov Report

Attention: Patch coverage is 31.04478% with 231 lines in your changes missing coverage. Please review.

Project coverage is 82.82%. Comparing base (64c4de4) to head (f3629c4).
Report is 18 commits behind head on master.

Files with missing lines Patch % Lines
pkg/network/bqueue/queue.go 26.78% 82 Missing ⚠️
pkg/services/blockfetcher/blockfetcher.go 18.60% 69 Missing and 1 partial ⚠️
pkg/network/server.go 12.16% 55 Missing and 10 partials ⚠️
pkg/core/statesync/module.go 76.27% 13 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3789      +/-   ##
==========================================
- Coverage   83.13%   82.82%   -0.32%     
==========================================
  Files         335      335              
  Lines       46855    47186     +331     
==========================================
+ Hits        38954    39082     +128     
- Misses       6311     6505     +194     
- Partials     1590     1599       +9     


@AliceInHunterland (Contributor Author):

sometimes:

2025-01-21T23:53:55.597+0300    INFO    persisted to disk       {"blocks": 0, "keys": 1543, "headerHeight": 625633, "blockHeight": 0, "took": "15.490041ms"}
^C2025-01-21T23:53:56.155+0300  INFO    shutting down server    {"peers": 18}
2025-01-21T23:53:56.155+0300    INFO    shutting down NeoFS BlockFetcher service        {"force": true}
panic: send on closed channel

goroutine 533 [running]:
github.com/nspcc-dev/neo-go/pkg/network/bqueue.(*Queue).PutHeader(0x140000a5d40, 0x140050b9d40)
        github.com/nspcc-dev/neo-go/pkg/network/bqueue/queue.go:301 +0x228
github.com/nspcc-dev/neo-go/pkg/services/blockfetcher.(*Service).blockDownloader(0x140018cdce0)
        github.com/nspcc-dev/neo-go/pkg/services/blockfetcher/blockfetcher.go:314 +0x408
created by github.com/nspcc-dev/neo-go/pkg/services/blockfetcher.(*Service).Start in goroutine 171
        github.com/nspcc-dev/neo-go/pkg/services/blockfetcher/blockfetcher.go:229 +0x62c
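
This looks like a late PutHeader from the fetcher racing with queue discard during shutdown. One way to avoid the panic, as a rough sketch with illustrative field names (the real bqueue internals differ; the key point is that Discard() must set the discarded flag under the same lock before closing the signalling channel):

```go
// PutHeader becomes a no-op once the queue has been discarded, so a late call
// from the fetcher during shutdown no longer hits a closed channel.
func (bq *Queue) PutHeader(h *block.Header) error {
	bq.queueLock.Lock()
	defer bq.queueLock.Unlock()
	if bq.discarded {
		return nil // shutting down, silently drop the header
	}
	bq.headersQueue = append(bq.headersQueue, h)
	// Non-blocking wake-up of the consumer; since Discard() marks the queue
	// discarded under this lock before closing checkHeaders, this send can
	// never reach a closed channel.
	select {
	case bq.checkHeaders <- struct{}{}:
	default:
	}
	return nil
}
```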

return nil, fmt.Errorf("failed to create NeoFS BlockFetcher: %w", err)
}
s.NeoFSBlockFetcherCfg.BlocksOnly = true
s.blockFetcher, err = blockfetcher.New(bq, s.NeoFSBlockFetcherCfg, log, s.bFetcherQueue.PutBlock, s.bFetcherQueue.PutHeader, sync.OnceFunc(func() { close(s.blockFetcherFin) }))
Member:

What I don't see in the current implementation is: when the statesync part is over, the node should start its usual sync process (see the sketch below):

  1. Start a third blockfetcher instance backed by a simple queue (not by the statesync module) and check if there are blocks from NeoFS that can be fetched.
  2. Once all blocks are fetched from NeoFS, move to P2P sync.
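
A rough sketch of that flow, with illustrative names (tailFetcher, bQueue and the constructor argument order follow the snippets above and may not match the final API):

```go
// Once statesync is done, spin up a third blockfetcher instance backed by the
// plain chain-backed queue and drain whatever blocks NeoFS still has.
if !s.stateSync.IsActive() && s.ServerConfig.NeoFSBlockFetcherCfg.Enabled {
	if s.tailFetcher == nil { // hypothetical third instance
		var err error
		s.tailFetcher, err = blockfetcher.New(s.chain, s.NeoFSBlockFetcherCfg, s.log,
			s.bQueue.PutBlock, nil, sync.OnceFunc(func() { close(s.blockFetcherFin) }))
		if err == nil {
			err = s.tailFetcher.Start()
		}
		if err != nil {
			s.log.Error("skipping NeoFS BlockFetcher", zap.Error(err))
		}
	}
	if s.tailFetcher != nil && s.tailFetcher.IsActive() {
		return nil // still catching up from NeoFS
	}
}
// Otherwise fall through to the usual P2P block requests.
```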

Successfully merging this pull request may close these issues.

Implement headers fetching via NeoFS BlockFetcher service