-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
services: fix logging data race after shutdown #3307
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3307 +/- ##
==========================================
+ Coverage 84.94% 84.98% +0.04%
==========================================
Files 328 328
Lines 44754 44789 +35
==========================================
+ Hits 38016 38066 +50
+ Misses 5225 5214 -11
+ Partials 1513 1509 -4 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something has happend with the commit message indent.
2dee68e
to
924f0d5
Compare
ffd1e0b
to
532a48d
Compare
532a48d
to
0dff702
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note about calls to zap log Sync
: technically, all our services share the same logger instance and there's no need to call Sync
manually from each service, it may be done only once from the node caller's code. However, Sync
calls don't hurt us, and there's still an ability to provide custom logger to every service constructor, and thus, we'd better sync logs for every service separately.
@AliceInHunterland, rework the second commit message and PR title/description: describe what commit actually does, i.e. proper logs flush after services shutdown. It is needed because this PR closes not only #2973, but also a set of other logs-related issues. Search through the set of tests related issues and pick ones that will be closed by this PR. Add references to these issues to the commit message and PR description. |
0dff702
to
8631889
Compare
2024-02-14T09:47:23.5366050Z testing.go:1465: race detected during execution of test |
a52c5c4
to
92174fb
Compare
72fc814
to
88140c5
Compare
04e6e21
to
f93304f
Compare
OK, one more problem found:
What we need is: in a separate commit move blockchain/mempool subscriptions of all dependent services from neo-go/pkg/services/stateroot/validators.go Lines 32 to 33 in 327e766
|
f93304f
to
5e3b606
Compare
5e3b606
to
bbd6454
Compare
Some problems with tests, do they connected with the latest network server Start refactoring?
|
OK, not merging it for now, I need to carefully review the updated code, something is wrong. |
also sometimes :
getting |
OK, here's the problem, and it's in the second commit: there's a race between consensus service blocks subscription and first block announcement. Successful test run assume that consensus service is subscribed firstly, and only after that the first block is being announced by Blockchain. In this case consensus service is able to properly update dBFT state and move to the next height:
Failed test run happens when block 1 is being announced by the blockchain prior to consensus service subscription, and hence, consensus service don't have any way to react on the accepted block (it's not subscribed to blocks yet):
|
Previously user should Start server in a separate goroutine. Now separate goroutine is created inside the Start(). For normal server operation, the caller should wait for Start to finish. Also, fixed TestTryInitStateSync test which was exiting earlier than logs are called. Close #3112 Signed-off-by: Ekaterina Pavlova <[email protected]>
Start of some services is bound to blockchain subscriptions, and thus, can't be run before the blockchain notifications dispatcher. Signed-off-by: Ekaterina Pavlova <[email protected]>
Use started atomic.Bool field to ensure that the node server shutdown procedure is executed only once. Prevent the following panic caused by server double-shutdown in testing code: ``` --- FAIL: TestServerRegisterPeer (0 .06s) panic: closed twice goroutine 60 [running]: testing.tRunner.func1.2({0x104c40b20, 0x104d0ec90}) /opt/homebrew/opt/go/libexec/src/testing/testing.go:1545 +0x1c8 testing.tRunner.func1() /opt/homebrew/opt/go/libexec/src/testing/testing.go:1548 +0x360 panic({0x104c40b20?, 0x104d0ec90?}) /opt/homebrew/opt/go/libexec/src/runtime/panic.go:914 +0x218 github.com/nspcc-dev/neo-go/pkg/network.(*fakeTransp).Close (0x14000159e08?) /Users/ekaterinapavlova/Workplace/neo-go/pkg/network /discovery_test.go:83 +0x54 github.com/nspcc-dev/neo-go/pkg/network.(*Server).Shutdown (0x14000343400) /Users/ekaterinapavlova/Workplace/neo-go/pkg/network/server.go:299 +0x104 github.com/nspcc-dev/neo-go/pkg/network.startWithCleanup.func1() /Users/ekaterinapavlova/Workplace/neo-go/pkg/network/server_test .go:408 +0x20 testing.(*common).Cleanup.func1() /opt/homebrew/opt/go/libexec/src/testing/testing.go:1169 +0x110 testing.(*common).runCleanup(0x1400032c340, 0x14000159d80?) /opt/homebrew/opt/go/libexec/src/testing/testing.go:1347 +0xd8 testing.tRunner.func2() /opt/homebrew/opt/go/libexec/src/testing/testing.go:1589 +0x2c testing.tRunner(0x1400032c340, 0x104d0c5d0) /opt/homebrew/opt/go/libexec/src/testing/testing.go:1601 +0x114 created by testing.(*T).Run in goroutine 1 /opt/homebrew/opt/go/libexec/src/testing/testing.go:1648 +0x33c ``` Signed-off-by: Ekaterina Pavlova <[email protected]>
Signed-off-by: Ekaterina Pavlova <[email protected]>
Added sync logs for every service separately to provide the ability to have a custom logger for each service. This commit makes the code follow the zap usages rules: `Sync calls the underlying Core's Sync method, flushing any buffered log entries. Applications should take care to call Sync before exiting.` Signed-off-by: Ekaterina Pavlova <[email protected]>
s.Shutdown() does not wait for all goroutines of the node server to finish normally just because the server exits without dependent goroutines awaiting. Which causes logs to attempt to write after the test has ended. The consequence of this bug fix is that corresponding tests are fixed. Close #2973 Close #2974 Signed-off-by: Ekaterina Pavlova <[email protected]>
bbd6454
to
5cf0c75
Compare
got similar to #2977 (comment) on windows Run tests (windows-2022, 1.21) -probably not connected with current changes
|
It's not related to this PR, create a separate issue or add these logs to an existing one if it's the same problem. |
The problem is writing to logs after the end of the test execution.
Added sync logs for every service separately to provide the ability to have a custom logger for each service.
Data race is caused by missing synchronization channels in the network server. Shutdown doesn't wait for all goroutines to finish properly.
Close #2973
Close #2974
Close #3112