Fix cached level DB in consensus #845

Open
sergiy-skalelabs opened this issue Aug 30, 2022 · 1 comment
Labels: bug (Something isn't working)

@sergiy-skalelabs
Contributor

The CacheLevelDB class needs review and improvement because helgrind reports serious multi-threading problems in it, such as "unlocked an invalid lock", "associated lock is not held by any thread" and "unlocked a not-locked lock". The class contains CacheLevelDB::checkForDeadLockRead(), which violates the lock-based data protection strategy:

  • it unlocks a synchronization object from a thread other than the one that locked it
  • it unlocks a synchronization object that is already unlocked
  • it unlocks a dead (invalid) synchronization object
  • it bypasses the scoped locking model that std::lock_guard and std::unique_lock are designed around
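
For comparison, here is a minimal sketch of the scoped locking pattern that the points above refer to. `GuardedStore` and its in-memory map are hypothetical stand-ins and not the real CacheLevelDB code; only the lock types (std::shared_timed_mutex, std::lock_guard) match those named in the helgrind output in this report. The idea is that every lock is released by the destructor of a guard object, so it is released exactly once and on the thread that acquired it.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for CacheLevelDB: an in-memory map protected by a
// std::shared_timed_mutex. Illustrative only, not the real class.
class GuardedStore {
public:
    // Read path: std::shared_lock takes shared ownership and releases it in
    // its destructor, on this thread, on every exit path.
    bool readString(const std::string& key, std::string& out) const {
        std::shared_lock<std::shared_timed_mutex> lock(m_);
        auto it = data_.find(key);
        if (it == data_.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

    // Write path: std::lock_guard takes exclusive ownership for the scope.
    void writeString(const std::string& key, const std::string& value) {
        std::lock_guard<std::shared_timed_mutex> lock(m_);
        data_[key] = value;
    }

private:
    mutable std::shared_timed_mutex m_;  // never locked or unlocked by hand
    std::map<std::string, std::string> data_;
};
```

With this shape there is no code path that calls unlock() or unlock_shared() directly, which is the kind of call helgrind flags in the traces.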

Example of call stack 1:
==23347== Thread #55 unlocked an invalid lock at 0x19D713F8
==23347==    at 0x4D9CCD6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==23347==    by 0x2211FF1: std::__glibcxx_rwlock_unlock(pthread_rwlock_t*) (shared_mutex:75)
==23347==    by 0x2212049: std::__shared_mutex_pthread::unlock() (shared_mutex:206)
==23347==    by 0x22120A5: std::__shared_mutex_pthread::unlock_shared() (shared_mutex:246)
==23347==    by 0x25D1E9F: std::shared_timed_mutex::unlock_shared() (shared_mutex:474)
==23347==    by 0x25C7C76: CacheLevelDB::checkForDeadLockRead(char const*) (CacheLevelDB.cpp:183)
==23347==    by 0x25C99C7: CacheLevelDB::readPrefixRange(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (CacheLevelDB.cpp:296)
==23347==    by 0x25EC561: MsgDB::getMessages(block_id) (MsgDB.cpp:87)
==23347==    by 0x24BF075: Schain::ifIncompleteConsensusDetectedRestartAndRebroadcastAllMessagesForCurrentBlock() (Schain.cpp:970)
==23347==    by 0x24BE125: Schain::bootstrap(block_id, unsigned long, unsigned long) (Schain.cpp:953)
==23347==    by 0x23A6C1D: ConsensusEngine::bootStrapAll() (ConsensusEngine.cpp:559)
==23347==    by 0x1FC2D8E: SkaleHost::startWorking()::{lambda()#1}::operator()() const (SkaleHost.cpp:755)

Example of call stack 2:

==23347== Thread #73 unlocked a not-locked lock at 0x1A285E08
==23347==    at 0x4D9CCD6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==23347==    by 0x2211FF1: std::__glibcxx_rwlock_unlock(pthread_rwlock_t*) (shared_mutex:75)
==23347==    by 0x2212049: std::__shared_mutex_pthread::unlock() (shared_mutex:206)
==23347==    by 0x22120A5: std::__shared_mutex_pthread::unlock_shared() (shared_mutex:246)
==23347==    by 0x25D1E9F: std::shared_timed_mutex::unlock_shared() (shared_mutex:474)
==23347==    by 0x25C7C76: CacheLevelDB::checkForDeadLockRead(char const*) (CacheLevelDB.cpp:183)
==23347==    by 0x25C6C1A: CacheLevelDB::readString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (CacheLevelDB.cpp:120)
==23347==    by 0x25EE8D8: PriceDB::readPrice(block_id) (PriceDB.cpp:58)
==23347==    by 0x267DE79: PricingAgent::readPrice(block_id) (PricingAgent.cpp:105)
==23347==    by 0x24F08DF: Schain::getPriceForBlockId(unsigned long) (SchainGettersSetters.cpp:236)
==23347==    by 0x23AD5BF: ConsensusEngine::getPriceForBlockId(unsigned long) const (ConsensusEngine.cpp:906)
==23347==    by 0x1FC4CC2: SkaleHost::getGasPrice() const (SkaleHost.cpp:906)
==23347==  Lock at 0x1A285E08 was first observed
==23347==    at 0x4D9C5DF: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==23347==    by 0x2211FC2: std::__glibcxx_rwlock_wrlock(pthread_rwlock_t*) (shared_mutex:73)
==23347==    by 0x2212017: std::__shared_mutex_pthread::lock() (shared_mutex:186)
==23347==    by 0x25D1E3F: std::shared_timed_mutex::lock() (shared_mutex:459)
==23347==    by 0x25D2587: std::lock_guard<std::shared_timed_mutex>::lock_guard(std::shared_timed_mutex&) (std_mutex.h:159)
==23347==    by 0x25C7E46: CacheLevelDB::writeString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (CacheLevelDB.cpp:202)
==23347==    by 0x25EF20D: PriceDB::savePrice(boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0> const&, block_id) (PriceDB.cpp:83)
==23347==    by 0x267D9F3: PricingAgent::savePrice(boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>, block_id) (PricingAgent.cpp:98)
==23347==    by 0x267D3F4: PricingAgent::calculatePrice(std::vector<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > > > const&, unsigned long, unsigned int, block_id) (PricingAgent.cpp:84)
==23347==    by 0x24BE092: Schain::bootstrap(block_id, unsigned long, unsigned long) (Schain.cpp:943)
==23347==    by 0x23A6C1D: ConsensusEngine::bootStrapAll() (ConsensusEngine.cpp:559)
==23347==    by 0x1FC2D8E: SkaleHost::startWorking()::{lambda()#1}::operator()() const (SkaleHost.cpp:755)
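
If the purpose of checkForDeadLockRead() is to bound how long a reader waits for the lock (this is an assumption; the function body is not shown in this report), the same check might be expressed without any manual unlock_shared() call. The sketch below uses the timed constructor of std::shared_lock, which calls try_lock_shared_for() internally. The helper name `lockReadOrThrow`, the error message and the choice to throw on timeout are all hypothetical.

```cpp
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <stdexcept>
#include <string>

// Hypothetical helper: try to take a shared (read) lock for at most `timeout`.
// On success the returned guard owns the lock and releases it in its
// destructor. On timeout nothing is owned, so nothing is unlocked.
inline std::shared_lock<std::shared_timed_mutex> lockReadOrThrow(
    std::shared_timed_mutex& m, const char* functionName,
    std::chrono::milliseconds timeout) {
    // This constructor calls m.try_lock_shared_for(timeout).
    std::shared_lock<std::shared_timed_mutex> lock(m, timeout);
    if (!lock.owns_lock()) {
        // Report the suspected deadlock instead of force-releasing a lock
        // that some other thread may still hold.
        throw std::runtime_error(
            std::string("possible deadlock while read-locking in ") + functionName);
    }
    return lock;  // ownership moves to the caller
}
```

A read method would then hold the returned guard for the duration of the read, for example `auto lock = lockReadOrThrow(m, __FUNCTION__, std::chrono::milliseconds(100));`, where the 100 ms timeout is an arbitrary illustrative value.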

The test was performed by running skaled under helgrind:
valgrind --tool=helgrind ./skaled --colors --config ./config0.json -v 4 --log-value-size-limit 1024000 --performance-timeline-enable --performance-timeline-max-items=16000000 --ssl-key ./key.pem --ssl-cert ./cert.pem
The problem described in this issue is the one reported most often, and most repeatedly, while running under helgrind.

@kladkogex added this to the 2.1 milestone Aug 30, 2022
@kladkogex
Contributor

Thank you! Will be fixed soon

@PolinaKiporenko removed this from the 2.1 milestone Sep 23, 2022
@DmytroNazarenko added the bug (Something isn't working) label Apr 10, 2023
@DmytroNazarenko transferred this issue from skalenetwork/IMA Oct 9, 2024