We (MongoDB) have seen behavior in recent versions of Jepsen that causes test failures in about 60% of runs due to odd clock skew. Specifically, the clock is incorrect between runs, which causes subsequent `apt-update` commands in the containers to fail with certificate validation errors. Our Jepsen tests run with clock skew disabled, so we aren't sure why the clock is being changed, but it looks like the recent change to make test node containers `privileged` and give them `ALL` capabilities is what makes it possible.
Interestingly, this only seems to be a problem on test hosts that have an NTP client running. When we run the tests on our virtual workstations, which do not have an NTP client running, the tests succeed. It seems that the clock skew in the test node containers is somehow racing with the NTP client, which causes the observed failures. However, as stated above, we have not yet been able to determine why the clock skew occurs in the first place.
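For anyone trying to reproduce this, here is a minimal sketch of how one might check from inside a test node container whether the process is actually able to set the system clock. It reads the effective capability mask from `/proc/self/status` and tests the `CAP_SYS_TIME` bit (bit 25 in `linux/capability.h`). The function names are our own for illustration and are not part of Jepsen; this only shows that the container *can* change the clock, not what is changing it.

```python
# Sketch: check whether the current process holds CAP_SYS_TIME,
# the Linux capability required to set the system clock.
# Helper names are hypothetical; nothing here is Jepsen code.

CAP_SYS_TIME = 25  # bit index defined in linux/capability.h


def parse_cap_eff(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, e.g. "00000000a80425fb".
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_cap_sys_time(cap_eff):
    """True if the CAP_SYS_TIME bit is set in the given capability mask."""
    return bool((cap_eff >> CAP_SYS_TIME) & 1)


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            mask = parse_cap_eff(f.read())
    except OSError as err:
        # /proc is Linux-specific; report instead of crashing elsewhere.
        print(f"could not read /proc/self/status: {err}")
    else:
        print(f"CapEff={mask:#x} CAP_SYS_TIME={has_cap_sys_time(mask)}")
```

Running this inside a test node container and on the host should show whether the containers gained the ability to adjust the clock after the `privileged` change.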