[system test] make Recovery tests run on KRaft mode only #10637
Why would you want to run them in ZooKeeper mode only?
Well, some of the test cases do not work with KRaft here (e.g.,
So, should it be fixed? Extended? Deleted? ZooKeeper will be gone soon, and any ZooKeeper-only tests will be deleted with it.
I mean, if those tests could also run on KRaft, maybe we should update them. I think @henryZrncik and @fvaleri were trying to fix it, but I am not sure whether that's possible (it was something related to UTO).
Right, but we need to understand what exactly the problem is. UTO is now used everywhere, so I am not sure why that would make a difference between ZooKeeper and KRaft.
Thanks for the PR!
IMHO, the changes in RecoveryST would need a few more steps to run correctly.
Regarding NamespaceDeletionRecovery: since these tests represent steps described in the documentation, it would be nice to keep them (referring to the removal of ZooKeeper-related tests). It would probably also be good to find out what causes the problem in KRaft after you resolved the problem with the incorrect ZooKeeper configuration.
@@ -71,7 +66,6 @@ void testRecoveryFromKafkaStrimziPodSetDeletion() {
}

@IsolatedTest("We need for each test case its own Cluster Operator")
@KRaftNotSupported("Zookeeper is not supported by KRaft mode and is used in this test class")
Regarding allowing the tests in KRaft:
- Tests with services do not make much sense in KRaft, as these services are no longer present.
- The pod set test would make sense, but it would still fail because it currently targets ZooKeeper specifically. It needs to be adjusted to target the controller by selector instead. For example, the StrimziPodSet for the controller is named my-cluster-7-c-9aa820e3, whereas the test expects something like my-cluster-7-zookeeper.
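The difference between matching a pod set by a fixed name and matching it by selector can be sketched as follows. This is only an illustration under assumptions: the PodSet class, the findBySelector helper, the label keys (example/cluster, example/role), and the broker pod set name are all hypothetical and are not taken from the Strimzi code base or from this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PodSetSelectorSketch {
    /** Hypothetical stand-in for a StrimziPodSet: only a name and its labels. */
    public static final class PodSet {
        public final String name;
        public final Map<String, String> labels;

        public PodSet(String name, Map<String, String> labels) {
            this.name = name;
            this.labels = labels;
        }
    }

    /** Returns the first pod set whose labels contain every key/value pair of the selector. */
    public static Optional<PodSet> findBySelector(List<PodSet> podSets, Map<String, String> selector) {
        return podSets.stream()
            .filter(ps -> ps.labels.entrySet().containsAll(selector.entrySet()))
            .findFirst();
    }

    public static void main(String[] args) {
        // Label keys and the broker pod set name are placeholders for illustration only.
        List<PodSet> podSets = List.of(
            new PodSet("my-cluster-7-b-1f3c22d0",
                Map.of("example/cluster", "my-cluster-7", "example/role", "broker")),
            new PodSet("my-cluster-7-c-9aa820e3",
                Map.of("example/cluster", "my-cluster-7", "example/role", "controller"))
        );

        // A lookup by the fixed name "my-cluster-7-zookeeper" would find nothing here,
        // while a label-based lookup does not depend on the generated name suffix.
        Map<String, String> controllerSelector =
            Map.of("example/cluster", "my-cluster-7", "example/role", "controller");

        String found = findBySelector(podSets, controllerSelector)
            .map(ps -> ps.name)
            .orElse("not found");
        System.out.println(found); // prints "my-cluster-7-c-9aa820e3"
    }
}
```

In the real test, the selector would presumably be built from whatever labels the operator puts on the controller pod set, which this sketch does not try to reproduce.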
resourceManager.createResourceWithWait(
    NodePoolsConverter.convertNodePoolsIfNeeded(
        KafkaNodePoolTemplates.brokerPoolPersistentStorage(testStorage.getNamespaceName(), testStorage.getBrokerPoolName(), testStorage.getClusterName(), 3)
            .editSpec()
                .withNewPersistentClaimStorage()
                    .withSize("1Gi")
                    .withStorageClass(storageClassName)
                .endPersistentClaimStorage()
            .endSpec()
            .build(),
        KafkaNodePoolTemplates.controllerPoolPersistentStorage(testStorage.getNamespaceName(), testStorage.getControllerPoolName(), testStorage.getClusterName(), 3)
            .editSpec()
                .withNewPersistentClaimStorage()
                    .withSize("1Gi")
                    .withStorageClass(storageClassName)
                .endPersistentClaimStorage()
            .endSpec()
            .build()
    )
);
resourceManager.createResourceWithWait(KafkaTemplates.kafkaPersistent(testStorage.getNamespaceName(), testStorage.getClusterName(), 3, 3)
The test testTopicNotAvailable actually fails with KRaft at the step where the Kafka cluster is recreated after the namespace is deleted. All recreated pods end up in a crash loop because they try to connect to the wrong cluster.
For example:
Exception in thread "main" java.lang.RuntimeException: Invalid cluster.id in: /var/lib/kafka/data/kafka-log1/meta.properties. Expected 28c5tB6tSlCjEHep6l3Jww, but read TRrzEInZRgCs_R9iNz2Gkw at org.apache.kafka.metadata.properties.MetaPropertiesEnsemble.verify(MetaPropertiesEnsemble.java:509)
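To make the failure mode concrete, here is a minimal sketch of the kind of check that produces such an error: the cluster.id stored in meta.properties on the reused volume is compared with the cluster ID the node was started with. This is not Kafka's actual MetaPropertiesEnsemble code; the class and the verifyClusterId helper are hypothetical, and only the cluster.id key, the path, and the two IDs come from the error message above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ClusterIdCheckSketch {
    /**
     * Hypothetical helper: throws if the cluster.id found in the given
     * meta.properties content differs from the expected cluster ID.
     */
    public static void verifyClusterId(String metaPropertiesContent, String expectedClusterId, String path) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(metaPropertiesContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String stored = props.getProperty("cluster.id");
        if (!expectedClusterId.equals(stored)) {
            throw new RuntimeException("Invalid cluster.id in: " + path
                + ". Expected " + expectedClusterId + ", but read " + stored);
        }
    }

    public static void main(String[] args) {
        // The persistent volume still holds the ID of the original cluster.
        String onDisk = "cluster.id=TRrzEInZRgCs_R9iNz2Gkw\n";
        String path = "/var/lib/kafka/data/kafka-log1/meta.properties";
        try {
            // The recreated cluster was assigned a new ID, so the check fails.
            verifyClusterId(onDisk, "28c5tB6tSlCjEHep6l3Jww", path);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch suggests why reusing the old volumes only works if the recreated cluster ends up with the same cluster ID as the one recorded on disk.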
@see-quick I was finally able to have a look at this.
Regarding the step "deploy the Kafka cluster without Topic Operator - otherwise topics will be deleted":
This is not needed anymore with the new Topic Operator, so it can be removed from the code.
To make this test work on my local Minikube, I had to change the StorageClass.VolumeBindingMode from WaitForFirstConsumer to Immediate. Is there any Minikube configuration you are doing here? Is this documented somewhere?
I was able to reproduce the issue mentioned by @henryZrncik:
Exception in thread "main" java.lang.RuntimeException: Invalid cluster.id in: /var/lib/kafka/data/kafka-log0/meta.properties. Expected kQHv733NQIew9aw9uCXnDA, but read 2DLef4_8TqOVdBjzCORB1Q
I also think this is a bug, so I raised #10722. The attached fix should work with this test. Hope it helps.
This is not a bug, it means that this was not set when recovering the Kafka CR.
@im-konge we now have an updated procedure that may help with this test: https://strimzi.io/docs/operators/in-development/deploying#proc-cluster-recovery-volume-str
I will update the test when I find some time, thanks 💯
Thanks for investigating this problem. So let's do the following:
And in the case of So in summary I think making those tests |
Signed-off-by: see-quick <[email protected]>
I have updated the tests to match the recovery procedure. The tests now pass with this change.
@strimzi-ci run tests --cluster-type=ocp --cluster-version=4.17 --install-type=bundle --profile=recovery
@strimzi-ci run tests --cluster-type=ocp --install-type=bundle --profile=recovery
✔️ Test Summary ✔️
TEST_PROFILE: recovery
Thanks for looking into this. Just one change is needed, I guess?
Description
This PR changes our recovery tests so they can run with KRaft mode.