OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic #1133

Open
wants to merge 3 commits into
base: master
from

Conversation

gcs278
Contributor

@gcs278 gcs278 commented Aug 22, 2024

The IBM Public Cloud DNS provider (cis_provider.go) had a bug in createOrUpdateDNSRecord where it checked for the existence of a DNS record by filtering both DNS name and target. If the target was updated (e.g., due to a load balancer recreation), the logic would not match the existing DNS record. As a result, the function would attempt to create a new record, but fail because a record with that name already existed, as multiple DNS records with the same name are not allowed.

The fix is to remove the filtering by target and rely solely on filtering by name, as the name is the only attribute that needs to be unique.
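
To illustrate the difference between the two lookups, here is a minimal, self-contained sketch. It is not the code from `cis_provider.go`: the `record` type, both helper functions, and the example host names are hypothetical stand-ins, and no IBM SDK calls are used.

```go
package main

import "fmt"

// record is a hypothetical, simplified stand-in for a provider DNS record.
type record struct {
	ID, Name, Target string
}

// findByNameAndTarget models the old lookup: it only matches when the
// stored target equals the desired one.
func findByNameAndTarget(existing []record, name, target string) (record, bool) {
	for _, r := range existing {
		if r.Name == name && r.Target == target {
			return r, true
		}
	}
	return record{}, false
}

// findByName models the fixed lookup: the name alone identifies the record.
func findByName(existing []record, name string) (record, bool) {
	for _, r := range existing {
		if r.Name == name {
			return r, true
		}
	}
	return record{}, false
}

func main() {
	existing := []record{{ID: "rec-1", Name: "*.apps.example.com", Target: "old-lb.example.com"}}

	// The load balancer was recreated, so the desired target changed.
	_, found := findByNameAndTarget(existing, "*.apps.example.com", "new-lb.example.com")
	fmt.Println("name+target lookup found record:", found) // false: would wrongly attempt a create

	r, found := findByName(existing, "*.apps.example.com")
	fmt.Println("name-only lookup found record:", found, r.ID) // true rec-1: update in place
}
```

With the name-only lookup, a changed target leads to an update of the existing record rather than a create that the provider would reject.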

Additionally, the IBM DNS logic doesn't work for multiple targets and this creates unexpected and problematic results. The logic has been refactored to only create and delete using the first target. It warns the user when multiple targets are set.
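
The first-target behavior could look roughly like the sketch below. `selectTarget` is a hypothetical helper written for illustration, not a function from the provider, and treating an empty target list as an error is an assumption rather than something stated in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// selectTarget returns the single target to publish. It is a hypothetical
// helper, not the provider's actual function.
func selectTarget(dnsName string, targets []string) (string, error) {
	if len(targets) == 0 {
		// Assumption: an empty target list is treated as an error.
		return "", errors.New("no targets specified for " + dnsName)
	}
	if len(targets) > 1 {
		// Warn the user that the extra targets are ignored.
		log.Printf("warning: %s has %d targets; only the first (%s) is used",
			dnsName, len(targets), targets[0])
	}
	return targets[0], nil
}

func main() {
	target, err := selectTarget("*.apps.example.com",
		[]string{"lb-a.example.com", "lb-b.example.com"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(target) // lb-a.example.com
}
```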

This PR also includes some unit test fix up and missing unit test coverage for the IBM CIS Provider.

This resolves the same DNS issues for public PowerVS cloud as it uses the same logic.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Aug 22, 2024
@openshift-ci-robot
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is invalid:

  • expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 22, 2024
@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch from b387d07 to b07d602 Compare August 22, 2024 22:50
@gcs278
Contributor Author

gcs278 commented Aug 22, 2024

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Aug 22, 2024
@openshift-ci-robot
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 22, 2024
@openshift-ci openshift-ci bot requested a review from lihongan August 22, 2024 22:57
@openshift-ci-robot
Contributor

@gcs278: This pull request references Jira Issue OCPBUGS-32776, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

@gcs278
Contributor Author

gcs278 commented Aug 26, 2024

I missed one detail of the bug: the Progressing status condition is missing instructions for PowerVS when you change the scope. Looking into it.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 26, 2024
@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch from a057f1a to 0716651 Compare August 26, 2024 19:28
@gcs278
Contributor Author

gcs278 commented Aug 26, 2024

@lihongan Added the missing scope change instructions for the PowerVS type.

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 26, 2024
@gcs278
Contributor Author

gcs278 commented Aug 27, 2024

infra failures
/retest

@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch from 0716651 to f51b1d3 Compare August 27, 2024 14:31
@gcs278
Contributor Author

gcs278 commented Aug 28, 2024

Infra issues
/retest

@lihongan
Contributor

pre-merge tested on OpenStack and looks good now

$ oc -n openshift-ingress-operator patch ingresscontroller/intlb --type=merge --patch='{"spec":{"endpointPublishingStrategy": {"type":"LoadBalancerService", "loadBalancer": {"scope":"External"}}}}'
ingresscontroller.operator.openshift.io/intlb patched

$ oc get co/ingress
NAME      VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.17.0-0.ci.test-2024-08-28-040831-ci-ln-13rxbgt-latest   True        True          False      23m     ingresscontroller "intlb" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: LoadBalancerProgressing=True (OperandsProgressing: One or more managed resources are progressing: The IngressController scope was changed from "Internal" to "External".  To effectuate this change, you must delete the service: `oc -n openshift-ingress delete svc/router-intlb`; the service load-balancer will then be deprovisioned and a new one created.  This will most likely cause the new load-balancer to have a different host name and IP address from the old one's.  Alternatively, you can revert the change to the IngressController: `oc -n openshift-ingress-operator patch ingresscontrollers/intlb --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"Internal"}}}}'`).

$ oc -n openshift-ingress delete svc/router-intlb
service "router-intlb" deleted

$ oc -n openshift-ingress get svc/router-intlb
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
router-intlb   LoadBalancer   172.30.110.89   38.102.83.53   80:32114/TCP,443:30834/TCP   4m23s

will test on IBMCloud later

@lihongan
Contributor

pre-merge tested on IBMCloud and also looks good

$ oc -n openshift-ingress get svc/router-intlb
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP                         PORT(S)                      AGE
router-intlb   LoadBalancer   172.30.111.21   53084b0d-eu-de.lb.appdomain.cloud   80:31026/TCP,443:32398/TCP   78s

$ oc -n openshift-ingress-operator patch ingresscontroller/intlb --type=merge --patch='{"spec":{"endpointPublishingStrategy": {"type":"LoadBalancerService", "loadBalancer": {"scope":"External"}}}}'
ingresscontroller.operator.openshift.io/intlb patched

$ oc -n openshift-ingress delete svc/router-intlb
service "router-intlb" deleted

$ oc -n openshift-ingress get svc/router-intlb
NAME           TYPE           CLUSTER-IP      EXTERNAL-IP                         PORT(S)                      AGE
router-intlb   LoadBalancer   172.30.242.68   72736508-eu-de.lb.appdomain.cloud   80:32041/TCP,443:30579/TCP   65s

$ oc -n openshift-ingress-operator get ingresscontroller/intlb -oyaml
<......>
  - lastTransitionTime: "2024-08-28T09:24:02Z"
    message: The record is provisioned in all reported zones.
    reason: NoFailedZones
    status: "True"
    type: DNSReady

@Miciah
Contributor

Miciah commented Aug 28, 2024

/assign

@Miciah
Contributor

Miciah commented Aug 28, 2024

@SzucsAti, this is a follow-up to #796. Are you available to review the changes to the IBM DNS provider?

@gcs278 gcs278 changed the title OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic and Add Missing Instructions to the Progressing Condition Aug 28, 2024
@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch from f51b1d3 to 7a97b9a Compare August 28, 2024 15:29
Contributor

@SzucsAti SzucsAti left a comment

Looks good to me, thank you for the fix!

@gcs278 gcs278 changed the title OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic [WIP] OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic Sep 12, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 12, 2024
Previously, some `RecordedCall` results were using the stored results
from the previous test case because the DNS Names were identical. This
led to incorrectly encoded `RecordedCall` test case parameters, causing
some test cases to fail when run individually. To fix this, the call
history is cleared before every test case runs.
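
The stale-history problem described above can be sketched as follows. This is an illustration only: `fakeClient`, `recordCall`, and `clearCallHistory` are hypothetical names, and the real `RecordedCall` bookkeeping in the fake clients may be structured differently.

```go
package main

import "fmt"

// fakeClient is a hypothetical test double that remembers the last action
// performed for each DNS name.
type fakeClient struct {
	calls map[string]string
}

// recordCall stores the action taken for a DNS name.
func (c *fakeClient) recordCall(dnsName, action string) {
	if c.calls == nil {
		c.calls = map[string]string{}
	}
	c.calls[dnsName] = action
}

// clearCallHistory drops everything recorded by earlier test cases.
func (c *fakeClient) clearCallHistory() {
	c.calls = map[string]string{}
}

func main() {
	client := &fakeClient{}
	cases := []struct {
		name    string
		dnsName string
		action  string // empty means the case makes no call
	}{
		{"updates record", "test.example.com", "UPDATE"},
		{"makes no call", "test.example.com", ""},
	}
	for _, tc := range cases {
		// Without this reset, the second case would see the first case's
		// "UPDATE" because both cases share the same DNS name.
		client.clearCallHistory()
		if tc.action != "" {
			client.recordCall(tc.dnsName, tc.action)
		}
		got, ok := client.calls[tc.dnsName]
		fmt.Printf("%s: recorded=%v action=%q\n", tc.name, ok, got)
	}
}
```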
@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch 13 times, most recently from 6ee5229 to 437655a Compare September 13, 2024 22:17
@gcs278
Contributor Author

gcs278 commented Sep 16, 2024

/retest

@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch 2 times, most recently from 4758a5b to ec2fcdf Compare September 16, 2024 20:46
The unit tests for both the IBM public and private cloud providers were
missing coverage. This update extends test coverage for the Delete and
CreateOrUpdateRecord functions. This commit provides an important point
of reference for future commits that may perturb the existing
functionality.

Both Test_createOrUpdateDNSRecord functions previously only tested the
update logic. In order to test the create logic, `CreateDNSRecord` and
`CreateResourceRecord` needed to be implemented in the public and
private `fake_client.go` respectively.

The new test cases required the ability to control the response and results
of `ListAllDnsRecords`, which were previously hardcoded in both public
and private IBM cloud unit tests. Both public and private unit tests
were updated to use the new OutputResults field specified in
the `ListAllDnsRecordsInputOutput` struct, allowing the new test
cases to specify no result (indicating no existing DNS record) so we
can trigger the create logic.
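
As a rough sketch of that idea, a fake whose list output is set per test case lets one case exercise the create path and another the update path. The types and functions below are hypothetical and simplified; `OutputResults` only mirrors the concept of the field named above, not its real definition in `ListAllDnsRecordsInputOutput`.

```go
package main

import "fmt"

// record and listOutput are hypothetical, simplified types. OutputResults
// mirrors the idea of the field described above, not its real definition.
type record struct {
	ID, Name, Target string
}

type listOutput struct {
	OutputResults []record
}

// fakeDNS returns whatever the test case configured and counts writes.
type fakeDNS struct {
	list    listOutput
	creates int
	updates int
}

// listRecords returns the configured records that match the given name.
func (f *fakeDNS) listRecords(name string) []record {
	var out []record
	for _, r := range f.list.OutputResults {
		if r.Name == name {
			out = append(out, r)
		}
	}
	return out
}

// createOrUpdate creates when the listing is empty and updates otherwise.
func createOrUpdate(f *fakeDNS, name, target string) string {
	if len(f.listRecords(name)) == 0 {
		f.creates++
		return "create"
	}
	f.updates++
	return "update"
}

func main() {
	// No configured results: the create path is taken.
	empty := &fakeDNS{}
	fmt.Println(createOrUpdate(empty, "test.example.com", "lb.example.com")) // create

	// One configured result with the same name: the update path is taken.
	seeded := &fakeDNS{list: listOutput{OutputResults: []record{
		{ID: "1", Name: "test.example.com", Target: "old.example.com"},
	}}}
	fmt.Println(createOrUpdate(seeded, "test.example.com", "lb.example.com")) // update
}
```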

Lastly, various test cases were added to cover untested scenarios, such
as testing CNAME Records, mismatching targets, missing record types or
IDs, handling nil results, etc.
@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch from ec2fcdf to 32caad5 Compare September 16, 2024 20:50
The IBM Public Cloud DNS provider (`cis_provider.go`) had a bug in
`createOrUpdateDNSRecord` where it checked for the existence of a
DNS record by filtering both DNS name and target. If the target
was updated (e.g., due to a load balancer recreation), the logic
would not match the existing DNS record. As a result, the function
would attempt to create a new record, but fail because a record with
that name already existed, as multiple DNS records with the same
name are not allowed in IBM Cloud DNS providers.

The fix is to remove the filtering by target and rely solely on
filtering by name, as the name is the only attribute that needs
to be unique.

Additionally, the IBM DNS logic doesn't work for multiple targets and
this creates unexpected and problematic results. The logic has been
refactored to only create using the first target and it warns the user
when multiple targets are set. This change is low risk since the Ingress
Operator will never create a DNSRecord with multiple targets in
`desiredDNSRecord`.
@gcs278 gcs278 force-pushed the ibm-dns-createOrUpdateDNSRecord-bug branch from 32caad5 to 2223383 Compare September 16, 2024 20:55
@gcs278
Contributor Author

gcs278 commented Sep 16, 2024

infra failures
/retest

@gcs278 gcs278 changed the title [WIP] OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic OCPBUGS-32776: Fix IBM Public Cloud DNS Provider Update Logic Sep 16, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 16, 2024
@gcs278
Contributor Author

gcs278 commented Sep 16, 2024

Removed WIP, sorry for the delay, I am open to reviews again @candita @Miciah @SzucsAti

Updates since last review:

  • I added more unit test coverage.
    • I felt it necessary to fix existing bugs in the unit tests and add more coverage as it provides confidence that I didn't perturb the DNS provider logic in an unintentional way.
    • The unit tests being in separate commits is intentional and provides a valuable diff before & after the actual bug fix here. You can review the last commit and see there were a few required unit test updates which are results of the bug fixes I added.
  • I removed create/update support for multiple targets in DNSRecords for IBM Cloud.
    • First, the code was misleading because IBM DNS doesn't support it, and therefore it doesn't work (it hot loops and overwrites records in IBM private cloud).
    • Second, since the bug fix actually impacts the way that multi-targeted DNSRecords would get created in IBM Public Cloud, it seemed logical to fix this now. Otherwise, we'd create a future headache for maintaining compatibility (if we were ever concerned about compatibility here).

@gcs278
Contributor Author

gcs278 commented Sep 16, 2024

Keeping the hold while I wait to see the results of adding an IBM Cloud job in openshift/release#56785.

/hold

@gcs278
Contributor Author

gcs278 commented Sep 17, 2024

/retest

Contributor

openshift-ci bot commented Sep 17, 2024

@gcs278: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/severity-important: Referenced Jira bug's severity is important for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • qe-approved: Signifies that QE has signed off on this PR.
6 participants