CORS-3687: Enhancement proposal for setting EIPs for Ingress Controller via installer #1688
@miheer: This pull request references CORS-3687 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c79695a
to
c4312f0
Compare
c4312f0
to
454b5a8
Compare
@mtulio @patrickdillon @r4f4 PTAL |
platform:
  aws:
    region: <AWS region>
    lbType: NLB
Does the installer have to validate that the lbType is NLB when eipAllocations are specified? What happens when `lbType: Classic`?
Yes we can check this. I will add a CEL.
+1 to this question, I don't see this in the EP yet.
Also, the installer doesn't use CEL, but I think it's fine to provide it in this proposal as validation guidelines.
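As a validation guideline for the question above, the check could look roughly like the following Go sketch. The `ingressPlatform` type and `validateLBType` function are hypothetical names used only for illustration; they are not the installer's actual types.

```go
package main

import "fmt"

// ingressPlatform is a hypothetical, trimmed-down stand-in for the AWS
// platform section of install-config.yaml. It is not the installer's type.
type ingressPlatform struct {
	LBType         string   // "NLB" or "Classic"
	EIPAllocations []string // EIP allocation IDs supplied by the user
}

// validateLBType rejects EIP allocations on anything other than an NLB.
// When no EIP allocations are supplied, any lbType is accepted.
func validateLBType(p ingressPlatform) error {
	if len(p.EIPAllocations) == 0 {
		return nil
	}
	if p.LBType != "NLB" {
		return fmt.Errorf("eipAllocations requires lbType NLB, got %q", p.LBType)
	}
	return nil
}
```

With this shape, `lbType: Classic` combined with a non-empty list fails at install-config validation time instead of at load balancer creation.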
  aws:
    region: <AWS region>
    lbType: NLB
    networkLoadBalancer:
- networkLoadBalancer:
+ ingressNetworkLoadBalancer:
So it's not confused with the one created by the Installer.
OK I will make that change.
changes made
```

In the `IngressController` status, check the status for the following:
- Error messages for invalid eips or eips not present in the subnet of the VPC is `The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: error creating load balancer: "AllocationIdNotFound:` for status type `LoadBalancerReady` and `Available`.
When this error occurs, how does the customer/user remedy the situation? Restart the installation with correct EIPs, or can the user edit the `ingressNetworkLoadBalancer` list within the Ingress Config spec so that CCM can successfully reconcile the creation of the Ingress LB, or both? Any preference?
@sadasu can you please check point 3 under section ### Implementation Details/Notes/Constraints
Please let me know WYT.
Also, if an error is raised, we can either stop or ask the user for the correct input. I need to check the installer code to see whether we can add a loop.
> @sadasu can you please check point 3 under section ### Implementation Details/Notes/Constraints
> Please let me know WYT.
@miheer OK with that section.
Probably out of scope for this enhancement: when the Installer determines the predicted LB subnet count, would it be useful for the Installer to pass the predicted LB subnets to the AWS CCM via a manifest? That way we will not be duplicating the logic of figuring out the correct subnets to use. /cc @patrickdillon @mtulio
9f2d0ae
to
c39762e
Compare
(i.e. any subnet without another cluster's kubernetes.io/cluster/<cluster-id> tag).
We can call this Predicted LB Subnet Count.

We can examine the following scenarios:
Can `Predicted LB Subnet count` < `BYO Subnet count`? If not, these are the possible scenarios:
- `EIP Allocations count` < `BYO Subnet count`: error
- `EIP Allocations count` < `Predicted LB Subnet count`: error
Is it an issue if extra `EIP Allocations` are supplied?
The count of EIP allocations must exactly match the number of subnets. It can be neither less nor greater.
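A minimal Go sketch of that equality rule, assuming the validation receives the EIP allocation IDs and the subnet IDs the load balancer is expected to use (the function name `validateEIPCount` is hypothetical):

```go
package main

import "fmt"

// validateEIPCount enforces the rule stated in this thread: when EIP
// allocations are supplied, there must be exactly one per LB subnet.
// Both extra and missing allocations are rejected.
func validateEIPCount(eipAllocations, lbSubnets []string) error {
	if len(eipAllocations) == 0 {
		return nil // feature not in use, nothing to validate
	}
	if len(eipAllocations) != len(lbSubnets) {
		return fmt.Errorf("number of eipAllocations (%d) must equal number of subnets (%d)",
			len(eipAllocations), len(lbSubnets))
	}
	return nil
}
```

Which subnet list is passed in (BYO subnets or the predicted LB subnets) is exactly the open point discussed here; the sketch takes no position on it.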
Still working on reviewing. Only a couple of comments for now, will come back later.
creation-date: 2024-05-29
last-updated: 2024-09-04
tracking-link:
  - https://issues.redhat.com/browse/CORS-3440
see-also:
  - "enhancements/ingress/lb-subnet-selection-aws.md"
replaces:
  - "enhancements/installer/aws-customer-provided-subnets.md"
superseded-by:
I think you forgot to update some of the fields after a copy/paste
- creation-date: 2024-05-29
- last-updated: 2024-09-04
- tracking-link:
-   - https://issues.redhat.com/browse/CORS-3440
- see-also:
-   - "enhancements/ingress/lb-subnet-selection-aws.md"
- replaces:
-   - "enhancements/installer/aws-customer-provided-subnets.md"
- superseded-by:
+ creation-date: 2024-?
+ last-updated: 2024-?
+ tracking-link:
+   - https://issues.redhat.com/browse/CORS-3687
+ see-also:
+ replaces:
+ superseded-by:

This enhancement extends the OpenShift Installer's install-config, enabling cluster admins to
configure EIPs for AWS NLB load balancer created for their default NLB IngressController at install time.
This proposal allows the install-time configuration of subnets for the `default` IngressController.
- This proposal allows the install-time configuration of subnets for the `default` IngressController.
## Motivation

### User Stories
- As a cluster administrator using installer, I want to configure default NLB IngressController to use EIPs.
So you just defined the `default` IngressController above, but then don't use the backticks. I just want to make sure we are indeed talking about the `default` IngressController.
- - As a cluster administrator using installer, I want to configure default NLB IngressController to use EIPs.
+ - As a cluster administrator using installer, I want to configure `default` NLB IngressController to use EIPs.
Here's one recommended option:

The Installer should count all LB subnets by predicting what subnets be chosen by the AWS CCM
(i.e. any subnet without another cluster's kubernetes.io/cluster/<cluster-id> tag).
It's not only about subnets without another cluster's `kubernetes.io/cluster/<cluster-id>` tag; a subnet also won't be selected if the load balancer is external and the subnet is private.
OK
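The two selection rules mentioned in this thread can be sketched in Go as below. This is a simplified illustration, not the AWS CCM's actual discovery code: the `subnet` type and function names are hypothetical, and other behavior (such as choosing one subnet per Availability Zone) is deliberately left out.

```go
package main

import "strings"

// subnet is a hypothetical, minimal view of a VPC subnet.
type subnet struct {
	ID     string
	Public bool
	Tags   map[string]string
}

const clusterTagPrefix = "kubernetes.io/cluster/"

// taggedOnlyForOtherCluster reports whether the subnet carries another
// cluster's kubernetes.io/cluster/<cluster-id> tag but not this cluster's.
func taggedOnlyForOtherCluster(s subnet, clusterID string) bool {
	own, other := false, false
	for key := range s.Tags {
		if !strings.HasPrefix(key, clusterTagPrefix) {
			continue
		}
		if key == clusterTagPrefix+clusterID {
			own = true
		} else {
			other = true
		}
	}
	return other && !own
}

// predictLBSubnets applies the two rules from the review discussion:
// skip subnets claimed by another cluster, and skip private subnets
// when the load balancer is external.
func predictLBSubnets(subnets []subnet, clusterID string, externalLB bool) []subnet {
	var selected []subnet
	for _, s := range subnets {
		if externalLB && !s.Public {
			continue
		}
		if taggedOnlyForOtherCluster(s, clusterID) {
			continue
		}
		selected = append(selected, s)
	}
	return selected
}
```

The length of the returned slice would be the Predicted LB Subnet Count under these assumptions.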
which uses the value from the field `eipAllocations` of `IngressController` CR.

2. #### Validation on installer when installing in managed VPC (full-automated) based in the discovered zones used to create the cluster.
We will be comparing the number of Availability Zones in the region to the number of eipAllocations passed in the `install-config.yaml`.
Probably worth an explanation why: because the cluster will select 1 subnet per AZ, and the number of EIPs must be equal to subnets.
OK

We can examine the following scenarios:

##### BYO Subnet Count != EIPs Allocations && BYO Subnet Count == Predicted LB Subnet count && Predicted LB Subnet count != EIPs Allocations:
We might be able to eliminate all of this complex validation if:
- We agree to simplify the new subnets API (CORS-3440: IngressController subnet selection at installation #1634 (comment))
- EIP Allocations waits for new subnet API to be available and is dependent on it
But it's a tough call. Maybe we have to do this validation in the beginning so you can release the EIP Allocations feature, but when the new subnets API comes out, you can get rid of all of this complexity. I think that would be a massive win as far as a maintenance burden.
I would keep this validation here for now, but I will let you know if there are any updates.
/assign
Thanks, good information in here. I think validation is the biggest item of contention that we need to solve.
### Non-Goals
- Creation of EIPs in AWS.
- Static IP usage with NLBs for OpenShift API server, DNS, Nat Gateways, LBs, Instances.
- To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.
It's pretty obvious, but maybe worth a mention that this default is not for user-created IngressControllers.
- - To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.
+ - To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.
+ - Set default EIPs for user-created IngressControllers
- To assign IPs from a Customer Owned IP (CoIP) Pool when using Outposts.

## Proposal
This enhancement adds API fields in the installer and the IngressController specification
I don't think this proposal is adding any APIs to the IngressController right?
- This enhancement adds API fields in the installer and the IngressController specification
+ This enhancement adds API fields in the installer and the Ingress Config specification
### API Extensions

#### Installer Updates
- The first API extension for setting `eipAllocations` is in the installer [Platform](https://github.com/openshift/installer/blob/master/pkg/types/aws/platform.go) type, where the new field `NetworkLoadBalancerParameters` is added as an optional field.
I don't see this field name.
- - The first API extension for setting `eipAllocations` is in the installer [Platform](https://github.com/openshift/installer/blob/master/pkg/types/aws/platform.go) type, where the new field `NetworkLoadBalancerParameters` is added as an optional field.
+ - The first API extension for setting `eipAllocations` is in the installer [Platform](https://github.com/openshift/installer/blob/master/pkg/types/aws/platform.go) type, where the new field `eipAllocations` is added as an optional field.
// eipAllocations holds eipAllocations for an default AWS
// NLB IngressController.
You created this as a struct of EIPAllocations, which is a good idea, so it may be extended in the future if anything else needs EIPs. But shouldn't this godoc reflect that it's a generic structure? Does this reflect your idea of the API?
- // eipAllocations holds eipAllocations for an default AWS
- // NLB IngressController.
+ // eipAllocations contains Elastic IP (EIP) allocations for AWS resources
+ // within the cluster.
// EIPAllocations holds configuration parameters for an
// default AWS NLB IngressController. For Example: Setting AWS EIPs https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
Also here. I think the link is unnecessary, but keep it if you like it.
- // EIPAllocations holds configuration parameters for an
- // default AWS NLB IngressController. For Example: Setting AWS EIPs https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html
+ // EIPAllocations contains Elastic IP (EIP) allocations for AWS resources
+ // within the cluster.

## Open Questions
- Q: As per [EP](https://github.com/openshift/enhancements/pull/1634), old subnets field will be deprecated. So, shall we skip the validation for checking
number of `BYO Subnets` provided in the `install-config.yaml` with the number of eipAllocations ? Or shall we compare the old subnets field with the eipAllocations ?
I'm 50/50 on doing validation. It's going to be painful to write and maintain, but also it sucks when I install a cluster and realize 45 minutes later that I didn't add enough EIPs. Instant feedback is much better, as long as we get it right.
Like I said in another comment, I'm pushing for a simplification in the new subnets field, where this whole "predicted LB subnet count" goes away. If that's introduced, validation will be trivial: `len(subnets) == len(eips)`. I yield to the installer team on this one.
## Open Questions
- Q: As per [EP](https://github.com/openshift/enhancements/pull/1634), old subnets field will be deprecated. So, shall we skip the validation for checking
number of `BYO Subnets` provided in the `install-config.yaml` with the number of eipAllocations ? Or shall we compare the old subnets field with the eipAllocations ?
Mind adding one more that I'm curious about? CC: @JoelSpeed
+ - Q: Should we split Ingress Config into defaulting for the default IngressController and defaulting for user-created IngressControllers?
the Predicted LB Subnets != BYO Subnet Count scenario as not valid? And possibly block future installs as a resolution
to https://issues.redhat.com/browse/OCPBUGS-17432? That would make EIP Allocation a lot easier, but not sure if that's realistic.

4. #### Validation to check if EIPs are not already assigned to resources.
What about if the EIP exists? I don't think you explicitly mention that in these sections.
I think you mean if the EIP does not exist, because we need the EIPs to be present but unassociated, right?
I made this change:
4. #### Validation to check if EIPs exist and are not already assigned to resources.
EIPs can be assigned to many resource types, like Nat Gateways, *LBs, Instances, etc. The attribute associationId will be set when the EIP is already associated.
To mitigate this we could add that validation, at least on installer, to provide quick-feedback (fail when validate install-config) to the user when the provided EIP is already associated to another resource.
It would be nice to have a validation before setting the annotation to CCM, keeping the operator degraded before disrupting the service.

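The "exists and is not already associated" check described above could be sketched in Go as follows. The `eip` type is a hypothetical stand-in for whatever the installer would read back from AWS, and no AWS SDK call is shown; only the decision logic on the allocation and association IDs is illustrated.

```go
package main

import "fmt"

// eip is a hypothetical, minimal record for an Elastic IP as returned by
// a lookup against AWS. AssociationID is empty when the EIP is not
// associated with any resource.
type eip struct {
	AllocationID  string
	AssociationID string
}

// validateEIPsAvailable checks that every requested allocation ID exists
// in the lookup result and is not already associated with a resource.
func validateEIPsAvailable(requested []string, found []eip) error {
	byID := make(map[string]eip, len(found))
	for _, e := range found {
		byID[e.AllocationID] = e
	}
	for _, id := range requested {
		e, ok := byID[id]
		if !ok {
			return fmt.Errorf("EIP allocation %q does not exist", id)
		}
		if e.AssociationID != "" {
			return fmt.Errorf("EIP allocation %q is already associated (%s)", id, e.AssociationID)
		}
	}
	return nil
}
```

Running a check like this during install-config validation would give the quick feedback the text asks for, before anything is handed to the CCM.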
Could we add a small section on the defaulting mechanics for the IngressController? It's definitely an implementation detail, but one that feels important enough to mention here. Something @Miciah or @JoelSpeed might be interested in commenting on. I wonder if doing status "right" was a bit of a mistake, because now things are becoming inconsistent in the API. Either way, that ship sailed in 4.17.
+ #### IngressConfig EIP Allocation Defaulting Mechanics for Ingress Controller
+ Traditionally, the Ingress Operator has populated default values from the Ingress Config into the `status`, making `status` effectively reflect the desired state of the IngressController. However, since `eipAllocations` in `status` represents the **actual** state, not the **desired** state, the default `eipAllocations` values must be set in the `spec` when the Ingress Operator initially admits the IngressController.
+ This approach is new. The Ingress Operator does not typically set default values in `spec` for load balancer configurations if the user hasn’t explicitly provided them. While this defaulting pattern is more consistent with Kubernetes conventions for `spec` and `status` (and is also our only option in this situation), it's important to acknowledge that this inconsistency in defaulting behavior could cause confusion for users.
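To make the suggested defaulting concrete, here is a small Go sketch of "copy the Ingress Config default into `spec` only when the user has not set it". The type and function names are hypothetical and much simpler than the real IngressController API; this is not the Ingress Operator's code.

```go
package main

// ingressControllerSpec is a hypothetical, minimal stand-in for the part
// of the IngressController spec that holds EIP allocations.
type ingressControllerSpec struct {
	EIPAllocations []string
}

// defaultEIPAllocations writes the Ingress Config defaults into spec at
// admission time, but only when the user left the field unset. It
// reports whether spec was modified.
func defaultEIPAllocations(spec *ingressControllerSpec, configDefaults []string) bool {
	if spec.EIPAllocations != nil || len(configDefaults) == 0 {
		return false
	}
	// Copy so later edits to the config slice do not alias the spec.
	spec.EIPAllocations = append([]string(nil), configDefaults...)
	return true
}
```

Because the value lands in `spec`, `status` remains free to report the allocations actually in use.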

### Goals
- Users are able to use EIPs for a default NLB `IngressController` at install time.
- Check for unassociated EIPs before passing to CCM.
Think we should make this more generic? Otherwise, you are missing a couple of validations here.
- - Check for unassociated EIPs before passing to CCM.
+ - Add validation to the installer to prevent invalid EIP configurations
// +listType=atomic
// +kubebuilder:validation:XValidation:rule=`self.all(x, self.exists_one(y, x == y))`,message="eipAllocations cannot contain duplicates"
// +kubebuilder:validation:MaxItems=10
IngressNetworkLoadBalancer []EIPAllocation `json:"ingressNetworkLoadBalancer"`
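Since the installer does not evaluate CEL, the two markers above (no duplicates, at most 10 items) would need an equivalent check in Go if the installer is to enforce them. A rough sketch, with a hypothetical function name and plain strings in place of the `EIPAllocation` type:

```go
package main

import "fmt"

// maxEIPAllocations mirrors the MaxItems=10 marker on the field.
const maxEIPAllocations = 10

// validateAllocationList mirrors the intent of the CEL rule
// self.all(x, self.exists_one(y, x == y)): every entry must appear
// exactly once, and the list must not exceed the maximum length.
func validateAllocationList(ids []string) error {
	if len(ids) > maxEIPAllocations {
		return fmt.Errorf("at most %d eipAllocations are allowed, got %d", maxEIPAllocations, len(ids))
	}
	seen := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		if _, dup := seen[id]; dup {
			return fmt.Errorf("eipAllocations cannot contain duplicates: %q", id)
		}
		seen[id] = struct{}{}
	}
	return nil
}
```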
Could the existing `PublicIpv4Pool` field be used instead to satisfy the requirements?
I think the answer is no. IIUC, users want to set up firewall rules in advance, so they need to specify explicit IP addresses. A pool won't suffice. It may be worth drawing a distinction between your proposal and this existing field. I would be interested in @mtulio's take on this.
@mtulio PTAL .
I think we need some feedback from installer-team, @patrickdillon or @mtulio or @sadasu on this type of validation. Should the installer team consider
the Predicted LB Subnets != BYO Subnet Count scenario as not valid? And possibly block future installs as a resolution
to https://issues.redhat.com/browse/OCPBUGS-17432? That would make EIP Allocation a lot easier, but not sure if that's realistic.
OCPBUGS-17432 is closed, but I believe the problem is not actually resolved until RFE-1717 is implemented, correct?
Are you proposing enforcing this validation only when EIP allocations are specified? I'm certainly open to that. If we do it in all cases (i.e. when EIP allocations are not specified) that might be more tricky, as the install config that "worked" before, now starts failing...
But in the case of EIP Allocations, this is new functionality so we won't break any existing workflows. If we did throw a validation error when EIP Allocation Count != Predicted LB Subnets, then users would be able to resolve the issue on their own, right? Say by adding the unmanaged tag to other subnets in the VPC...
@gcs278 ^^ WDYT ?
> OCPBUGS-17432 is closed, but I believe the problem is not actually resolved until RFE-1717 is implemented, correct?
Correct. It was closed as "not a bug", but I do think it could still be considered a bug, or at least it's confusing UX that has led to other RFEs like https://issues.redhat.com/browse/RFE-2816, and it was the reason why Service Delivery opened the subnet RFE https://issues.redhat.com/browse/RFE-1717 in the first place. Everything comes back to OCPBUGS-17432...
> Are you proposing enforcing this validation only when EIP allocations are specified?
Right. Doing it in all cases (with no EIPs) would be something I could implement in the subnets EP #1634.
> If we do it in all cases (i.e. when EIP allocations are not specified) that might be more tricky, as the install config that "worked" before, now starts failing...
Yes, in the context of this EP, but I don't think that applies if we deprecate the old field and add a new `subnetConfig` field like the suggestion in #1634. Behavior can change since customers have to explicitly opt into the new field. I've recently realized this and suggested it as an improvement in UX when making this new field. My suggestion, rather than erroring out when the subnet counts aren't equal, is to just make Cluster Subnets == IngressController subnets (and completely bypass the AWS CCM subnet discovery logic). But this is a discussion for that EP, so feel free to jump in there.
> If we did throw a validation error when EIP Allocation Count != Predicted LB Subnets, then users would be able to resolve the issue on their own, right? Say by adding the unmanaged tag to other subnets in the VPC...
Right.
@miheer: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Enhancement proposal for setting EIPs for Ingress Controller via installer