CF API Server becomes unavailable during updates #636

braunsonm · 2021-03-09T17:28:17Z

Describe the bug

In a production deployment, the API Server goes down during updates, which is not in line with CF-for-VMs. The default should deploy more than one replica and perform a rolling update.

Current behavior

The API Server will be taken offline to update the image.

Expected behavior

More than one replica is deployed, so the API Server remains online during CF updates.

Additional context

cf-for-k8s SHA

v2.1.1

cf-gitbot · 2021-03-09T17:28:20Z

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177271427

The labels on this github issue will be updated when the story is started.

matt-royal · 2021-03-15T23:15:01Z

Thank you for the issue, @braunsonm. We just committed a change to the develop branch that allows you to scale up the cf-api-server via a data value (capi.cf_api_server.replicas). Once this makes it into a release, you can easily scale up to 2+ replicas and avoid this problem.
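As a rough sketch of what setting that data value could look like: the key path `capi.cf_api_server.replicas` comes from the comment above, while the file name, the replica count of 2, and the use of a separate ytt data values file are assumptions for illustration and are not taken from a release.

```yaml
#! ha-values.yml (hypothetical file name)
#! Assumes a release that already defines capi.cf_api_server.replicas.
#@data/values
---
capi:
  cf_api_server:
    #! Assumed value; any count of 2 or more keeps a replica
    #! available while another one is being replaced.
    replicas: 2
```

A file like this would be passed to ytt alongside the existing values file when rendering the deployment.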

braunsonm · 2021-03-15T23:35:27Z

@matt-royal the point of this issue was that I believe this should be the default. This is a 5-node cluster deployment, and it is expected to be highly available without a bunch of tweaks.

If not, I'd recommend adding a document to the repo that tells users what steps they need to take to make it HA (external DB, external blobstore, recommended replica counts).

Birdrock · 2021-03-23T22:31:18Z

@braunsonm I'm re-opening this for more discussion.

We've found some configuration that may alleviate the problem, but the larger discussion is around what our default deploy target is. Until now, we've been targeting small clusters or developer workstations. A truly HA configuration isn't a very good out-of-the-box, kick-the-tires solution, so we may need to make some compromise.

To that end, the result of this issue may be to open a new issue with some clarified requirements.

cf-gitbot · 2021-03-23T22:31:21Z

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177468264

The labels on this github issue will be updated when the story is started.

braunsonm · 2021-03-23T22:55:15Z

I thought the deployment was targeted at something close to HA, with the exception of the DB and Blobstore.

If the goal is to be similar to cf-deployment on Bosh, then the default deployment should be HA with batteries included, with remove_resource_requirements available for developer machines. That's the way we personally have been treating it.

When the deployment requirements call for a 5-node cluster, defaulting your target to a developer workstation seems to be quite a stretch. As you said, even some clarified documentation for operators running this in production would be good to have 👍 If you need any help with that based on our experience, don't hesitate to reach out.

cf-gitbot added unscheduled, scheduled, in progress and removed unscheduled, scheduled labels Mar 9, 2021

matt-royal closed this as completed Mar 15, 2021

cf-gitbot added delivered and removed in progress labels Mar 15, 2021

Birdrock reopened this Mar 23, 2021

cf-gitbot added unscheduled and removed delivered labels Mar 23, 2021
