Release 0.58.0 · d2iq-archive/dcos-commons

Automatic Pod-Replacement

This release of the SDK introduces automatic pod replacement. An automatic pod-replacement failure policy is best suited to services that use external storage, because rebuilding the data associated with a pod is not expected to be as expensive as it is with equivalent ROOT or MOUNT volumes.

Note: When enabled, automatic pod replacement applies across the whole service, not just to a single pod type.

Replacement Failure Policy

Services implementing automatic pod replacement must specify the Replacement Failure Policy for tasks declared as permanently failing. The Replacement Failure Policy is constructed programmatically and passed to a Service Spec as part of the scheduler's implementation.

Cassandra example:

Replacement Failure Policy cannot be defined in a YAML service specification.

Vertical bursting support

Previously, sidecar tasks (such as a task running Cassandra nodetool repair) were able to consume memory and CPU resources from the primary task, because Mesos launched all tasks into a single cgroup. SDK v0.58.0 instructs Mesos to launch each task in its own cgroup. This means that if a sidecar task needs more memory than it explicitly requests, and no configuration is changed, the sidecar task will be OOM killed.

To remedy this, all frameworks should update their service specs so that resource-limits can be defined for both the primary data service task and the sidecar tasks. Further, the templates in the Universe should expose appropriate configuration parameters so that bursting can be configured for both.
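The behaviour change described above can be illustrated with a small toy model. This is only a sketch and not SDK or Mesos code: the `TaskMemory` class, its field names, and the 256 MB / 900 MB / 1024 MB figures are all hypothetical, and the model assumes that a task with no resource limit is capped at its own request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskMemory:
    """Toy model of one task's memory settings in MB (hypothetical, not an SDK type)."""

    requested_mb: int
    limit_mb: Optional[int] = None  # None means no resource limit is configured

    def effective_limit_mb(self) -> int:
        # Assumption: without an explicit limit, the task's own request is the ceiling,
        # because the task no longer shares a cgroup with the primary task.
        return self.limit_mb if self.limit_mb is not None else self.requested_mb

    def is_oom_killed(self, peak_usage_mb: int) -> bool:
        # The task is killed once its peak usage exceeds its effective ceiling.
        return peak_usage_mb > self.effective_limit_mb()


# Hypothetical sidecar that requests 256 MB but peaks at 900 MB during a repair.
sidecar_no_limit = TaskMemory(requested_mb=256)
sidecar_with_limit = TaskMemory(requested_mb=256, limit_mb=1024)

print(sidecar_no_limit.is_oom_killed(900))    # True
print(sidecar_with_limit.is_oom_killed(900))  # False
```

In this model the unchanged configuration is the one that fails, which is why the upgrade calls for reviewing sidecar memory settings rather than leaving them as they were.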

Tasks can be configured to optionally consume more than they request.

With resource-limits, a task can be configured to consume more CPU or memory than it requests and reserves. This is good news for data services that would otherwise permanently set aside an entire CPU so that occasional backup or repair sidecar tasks can run. To repeat the earlier point, SDK service templates should expose appropriate configuration parameters so that resource-limits can be set, at the very least, for sidecar tasks.

Instead of requiring 1 CPU for a sidecar task, Cassandra could set aside 0.1 CPU and then set a resource limit of up to 2 CPUs. This allows the sidecar task to run fast when there are leftover resources, while leaving CPUs available to service time-sensitive API requests.
This feature is opt-in; enable it by adding resource limits to your Service Spec.
PR (#3231)
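As a rough illustration of the 0.1 CPU request with a 2 CPU limit, the sketch below models how much CPU a task could use at a given moment. It is a deliberate simplification and not how Mesos schedules CPU internally: the `usable_cpus` function and the `idle_on_agent` parameter are hypothetical names, and the model simply assumes a task keeps its request and may additionally use idle CPU up to its limit.

```python
def usable_cpus(requested: float, limit: float, idle_on_agent: float) -> float:
    """CPUs a task could use right now in this toy model (hypothetical helper)."""
    if limit < requested:
        raise ValueError("limit must be at least the request")
    # The task always has its request; idle CPU on the agent lets it burst,
    # but never beyond the configured limit.
    return min(limit, requested + max(idle_on_agent, 0.0))


# Sidecar with a 0.1 CPU request and a 2 CPU limit, as in the Cassandra example.
print(usable_cpus(0.1, 2.0, idle_on_agent=3.0))  # 2.0 (bursts up to the limit)
print(usable_cpus(0.1, 2.0, idle_on_agent=0.0))  # 0.1 (falls back to the request)
```

The point of the small request is that only 0.1 CPU is reserved permanently; the rest stays available to other tasks whenever the sidecar is idle.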

Support external volumes

Cassandra example:

Configuring external volume

This feature is opt-in; enable it by adding external volumes to your Service Spec.
Design document (#3266)
