Preview: Kubernetes (since 1.23) and containerd (since 1.6.0-beta4) help calculate the Sandbox Size and pass it to Kata Containers through annotations. To take advantage of this change while remaining compatible with the past, we have implemented a new way of handling vCPUs in `runtime-rs`, which is slightly different from the original `runtime-go` design.
vCPU sizing should be determined by the container workloads. So throughout the life cycle of Kata Containers, there are several points in time when we need to decide how many vCPUs are needed: mainly `CreateVM`, `CreateContainer`, `UpdateContainer`, and `DeleteContainer`.
- `CreateVM`: when creating a sandbox, we need to know how many vCPUs to start the VM with.
- `CreateContainer`: when creating a new container in the VM, we may need to hot-plug vCPUs according to the requirements in the container's spec.
- `UpdateContainer`: when receiving an `UpdateContainer` request, we may need to update the vCPU resources according to the new requirements of the container.
- `DeleteContainer`: when a container is removed from the VM, we may need to hot-unplug vCPUs to reclaim the vCPU resources introduced by the container.
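The four lifecycle points above can be sketched as a small event type. This is purely illustrative; the names are assumptions, not the actual `runtime-rs` API:

```rust
// Hypothetical sketch of the points where the runtime may need to
// recalculate the VM's vCPU count. Names are illustrative only.
#[derive(Debug)]
enum VcpuResizePoint {
    CreateVM,        // choose the boot vCPU count
    CreateContainer, // may hot-plug vCPUs for the new container
    UpdateContainer, // may adjust vCPUs to match updated limits
    DeleteContainer, // may hot-unplug vCPUs freed by the container
}

fn may_hotplug(p: &VcpuResizePoint) -> bool {
    // Only CreateVM sizes the VM at boot; the other three may
    // adjust the vCPU count of an already-running VM.
    !matches!(p, VcpuResizePoint::CreateVM)
}
```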
When Kata calculates the number of vCPUs, there are three data sources: the `default_vcpus` and `default_maxvcpus` values specified in the configuration file (called `TomlConfig` later in this doc), the `io.kubernetes.cri.sandbox-cpu-quota` and `io.kubernetes.cri.sandbox-cpu-period` annotations passed by the upper-layer runtime, and the CPU resource section of the container's spec when `CreateContainer`/`UpdateContainer`/`DeleteContainer` is requested.
Our understanding and priority of these resources are as follows, which will affect how we calculate the number of vCPUs later.
- From `TomlConfig`:
  - `default_vcpus`: the default number of vCPUs when starting a VM.
  - `default_maxvcpus`: the maximum number of vCPUs.
- From annotations:
  - `InitialSize`: we call the resource size passed through the annotations the `InitialSize`. Kubernetes calculates the sandbox size according to the Pod's declaration, which is the `InitialSize` here. This is the size we want to prioritize.
- From the container spec:
  - The amount of CPU resources a container wants to use is declared through its spec. In addition to the aforementioned annotations, we mainly consider `cpu quota` and `cpuset` when calculating the number of vCPUs.
    - `cpu quota`: the most common way to declare the amount of CPU resources. The number of vCPUs introduced by the `cpu quota` declared in a container's spec is `vCPUs = ceiling(quota / period)`.
    - `cpuset`: often used to bind the CPUs that tasks can run on. The number of vCPUs introduced by the `cpuset` declared in a container's spec is the number of CPUs specified in the set that do not overlap with other containers.
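The two per-container calculations above can be sketched as follows. This is a simplified illustration, not the actual `runtime-rs` code; the function names and the cpuset string format (a list such as `"0-2,7"`) are assumptions:

```rust
// Sketch: vCPUs contributed by a CPU quota, i.e. ceiling(quota / period).
fn vcpus_from_quota(quota: i64, period: u64) -> u32 {
    if quota <= 0 || period == 0 {
        return 0; // quota unset or unlimited: contributes no vCPUs here
    }
    // Integer ceiling division: ceil(quota / period).
    ((quota as u64 + period - 1) / period) as u32
}

// Sketch: count CPUs in a cpuset list string such as "0-2,7" -> {0,1,2,7}.
fn cpus_in_cpuset(cpuset: &str) -> u32 {
    cpuset
        .split(',')
        .filter(|s| !s.is_empty())
        .map(|range| match range.split_once('-') {
            Some((a, b)) => {
                let (a, b): (u32, u32) =
                    (a.trim().parse().unwrap(), b.trim().parse().unwrap());
                b - a + 1 // inclusive range, e.g. "0-2" is 3 CPUs
            }
            None => 1, // a single CPU number
        })
        .sum()
}
```

For example, a container with `quota = 150000` and `period = 100000` (1.5 CPUs) rounds up to 2 vCPUs.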
There are two types of vCPU counts that we need to consider: the number of vCPUs when starting the VM (called the `Boot Size` in this doc), and the number of vCPUs when a `CreateContainer`/`UpdateContainer`/`DeleteContainer` request is received (the `Real-time Size` in this doc).
For the `Boot Size`, the main considerations are the `InitialSize` and `default_vcpus`, with the following principles:

`InitialSize` has priority over the `default_vcpus` declared in `TomlConfig`.
- When such an annotation is declared, the `default_vcpus` value is replaced by the number of vCPUs in the `InitialSize` as the `Boot Size`. (Because not all runtimes support these annotations for the time being, we still keep `default_vcpus` in `TomlConfig`.)
- When the specs of all containers are aggregated for the sandbox size calculation, the method is consistent with the `InitialSize` calculation here.
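The `Boot Size` selection described above can be sketched like this. It is a minimal illustration under the stated principles; the function signature is an assumption, not the actual `runtime-rs` implementation:

```rust
// Sketch: choose the boot vCPU count. InitialSize (from the sandbox-size
// annotations, when present) overrides default_vcpus; default_vcpus remains
// the fallback for runtimes that do not pass the annotations. The result
// is capped by default_maxvcpus.
fn boot_size(default_vcpus: u32, default_maxvcpus: u32, initial_size: Option<u32>) -> u32 {
    let vcpus = initial_size.unwrap_or(default_vcpus);
    vcpus.min(default_maxvcpus)
}
```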
When we receive an OCI request, it may be for a single container, but what we have to consider is the number of vCPUs for the entire VM. So we maintain a list of the containers' CPU resources; every time an adjustment is needed, the entire list is traversed to calculate the new number of vCPUs. In addition, the following principles apply:
- Do not cut computing power, and try to keep the number of vCPUs specified by the `InitialSize`.
  - Therefore the number of vCPUs after adjustment will never be less than the `Boot Size`.
- `cpu quota` takes precedence over `cpuset`, and the setting history is taken into account.
  - The quota describes the CPU time slice that a cgroup can use, while the `cpuset` describes the actual CPUs that a cgroup can run on. The quota better describes the amount of CPU time a cgroup actually wants to consume: a cgroup may be allowed to run on every CPU in its `cpuset` yet consume only a small time slice on them, so the quota takes precedence over the `cpuset`.
  - On the one hand, when both `cpu quota` and `cpuset` are specified, we calculate the number of vCPUs based on the `cpu quota` and ignore the `cpuset`. On the other hand, if `cpu quota` was used to control the number of vCPUs in the past and only the `cpuset` is updated during `UpdateContainer`, we will not adjust the number of vCPUs at that time.
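The list traversal and the precedence rules above can be sketched as follows. This is an illustrative simplification (the struct and function names are assumptions, and the "setting history" rule is reduced to the quota-over-cpuset check), not the actual `runtime-rs` code:

```rust
use std::collections::HashSet;

// Sketch: per-container CPU requirements tracked in the runtime's list.
struct ContainerCpu {
    quota: Option<(u64, u64)>, // (quota, period) when a CPU quota is set
    cpuset: HashSet<u32>,      // CPUs from the cpuset, empty if unset
}

// Traverse the whole container list to compute the Real-time Size:
// quota takes precedence over cpuset; cpuset CPUs overlapping other
// containers are counted once; never shrink below the Boot Size.
fn realtime_size(boot_size: u32, containers: &[ContainerCpu]) -> u32 {
    let mut total = 0u32;
    let mut seen_cpus: HashSet<u32> = HashSet::new();
    for c in containers {
        if let Some((quota, period)) = c.quota {
            // cpu quota takes precedence: ceil(quota / period), cpuset ignored.
            total += ((quota + period - 1) / period) as u32;
        } else {
            // Only count cpuset CPUs not already claimed by other containers.
            total += c.cpuset.difference(&seen_cpus).count() as u32;
            seen_cpus.extend(&c.cpuset);
        }
    }
    total.max(boot_size) // do not cut below the Boot Size
}
```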
- `StaticSandboxResourceMgmt` controls hotplugging.
  - Some VMMs and guest kernels on some architectures do not support hotplugging. We can accommodate this situation through `StaticSandboxResourceMgmt`. When `StaticSandboxResourceMgmt = true` is set, we make no further attempts to update the number of vCPUs after booting.
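For reference, these knobs live in the Kata `configuration.toml`. The fragment below is a sketch (section shown for the QEMU hypervisor; exact defaults vary by shipped configuration):

```toml
[hypervisor.qemu]
default_vcpus = 1      # Boot Size fallback when no sandbox-size annotation is passed
default_maxvcpus = 8   # upper bound on the VM's vCPU count

[runtime]
# Disable vCPU adjustment attempts after boot, for VMMs or guest
# kernels that do not support hotplugging.
static_sandbox_resource_mgmt = true
```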