The CNF Conformance program validates interoperability of CNF workloads from multiple vendors running on Kubernetes platforms that are themselves supplied by multiple vendors. The goal is to provide an open source test suite that enables both open and closed source CNFs to demonstrate conformance and implementation of best practices. For more detailed CLI documentation, see the usage document.
CNFs should work with any Certified Kubernetes product and any CNI-compatible network that meet their functionality requirements. The CNF Conformance Suite validates this by:
- Performing CNI plugin testing, which:
  - Tests if the CNI plugin follows the CNI specification
- Performing K8s API usage testing by running API snoop on the cluster, which:
  - Checks alpha endpoint usage
  - Checks beta endpoint usage
  - Checks generally available (GA) endpoint usage
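The suite drives this check with API snoop against the cluster's audit logs, so the following is only a rough, hedged sketch of the idea: using the Python `kubernetes` client (and whatever kubeconfig is on the machine), it lists which alpha and beta API group versions the cluster serves at all, one ingredient of detecting non-GA endpoint usage.

```python
# Sketch only: the real test snoops actual API *usage* via audit logs (apisnoop);
# this just enumerates served group versions and flags alpha/beta ones.
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at the cluster under test

for group in client.ApisApi().get_api_versions().groups:
    for v in group.versions:
        if "alpha" in v.version or "beta" in v.version:
            print(f"non-GA API served: {v.group_version}")
```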
The CNF conformance suite checks if state is stored in a custom resource definition or a separate database (e.g. etcd) rather than requiring local storage. It also checks to see if state is resilient to node failure by:
- Resetting the container and checking to see if the CNF comes back up
- Using upstream projects for chaos engineering (e.g. Litmus)
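As a hedged illustration of the local-storage side of this check (not the suite's actual implementation), the sketch below uses the Python `kubernetes` client to flag pods whose volumes rely on node-local storage such as `hostPath` or `emptyDir`:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Flag pods that keep state on the node itself rather than in a CRD or external store.
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    for vol in (pod.spec.volumes or []):
        if vol.host_path or vol.empty_dir:
            kind = "hostPath" if vol.host_path else "emptyDir"
            print(f"{pod.metadata.namespace}/{pod.metadata.name} uses {kind} volume '{vol.name}'")
```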
CNF containers should be isolated from one another and from the host. The CNF Conformance suite uses tools like OPA Gatekeeper, Falco, Sysdig Inspect, and gVisor to:
- Check if there are any shells
- Check if any containers are running in privileged mode
- Check if any protected directories or files are accessed
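For example, the privileged-mode check above could be approximated with the Python `kubernetes` client as in the following sketch; the suite itself relies on the tools listed above, so this is only an illustration:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Report any container that explicitly requests privileged mode.
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    for c in pod.spec.containers:
        sc = c.security_context
        if sc and sc.privileged:
            print(f"privileged container: {pod.metadata.namespace}/{pod.metadata.name}/{c.name}")
```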
The CNF should be developed and delivered as a microservice. The CNF Conformance suite tests to determine the organizational structure and rate of change of the CNF being tested. Once these are known, we can determine whether or not the CNF is a microservice. See: Microservice-Principles:
- Check if the CNF has a reasonable startup time.
- Check the image size of the CNF.
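A minimal sketch of these two checks, assuming the Python `kubernetes` client and treating the pod's Ready transition as a rough proxy for startup time, might look like this (image sizes are taken from what the nodes report having pulled):

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Startup time: seconds between pod creation and the Ready condition.
for pod in v1.list_namespaced_pod("default").items:          # assumed namespace
    ready = next((c for c in (pod.status.conditions or [])
                  if c.type == "Ready" and c.status == "True"), None)
    if ready:
        startup = (ready.last_transition_time - pod.metadata.creation_timestamp).total_seconds()
        print(f"{pod.metadata.name}: ready after ~{startup:.0f}s")

# Image size: as reported by the nodes that have pulled the images.
for node in v1.list_node().items:
    for image in (node.status.images or []):
        print(image.names, f"{image.size_bytes / 1e6:.0f} MB")
```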
The CNF conformance suite checks to see if CNFs support horizontal scaling (across multiple machines) and vertical scaling (between sizes of machines) by using the native K8s kubectl:
- Test increasing/decreasing capacity
- Test small scale autoscaling with kubectl
- Test large scale autoscaling with load test tools like CNF Testbed
- Test if the CNF control layer responds to retries for failed communication (e.g. using Pumba or Blockade for network chaos and Envoy for retries)
(see scalability test usage documentation)
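As a sketch of the small-scale case, the snippet below scales a hypothetical `my-cnf` deployment in the `default` namespace up and back down through the same Scale subresource that `kubectl scale` uses (Python `kubernetes` client assumed):

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

NAME, NAMESPACE = "my-cnf", "default"   # hypothetical deployment under test

def scale(replicas: int) -> None:
    # Patch the Scale subresource, equivalent to `kubectl scale deployment my-cnf --replicas=N`.
    apps.patch_namespaced_deployment_scale(
        NAME, NAMESPACE, body={"spec": {"replicas": replicas}})

scale(5)   # scale out
scale(1)   # scale back in
status = apps.read_namespaced_deployment(NAME, NAMESPACE).status
print("ready replicas:", status.ready_replicas)
```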
Configuration and lifecycle should be managed in a declarative manner, using ConfigMaps, Operators, or other declarative interfaces. The Conformance suite checks this by:
- Testing if the CNF is installed using a versioned Helm v3 chart
- Searching for hardcoded IP addresses, subnets, or node ports in the configuration
- Checking for a liveness entry in the helm chart and whether the container is responsive to it after a reset
- Checking for a readiness entry in the helm chart and whether the container is responsive to it after a reset
- Checking if the pod/container can be started without mounting a volume that contains configuration files (e.g. using helm configuration)
- Testing to see if we can start pods/containers and see that the application continues to perform (e.g. using Litmus)
- Testing by resetting any child processes and, when the parent process is started, checking to see if those child processes are reaped (i.e. monitoring processes with Falco or sysdig-inspect)
- Testing if the CNF can perform a rolling update (i.e. kubectl rolling update)
- Testing if there are any (non-declarative) hardcoded IP addresses or subnet masks
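The suite inspects the helm chart itself for these entries; the hedged sketch below checks the equivalent thing on the deployed resources instead, flagging containers that run without liveness or readiness probes (Python `kubernetes` client, `default` namespace assumed):

```python
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Flag containers that ship without liveness or readiness probes.
for dep in apps.list_namespaced_deployment("default").items:   # assumed namespace
    for c in dep.spec.template.spec.containers:
        if c.liveness_probe is None:
            print(f"{dep.metadata.name}/{c.name}: no livenessProbe")
        if c.readiness_probe is None:
            print(f"{dep.metadata.name}/{c.name}: no readinessProbe")
```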
In order to maintain, debug, and have insight into a protected environment, its infrastructure elements must have the property of being observable. This means these elements must externalize their internal states in some way that lends itself to metrics, tracing, and logging. The Conformance suite checks this by:
- Testing to see if there is traffic to Fluentd
- Testing to see if there is traffic to Jaeger
- Testing to see if Prometheus rules for the CNF are configured correctly (e.g. using Promtool)
- Testing to see if there is traffic to Prometheus
- Testing to see if the tracing calls are compatible with OpenTelemetry
- Testing to see if the monitoring calls are compatible with OpenMetrics
- Testing to see if there is an OpenTelemetry compatible service installed
- Testing to see if there is an OpenMetrics compatible service installed
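A crude, hedged approximation of the "is the expected observability stack present" checks is to look for Services whose names suggest Prometheus, Fluentd, or Jaeger, as in the sketch below; the suite's actual traffic checks go further than this:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Heuristic: look for Services whose names suggest the expected observability stack.
targets = ("prometheus", "fluentd", "jaeger")
for svc in v1.list_service_for_all_namespaces(watch=False).items:
    name = svc.metadata.name.lower()
    if any(t in name for t in targets):
        print(f"found {svc.metadata.namespace}/{svc.metadata.name}")
```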
The CNF Conformance suite will check for usage of standard, in-band deployment tools such as Helm (version 3) charts. The Conformance suite checks this by:
- Testing if the install script uses Helm v3
- Testing if the CNF is published to a public Helm chart repository
- Testing if the Helm chart is valid (e.g. using the helm linter)
- Testing if the CNF can perform a rolling update (i.e. kubectl rolling update)
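These checks can be approximated from the command line; the hedged sketch below shells out to `helm version` and `helm lint` against a hypothetical chart path, which is not the suite's implementation but shows the kind of evidence it looks for:

```python
import subprocess

CHART = "./chart"   # hypothetical path to the CNF's helm chart

# Confirm the installed helm client is v3 and that the chart passes `helm lint`.
version = subprocess.run(["helm", "version", "--short"],
                         capture_output=True, text=True, check=True).stdout.strip()
print("helm v3 detected" if version.startswith("v3") else f"unexpected helm version: {version}")

lint = subprocess.run(["helm", "lint", CHART], capture_output=True, text=True)
print(lint.stdout)
print("chart is valid" if lint.returncode == 0 else "chart failed linting")
```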
The CNF container should access all hardware and schedule to specific worker nodes by using a device plugin. The CNF Conformance suite checks this by:
- Testing if the Platform supplies an OCI-compatible runtime
- Testing if the Platform supplies a CRI-compatible runtime
- Checking if the CNF is accessing hardware in its configuration files
- Testing if the CNF accesses hardware directly during run-time (e.g. accessing the host /dev or /proc from a mount)
- Testing if the CNF accesses hugepages directly instead of via Kubernetes resources
- Testing if the CNF Testbed performance output shows adequate throughput and sessions when run in the CNF Testbed (vendor-neutral) hardware environment.
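As an illustration of the direct-hardware-access checks (not the CNF Testbed performance run), the sketch below flags hostPath mounts of /dev or /proc and reports hugepages that are requested through Kubernetes resources, using the Python `kubernetes` client:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    # Direct host device access via hostPath mounts of /dev or /proc.
    for vol in (pod.spec.volumes or []):
        if vol.host_path and vol.host_path.path.startswith(("/dev", "/proc")):
            print(f"{pod.metadata.name}: hostPath mount of {vol.host_path.path}")
    # Hugepages requested the Kubernetes way (resource requests) rather than directly.
    for c in pod.spec.containers:
        requests = (c.resources.requests or {}) if c.resources else {}
        for resource in requests:
            if resource.startswith("hugepages-"):
                print(f"{pod.metadata.name}/{c.name}: requests {resource}={requests[resource]}")
```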
The Cloud Native Definition requires systems to be resilient to the failures that are inevitable in cloud environments. CNF resilience should be tested to ensure CNFs are designed to deal with a non-carrier-grade, shared cloud HW/SW platform:
- Test for full failures in the SW and HW platform: stopped cloud infrastructure/platform services, workload microservices, or HW components and nodes
- Test for bursty, regular or partial impairments on key dependencies: CPU cycles by pausing, limiting or overloading; DPDK-based Dataplane networking by dropping and/or delaying packets.
- Test if the CNF crashes when network loss occurs (Network Chaos)
Tools to study/use for this testing methodology: the previously mentioned Pumba and Blockade, Chaos Mesh, Mitmproxy, Istio for "Network Resilience", kill -STOP -CONT, LimitCPU, and the Packet pROcessing eXecution (PROX) engine as an Impair Gateway.
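A very reduced pod-failure experiment in the spirit of these tools, assuming a hypothetical `my-cnf` deployment labeled `app=my-cnf` in the `default` namespace, could look like the following sketch (the real tests use the chaos tooling listed above):

```python
import time
from kubernetes import client, config

config.load_kube_config()
v1, apps = client.CoreV1Api(), client.AppsV1Api()

NAME, NAMESPACE, SELECTOR = "my-cnf", "default", "app=my-cnf"   # hypothetical CNF

# Crude pod-failure experiment: delete one pod, then wait for the deployment to recover.
victim = v1.list_namespaced_pod(NAMESPACE, label_selector=SELECTOR).items[0]
v1.delete_namespaced_pod(victim.metadata.name, NAMESPACE)

for _ in range(30):
    dep = apps.read_namespaced_deployment(NAME, NAMESPACE)
    if dep.status.ready_replicas == dep.spec.replicas:
        print("CNF recovered from pod failure")
        break
    time.sleep(5)
else:
    print("CNF did not recover within the timeout")
```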