ci-operator

This document describes how to create CI jobs for OpenShift components using ci-operator and is intended for component developers who want to add tests to their CI process.

To begin setting up CI jobs for a new repository, run make new-repo.

After editing the files under this directory, run the generator to ensure that your changes comply with our conventions and pass the CI tests that will run when you submit your changes as a PR:

make jobs

Pre-submit tests on this repository will ensure that a run of the latest generator does not error on proposed configuration changes and also does not generate any new configuration.

Conventions

Under this directory, we have three main directories:

  • config/$org/$repo/$org-$repo-$branch.yaml Contains your ci-operator definition, which describes how the images and tests in your repo work. These files are copied into config maps on the CI cluster and referenced by Prow jobs. If you are building branches within a fork of a repo in another organization, $repo should point to the fork that holds the branch (for example github.com/openshift/kubernetes-metrics-server instead of k8s.io/metrics-server).
  • jobs/$org/$repo/$org-$repo-$branch-(presubmit|postsubmit|periodic).yaml Contains Prow job definitions for each repository that are run on PRs, on merges, or periodically. When we branch jobs, we will copy the current master jobs into a release-branch-specific job. Each Prow job calls into the appropriate subset of the tests defined in your ci-operator config and passes in the secrets and infrastructure info specific to our CI environment.
  • templates/*.yaml These templates are used for more complicated jobs that don't run in a single pod. The templates are referenced by the Prow jobs and are instantiated by the ci-operator using parameters generated by the build (references to images usually).
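The naming convention above can be sketched as a pair of path builders. This is only an illustration: the function names are hypothetical, and the job-file suffix is passed in verbatim rather than validated, because the job file quoted later in this document ends in -presubmits.yaml while the pattern above lists the singular forms.

```python
from pathlib import PurePosixPath


def config_path(org: str, repo: str, branch: str) -> str:
    """Path of the ci-operator config file, relative to this directory."""
    return str(PurePosixPath("config", org, repo, f"{org}-{repo}-{branch}.yaml"))


def job_path(org: str, repo: str, branch: str, kind: str) -> str:
    """Path of a Prow job file; `kind` is the file-name suffix, used as given."""
    return str(PurePosixPath("jobs", org, repo, f"{org}-{repo}-{branch}-{kind}.yaml"))


print(config_path("openshift", "origin", "master"))
print(job_path("openshift", "origin", "master", "presubmits"))
```

Both results are relative to the ci-operator directory, matching the paths used in the examples further down.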

End-to-end tests

This section describes how to configure end-to-end tests using ci-operator. In this context, "end-to-end" means the functionality of the application is being tested on top of a Kubernetes cluster from an end-user perspective.

The preferred way to write this type of test is using ci-operator. See the documentation for details on how to download, build, and execute it:

https://github.com/openshift/ci-tools.git

ci-operator requires a configuration file for the repository being tested. These files are located in the config directory. The ci-operator repository has documentation for adding a new configuration file in case one doesn't already exist. These files take care of most of the CI process: downloading the source, building binaries, building RPMs, creating images, executing unit tests, etc., and can be built upon for e2e tests with little or no modification.

To add an e2e test:

  1. Determine the prerequisites for the test. Practically, this means choosing the ci-operator template according to the type of test. There are already files in the templates directory for the most common cases; see Using a template below.
  2. Determine and configure the template's inputs. This is specific to each template and should be documented in its parameters. The e2e test might need minor modifications to fit the environment created by the template.
  3. Add one or more Prow jobs to Prow's configuration file with the information gathered in the previous steps.

Provisioning a cluster

Contrary to other types of tests, e2e tests usually require a cluster, not just a single container. While there aren't yet native primitives in ci-operator for cluster provisioning, it provides one open-ended feature that can be leveraged to accomplish that: template steps.

Template steps allow the creation of arbitrary objects in the cluster where the CI pipeline is executed. This is used to start a pod that will then provision a separate cluster for the tests. This directory already contains a few templates that can be used either directly or (rarely necessary, in practice) as a reference:

  • cluster-launch-e2e.yaml: launches a cluster in GCP using openshift-ansible and runs Origin e2e tests on it, parameterized by test focus.
  • cluster-launch-src.yaml: launches a cluster in GCP using openshift-ansible and runs a script from the repository being tested with the resulting $KUBECONFIG, parameterized by test script.
  • cluster-launch-installer-e2e.yaml: same as cluster-launch-e2e.yaml, but uses openshift-installer instead of openshift-ansible.
  • cluster-launch-installer-src.yaml: same as cluster-launch-src.yaml, but uses openshift-installer instead of openshift-ansible.
  • master-sidecar-4.2.yaml: spins up a simple openshift control plane as a sidecar and waits for the COMMAND specified to the template to be executed, before itself exiting. The test container is given access to the generated configuration and the admin.kubeconfig.

To access the cluster, the test should use the standard configuration loading rules, which are described in the upstream documentation:

https://kubernetes.io/docs/tasks/administer-cluster/access-cluster-api
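As a rough illustration of those loading rules, the sketch below resolves which kubeconfig files a client would consult. It is deliberately simplified: an explicit --kubeconfig flag and in-cluster configuration are not modelled, and the function name is hypothetical.

```python
import os
from typing import List, Mapping


def kubeconfig_candidates(env: Mapping[str, str], home: str) -> List[str]:
    """Return the kubeconfig files a client would consult, in order.

    Simplified: only $KUBECONFIG and the default ~/.kube/config are handled.
    """
    value = env.get("KUBECONFIG", "")
    if value:
        # KUBECONFIG may list several files, separated by the OS path separator.
        return [path for path in value.split(os.pathsep) if path]
    return [os.path.join(home, ".kube", "config")]


print(kubeconfig_candidates(os.environ, os.path.expanduser("~")))
```

A test script that relies on $KUBECONFIG, as the cluster-launch-src.yaml template provides, falls into the first branch.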

Using a template

The preferred way to add a test that uses a template is to add it to the ci-operator configuration file and use the configuration generator to generate the job. The list of supported test types can be found in the configuration documentation.

The process for adding jobs manually is significantly more complex. For templates that are expected to be used by many jobs, it may be easier to add support for automatic generation. The example job in the next section can be used as a reference for jobs that are not in that category.

Writing a template

This section covers the process of creating a new template when none of the existing ones provide the workflow required for a particular type of test, for example when a new installer needs to be supported. It supplements the ci-operator template documentation.

From the perspective of an end-to-end test, a ci-operator template is simply a way to create one or more pods and auxiliary objects to set up and clean up the environment and execute the test.

While users should deal mostly with the ci-operator configuration file and generate Prow jobs automatically from it, the structure of the Prow jobs has to be taken into consideration when writing a template. For example, the following snippet from the configuration file of repo openshift/origin is used to generate the presubmit job pull-ci-openshift-origin-master-e2e-conformance-k8s which uses the template cluster-launch-installer-src.yaml:

- as: e2e-conformance-k8s
  commands: test/extended/conformance-k8s.sh
  openshift_installer_src:
    cluster_profile: aws
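The relationship between the as name in this snippet and the fields of the generated presubmit can be sketched as follows. The helper names are hypothetical, and the mapping of the branch-ci prefix to non-presubmit jobs is an assumption; the document only states that both prefixes exist.

```python
def job_name(org: str, repo: str, branch: str, name: str, presubmit: bool = True) -> str:
    """Build a job name in the format used for auto-generated jobs."""
    # Assumption: pull-ci is used for presubmits, branch-ci for other jobs.
    prefix = "pull-ci" if presubmit else "branch-ci"
    return f"{prefix}-{org}-{repo}-{branch}-{name}"


def presubmit_fields(org: str, repo: str, branch: str, name: str) -> dict:
    """Derive the presubmit fields that embed the test's `as` name."""
    return {
        "name": job_name(org, repo, branch, name),
        "context": f"ci/prow/{name}",
        "rerun_command": f"/test {name}",
    }


for key, value in presubmit_fields("openshift", "origin", "master", "e2e-conformance-k8s").items():
    print(f"{key}: {value}")
```

The same identifier also appears in the --target argument and in the paths passed to --secret-dir and --template in the job definition below.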

The CI process begins when a webhook from GitHub triggers the creation of one or more Prow jobs. For a complete description of Prow jobs, see the upstream documentation.

# ci-operator/jobs/openshift/origin/openshift-origin-master-presubmits.yaml
presubmits:
  openshift/origin:
  #
  - agent: kubernetes
    always_run: true
    # Each branch needs its own ci-operator configuration file.
    branches:
    - master
    context: ci/prow/e2e-conformance-k8s
    decorate: true
    # The name should follow the format used for auto-generated jobs:
    # pull-ci-$org-$repo-$branch-$name or branch-ci-$org-$repo-$branch-$name.
    # "e2e-conformance-k8s" is used as a unique identifier for this job
    # throughout the job definition (e.g. in `context` above).
    name: pull-ci-openshift-origin-master-e2e-conformance-k8s
    rerun_command: /test e2e-conformance-k8s
    # ci-operator doesn't require the source code of the repository, it will
    # be cloned in a separate container.
    skip_cloning: true
    spec:
      containers:
      # The names passed to `--secret-dir`, `--target`, and `--template` are
      # important and should follow the format presented here.
      - args:
        - --artifact-dir=$(ARTIFACTS)
        - --give-pr-author-access-to-namespace=true
        # `--secret-dir` references a directory that is volume-mounted in the
        # container by combining secrets and configmaps from the cluster. This
        # is one way of passing extra configuration as input to the template.
        - --secret-dir=/usr/local/e2e-conformance-k8s-cluster-profile
        - --target=e2e-conformance-k8s
        # The template is stored in a configmap in the cluster and
        # volume-mounted in the container.
        - --template=/usr/local/e2e-conformance-k8s
        command:
        - ci-operator
        # Other than CONFIG_SPEC, these are specific to the template being
        # used.
        env:
        - name: CLUSTER_TYPE
          value: gcp
        # The ci-operator configuration stored in a configmap in the cluster.
        - name: CONFIG_SPEC
          valueFrom:
            configMapKeyRef:
              key: openshift-origin-master.yaml
              name: ci-operator-master-configs
        - name: JOB_NAME_SAFE
          value: e2e-conformance-k8s
        - name: TEST_COMMAND
          value: test/extended/conformance-k8s.sh
        image: ci-operator:latest
        imagePullPolicy: Always
        name: ""
        resources:
          requests:
            cpu: 10m
        volumeMounts:
        - mountPath: /usr/local/e2e-conformance-k8s-cluster-profile
          name: cluster-profile
        - mountPath: /usr/local/e2e-conformance-k8s
          name: job-definition
          subPath: cluster-launch-src.yaml
      serviceAccountName: ci-operator
      # Specific to the template being used. Combine a secret and a configmap
      # into a directory that will be copied to the namespace created by
      # ci-operator using the `--secret-dir` option.
      volumes:
      - name: cluster-profile
        projected:
          sources:
          - secret:
              name: cluster-secrets-gcp
          - configMap:
              name: cluster-profile-gcp
      # The template stored in a configmap in the cluster.
      - configMap:
          name: prow-job-cluster-launch-src
        name: job-definition
    trigger: ((?m)^/test( all| e2e-conformance-k8s),?(\s+|$))

The Secrets and ConfigMaps referenced by the job reside in the ci namespace. cluster-profile-* are ConfigMaps that contain the cluster profiles in this repository. cluster-secrets-* are Secrets that contain credentials to provision and access clusters in a specific cloud provider (the contents can be seen in the script that populates them).
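To see which PR comments the trigger expression in the job above would accept, it can be tried out locally. The sketch below uses Python's re module as a stand-in for the matching Prow performs, which is an assumption about equivalence for this simple pattern; the multiline flag is passed as re.MULTILINE instead of the inline (?m).

```python
import re

# Same expression as the job's `trigger`, with the multiline flag passed
# separately instead of inline.
TRIGGER = re.compile(r"^/test( all| e2e-conformance-k8s),?(\s+|$)", re.MULTILINE)


def triggers(comment: str) -> bool:
    """Return True if a PR comment matches the trigger expression."""
    return TRIGGER.search(comment) is not None


for comment in [
    "/test e2e-conformance-k8s",
    "/test all",
    "lgtm\n/test all\n",
    "/test e2e-conformance-k8s-foo",
    "please /test all",
]:
    print(repr(comment), triggers(comment))
```

The last two comments do not match: the first names a different test, and the second does not start the line with /test.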

Inputs

When instantiating the template, data about the pipeline is provided as parameters. The location of images and RPMs from both the release and the CI pipeline is available this way. Extra parameters can be provided via environment variables, which will have to be set by the Prow job.
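Parameter expansion can be pictured with a small stand-in. The sketch below uses Python's string.Template, which happens to share the ${NAME} syntax; it is not how the template is actually processed on the cluster, and the namespace value is hypothetical.

```python
from string import Template
from typing import Mapping


def expand(text: str, params: Mapping[str, str]) -> str:
    """Expand ${NAME} references; raises KeyError if a parameter is missing."""
    return Template(text).substitute(params)


params = {
    "JOB_NAME_SAFE": "e2e-conformance-k8s",  # value from the example Prow job's env
    "NAMESPACE": "ci-op-example",            # hypothetical namespace name
}

print(expand("${JOB_NAME_SAFE}-image-puller", params))
print(expand("${NAMESPACE}", params))
```

A missing parameter raises an error here, which mirrors the practical consequence of forgetting to set an environment variable the template expects.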

External access to the images that were built in the test namespace is required by most end-to-end tests, so templates often create this role binding:

- kind: RoleBinding
  apiVersion: authorization.openshift.io/v1
  metadata:
    name: ${JOB_NAME_SAFE}-image-puller
    namespace: ${NAMESPACE}
  roleRef:
    name: system:image-puller
  subjects:
  - kind: SystemGroup
    name: system:unauthenticated

The template can reference objects from any namespace, but Kubernetes requires them to be in the same namespace to be used as volume mounts. As described in the section above, the Prow job definition and ci-operator's --secret-dir can be used to combine objects into a volume mount and make them available in the test namespace.

Outputs

The outputs of a template test are:

  • Success/failure status, determined from the test pod.
  • The pod's stdout and stderr, reflected in ci-operator's output in case of failure.
  • Artifacts.

These are described in more detail in the ci-operator documentation.

Adding a template

With the template file ready, the steps required to add it to the repository and make it available for CI jobs are:

  1. Create the YAML file in the templates/ directory.
  2. Add the files to the config-updater section of Prow's configuration file to ensure they are added to a ConfigMap in the CI cluster.
  3. Optional: add a test type to ci-operator to enable automatic generation of jobs that use this template.
  4. Add necessary secrets (if any) to the deployment configuration in this repository and apply it to the cluster.

Because the config-updater configuration has to be in place before a PR that adds the files is merged, those changes have to be merged first, in a separate PR.

Testing a template

A job that uses a template can be tested in two different ways. The easiest is to just create a pull request, which will then trigger a run of all added or changed jobs. For complete control of the execution, the more manual process of assembling the ci-operator call can be used.

Testing manually

ci-operator tests can be executed locally with little effort, but setting up the dependencies for template tests is more involved. The typical end-to-end test requires:

  • A kubeconfig pointing to a cluster with external access.
  • The ci-operator configuration file.
  • The template file.
  • The secrets required for the --secret-dir option, if applicable.
  • The environment variables required, if applicable.

The simplest way to get started is to create a personal namespace in the CI cluster. Substitute mynamespace below with the name of that namespace.

The secrets and environment variables are very specific to the template in use, but the e2e-conformance-k8s job can be used as a general example. The template it uses (cluster-launch-src) requires two parameters (the others are all provided by ci-operator): CLUSTER_TYPE determines the cloud provider used to provision the cluster, and TEST_COMMAND is the command that executes the test.

This template also requires a secret containing the cluster profile and credentials. In the CI cluster, it is created using a volume that combines the projection of a Secret and a ConfigMap. Locally, it has to be assembled into a directory manually. How these objects are composed is described in the "writing a template" section above. One final note: because the name of the secret is determined by the argument passed to --secret-dir, the directory has to be named in a way that reflects the secret name expected by the template.
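Before launching ci-operator, the local directory can be sanity-checked. The sketch below assumes the secret name is the base name of the directory passed to --secret-dir (the document says the name is determined by that argument, but not exactly how), and uses the gcp file list from the manual-test example in this document. The helper names are hypothetical.

```python
import os
from typing import Iterable, List

# Files listed for the gcp profile in the manual-test example in this document.
GCP_PROFILE_FILES = (
    "gce.json",
    "ops-mirror.pem",
    "ssh-privatekey",
    "ssh-publickey",
    "telemeter-token",
)


def secret_name(secret_dir: str) -> str:
    """Name derived from a --secret-dir argument (assumed: its base name)."""
    return os.path.basename(os.path.normpath(secret_dir))


def missing_files(secret_dir: str, required: Iterable[str] = GCP_PROFILE_FILES) -> List[str]:
    """Return the required file names that are not present in secret_dir."""
    return [name for name in required
            if not os.path.exists(os.path.join(secret_dir, name))]


print(secret_name("mytestname-cluster-profile/"))
print(missing_files("mytestname-cluster-profile"))
```

With JOB_NAME_SAFE set to mytestname, the directory name follows the same $name-cluster-profile pattern as the path mounted in the Prow job.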

Putting this all together, to execute the e2e-conformance-k8s test the following command can be used:

name=mytestname
CLUSTER_TYPE=gcp
mkdir artifacts/ "$name-cluster-profile"/
ln -s "$PWD/cluster/test-deploy/$CLUSTER_TYPE/"* "$name-cluster-profile"/
# populate the following files in the $name-cluster-profile directory:
# - gce.json
# - ops-mirror.pem
# - ssh-privatekey
# - ssh-publickey
# - telemeter-token
export CLUSTER_TYPE JOB_NAME_SAFE=$name TEST_COMMAND=test/extended/conformance-k8s.sh
ci-operator \
    --artifact-dir artifacts/ \
    --config ci-operator/config/openshift/origin/openshift-origin-master.yaml \
    --git-ref openshift/origin@master \
    --template ci-operator/templates/cluster-launch-src.yaml \
    --target cluster-launch-src \
    --secret-dir "$name-cluster-profile/" \
    --namespace mynamespace

Rebalancing tests among platforms

If test volume for a given platform exceeds the Boskos lease capacity, the jobs-failing-with-lease-acquire-timeout alert will fire. Presubmit jobs may be rebalanced to move platform-agnostic jobs to platforms with available capacity. Component teams may mark their presubmit jobs as platform-agnostic by setting the test's as field to a name that excludes the platform slug (e.g. aws); the absence of a slug is used as the marker that the test is platform-agnostic. For example, see release#10152. To locate platform-specific jobs which might be good candidates for moving to the platform-agnostic pool, you can use:

$ hack/step-jobs-by-platform.py
workflows which need alternative platforms to support balancing:
  baremetalds-e2e
  ipi-aws
  ipi-aws-ovn-hybrid
  openshift-e2e-aws-csi
...
count	platform	status	alternatives	job
39	gcp	balanceable	aws,azure,vsphere	pull-ci-openshift-cluster-version-operator-master-e2e
26	aws	unknown	azure,gcp,vsphere	pull-ci-openshift-sriov-dp-admission-controller-master-e2e-aws
15	aws	unknown	azure,gcp,vsphere	pull-ci-openshift-cluster-authentication-operator-master-e2e-aws
10	aws	balanceable	azure,vsphere	pull-ci-openshift-machine-config-operator-master-e2e-ovn-step-registry
9	aws	unknown	gcp	pull-ci-openshift-cluster-samples-operator-release-4.1-e2e-aws-image-ecosystem
...
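The slug convention can be illustrated with a small check. This is a sketch and not the logic of hack/step-jobs-by-platform.py, which is not shown here; the slug list is an assumption taken from the platforms that appear in the output above.

```python
# Assumed slug list, taken from the platforms that appear in the output above.
PLATFORM_SLUGS = frozenset({"aws", "azure", "gcp", "vsphere"})


def is_platform_agnostic(test_name: str) -> bool:
    """True if no dash-separated part of the name is a known platform slug."""
    return PLATFORM_SLUGS.isdisjoint(test_name.split("-"))


for name in ["e2e", "e2e-aws", "e2e-ovn-step-registry", "e2e-aws-image-ecosystem"]:
    print(name, is_platform_agnostic(name))
```

Matching whole dash-separated parts, rather than substrings, avoids flagging names that merely contain the letters of a slug.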

Rebalancing AWS tests among regions and zones

Occasionally we hit install errors like:

Error launching source instance: InsufficientInstanceCapacity: We currently do not have sufficient m5.xlarge capacity in the Availability Zone you requested (us-east-1b). Our system will be working on provisioning additional capacity. You can currently get m5.xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1a, us-east-1c, us-east-1d, us-east-1f

Or AWS will have issues in a particular region, resulting in:

[DEBUG] plugin.terraform-provider-aws: 2019/03/22 18:02:51 [DEBUG] [aws-sdk-go] DEBUG: Response ec2/RunInstances Details:"
[DEBUG] plugin.terraform-provider-aws: ---[ RESPONSE ]--------------------------------------"
[DEBUG] plugin.terraform-provider-aws: HTTP/1.1 500 Internal Server Error"

or similar. To keep CI going during these events, we can reconfigure to push CI load away from impacted regions and zones. Focusing on step-registry consumers, you could avoid us-east-1b by changing:

  • ipi-conf-aws-commands.sh to set explicit zone_1 and zone_2 for a particular aws_region. You can also drop an entry entirely, although you will need to update $((RANDOM % 4)) to match the number of defined entries. If your changes affect concurrent AWS capacity (e.g. because you removed a high-volume region), you may also need to adjust the Boskos lease capacity to avoid overloading the remaining capacity.
  • ipi-conf-aws-sharednetwork-commands.sh to drop affected regions. All of the same caveats from the previous ipi-conf-aws-commands.sh entry apply here too, although it is harder to shift zones within a region without creating completely new shared subnets. If you do need to create new shared subnets, the procedure is covered here.
  • There are currently no steps exercising the user-provided flow on AWS, so there are no pointers about what to adjust there.
  • There are legacy templates which could be updated to pivot regions and zones, but they should have few consumers and leaving them impacted would help motivate the remaining consumers to move to the step registry.
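The caveat about keeping the modulus in step with the number of entries can be sketched outside the shell script. In the illustration below the region table is hypothetical (only us-east-1 and its zones come from the error message quoted above), and the parsing helpers assume the message wording shown in this section.

```python
import re
from typing import List, Optional, Sequence, Tuple

Region = Tuple[str, str, str]  # (aws_region, zone_1, zone_2)

# Hypothetical table; us-east-1 is pinned to zones other than us-east-1b.
REGIONS: List[Region] = [
    ("us-east-1", "us-east-1a", "us-east-1c"),
    ("us-east-2", "us-east-2a", "us-east-2b"),
    ("us-west-1", "us-west-1a", "us-west-1b"),
    ("us-west-2", "us-west-2a", "us-west-2b"),
]

MESSAGE = (
    "InsufficientInstanceCapacity: We currently do not have sufficient "
    "m5.xlarge capacity in the Availability Zone you requested (us-east-1b). "
    "Our system will be working on provisioning additional capacity. "
    "You can currently get m5.xlarge capacity by not specifying an "
    "Availability Zone in your request or choosing us-east-1a, us-east-1c, "
    "us-east-1d, us-east-1f"
)


def pick_region(regions: Sequence[Region], rand: int) -> Region:
    """Mirror $((RANDOM % N)), with N tied to the table length."""
    return regions[rand % len(regions)]


def impacted_zone(message: str) -> Optional[str]:
    """Extract the zone named as lacking capacity, if present."""
    match = re.search(r"Availability Zone you requested \(([a-z0-9-]+)\)", message)
    return match.group(1) if match else None


def suggested_zones(message: str) -> List[str]:
    """Extract the alternative zones listed at the end of the message."""
    match = re.search(r"or choosing (.+)$", message)
    if not match:
        return []
    return [zone.strip() for zone in match.group(1).rstrip(".").split(",")]


print(impacted_zone(MESSAGE))
print(suggested_zones(MESSAGE))
print(pick_region(REGIONS, 5))
# Dropping an entry needs no other change, because the modulus follows len().
print(pick_region(REGIONS[1:], 5))
```

Deriving the modulus from the table length removes the need to edit a hard-coded count when an entry is dropped; the shell script itself still needs its $((RANDOM % 4)) updated by hand.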