[QT-637] Streamline our build pipeline (#24892)
Context
-------
Building and testing Vault artifacts on pull requests and merges is
responsible for about 1/3rd of our overall spend on Vault CI. Of the
artifacts that we ship as part of a release, we do Enos testing scenarios
on the `linux/amd64` and `linux/arm64` binaries and their derivative
artifacts. The extended build artifacts for non-Linux platforms or less
common machine architectures are not tested at this time. They are built,
notarized, and signed as part of every pull request update and merge. As
we don't actually test these artifacts, the only gain we get from this
rather expensive behavior is that we won't merge a change that would prevent
Vault from building on one of the extended targets. Extended platform or
architecture changes are quite rare, so performing this work as frequently
as we do is costly in both money and developer time for little relative
safety benefit.

Goals
-----
Rethink and implement how and when we build binaries and artifacts of Vault
so that we can spend less money on repetitive work while also reducing
the time it takes for the build and test pipelines to complete.

Solution
--------
Instead of building all release artifacts on every push, we'll opt to build
only our testable (core) artifacts. With this change we are introducing a
bit of risk. We could merge a change that breaks an extended platform and
only find out after the fact when we trigger a complete build for a release.
We'll hedge against that risk by building all of the release targets on a
scheduled cadence to ensure that they are still buildable.

We'll make building all of the targets optional on any pull request by
use of a `build/all` label on the pull request.
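
The build decision described above can be sketched as a small shell helper.
This is only an illustration under stated assumptions: the function name
`should_build_all` and its argument layout are hypothetical, and the real
workflows may express the same check with Actions expressions instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether to build every release target.
# Usage: should_build_all EVENT_NAME [LABEL...]
should_build_all() {
  local event_name="$1"
  shift
  # Assumption: the scheduled cadence always builds all release targets.
  if [ "$event_name" = "schedule" ]; then
    echo "true"
    return 0
  fi
  # On pull requests, building everything is opt-in via the build/all label.
  if [ "$event_name" = "pull_request" ]; then
    local label
    for label in "$@"; do
      if [ "$label" = "build/all" ]; then
        echo "true"
        return 0
      fi
    done
  fi
  # Everything else builds only the testable (core) artifacts.
  echo "false"
}

should_build_all pull_request "docs" "build/all"   # prints "true"
```

Passing the labels as plain arguments keeps the sketch free of JSON parsing.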

Further considerations
----------------------
* We want to reduce the total number of workflows and runners for all of our
  pipelines if possible. As each workflow runner has infrastructure cost and
  runner time penalties, using a single runner over many is often preferred.
* Many of our job runners have been optimized for cost and performance. We
  should simplify the choices of which runners to use.
* CRT requires us to use the same build workflow in both CE and Ent.
  Historically that meant that modifying `build.yml` in CE would result in a
  merge conflict with `build.yml` in Ent, and break our merge workflows.
* Workflow flow control in both `build.yml` and `ci.yml` can be quite
  complicated, as each needs to maintain compatibility whether executed as CE
  or Ent, and when triggered with various Github events like pull_request,
  push, and workflow_call, each with their own requirements.
* Many jobs utilize similar patterns of flow control and metadata but are not
  reusable.
* Workflow call depth has a maximum of four, so we need to be quite
  considerate when calling other workflows.
* Called workflows can only have 10 inputs.

Implementation
--------------
* Refactor the `build.yml` workflow to be agnostic to whether or not it is
  executing in CE or Ent. That makes future updates to the build much easier
  as we won't have to worry about merge conflicts when the change is merged
  downstream.
* Extract common steps in workflows into composite actions that we can reuse.
* Fix bugs where some but not all workflows would use different Git
  references when building and testing a pull request.
* Rewrite the application, docs, and UI change helpers as a composite
  action. This allows us to re-use this logic to make consistent behavior
  choices across build and CI.
* Combine several `build.yml` and `ci.yml` jobs into our final job.
  This reduces the number of workflows required for the same behavior while
  saving time overall.
* Update most of our action pins.

Results
-------

| Metric            | Before   | After   | Diff  |
|-------------------|----------|---------|-------|
| Duration:         | ~14-18m  | ~15-18m | ~ =   |
| Workflows:        | 43       | 18      | - 58% |
| Billable time:    | ~1h15m   | 16m     | - 79% |
| Saved artifacts:  | 34       | 12      | - 65% |

Infra costs should map closely to billable time.
Network I/O costs should map closely to the workflow count.
Storage costs should map directly to saved artifacts.
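
The Diff column of the build table can be reproduced with integer
arithmetic. A minimal sketch, assuming billable time is first converted to
minutes (1h15m = 75m); the helper name `percent_reduction` is ours:

```shell
#!/usr/bin/env bash
# Percentage reduction from BEFORE to AFTER, rounded to the nearest integer.
# Usage: percent_reduction BEFORE AFTER
percent_reduction() {
  local before="$1" after="$2"
  # Adding before/2 prior to the integer division rounds to nearest.
  echo $(( ((before - after) * 100 + before / 2) / before ))
}

percent_reduction 43 18   # workflows: prints 58
percent_reduction 75 16   # billable minutes (1h15m -> 16m): prints 79
percent_reduction 34 12   # saved artifacts: prints 65
```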

We could probably get parity with duration by getting more clever with
our UBI container build, as that's where we're seeing the increase. I'm
not yet concerned as it takes roughly the same time for this job to
complete as it did before.

While the CI workflow was not the focus of the PR, some shared
refactoring does show some marginal improvements there.

| Metric            | Before   | After    | Diff   |
|-------------------|----------|----------|--------|
| Duration:         | ~24m     | ~12.75m  | - 15%  |
| Workflows:        | 55       | 47       | - 8%   |
| Billable time:    | ~4h20m   | ~3h36m   | - 7%   |

Further focus on streamlining the CI workflows would likely result in a
few more marginal improvements, but nothing on the order of what we've seen
with the build workflow.

Signed-off-by: Ryan Cragun <[email protected]>
ryancragun authored Feb 6, 2024
1 parent 87d76fc commit 89c75d3
Showing 31 changed files with 1,664 additions and 1,049 deletions.
201 changes: 201 additions & 0 deletions .github/actions/build-vault/action.yml
@@ -0,0 +1,201 @@
# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: BUSL-1.1

---
name: Build Vault
description: |
  Build various Vault binaries and package them into Zip bundles, Deb and RPM packages,
  and various container images. Upload the resulting artifacts to Github Actions artifact storage.
  This composite action is used across both CE and Ent, thus it should maintain compatibility with
  both repositories.
inputs:
  github-token:
    type: string
    description: An elevated Github token to access private Go modules if necessary.
    default: ""
  cgo-enabled:
    type: number
    description: Enable or disable CGO during the build.
    default: 0
  create-docker-container:
    type: boolean
    description: Package the binary into a Docker/AWS container.
    default: true
  create-redhat-container:
    type: boolean
    description: Package the binary into a Redhat container.
    default: false
  create-packages:
    type: boolean
    description: Package the binaries into deb and rpm formats.
    default: true
  goos:
    type: string
    description: The Go GOOS value environment variable to set during the build.
  goarch:
    type: string
    description: The Go GOARCH value environment variable to set during the build.
  goarm:
    type: string
    description: The Go GOARM value environment variable to set during the build.
    default: ""
  goexperiment:
    type: string
    description: Which Go experiments to enable.
    default: ""
  go-tags:
    type: string
    description: A comma separated list of tags to pass to the Go compiler during build.
    default: ""
  package-name:
    type: string
    description: The name to use for the linux packages.
    default: ${{ github.event.repository.name }}
  vault-binary-name:
    type: string
    description: The name of the vault binary.
    default: vault
  vault-edition:
    type: string
    description: The edition of vault to build.
  vault-version:
    type: string
    description: The version metadata to inject into the build via the linker.
  web-ui-cache-key:
    type: string
    description: The cache key for restoring the pre-built web UI artifact.

outputs:
  vault-binary-path:
    description: The location of the built binary.
    value: ${{ steps.containerize.outputs.vault-binary-path != '' && steps.containerize.outputs.vault-binary-path || steps.metadata.outputs.binary-path }}

runs:
  using: composite
  steps:
    - name: Ensure zstd is available for actions/cache
      # actions/cache restores based on cache key and "cache version", the former is unique to the
      # build job or web UI, the latter is a hash which is based on the runner OS, the paths being
      # cached, and the program used to compress it. Most of our workflows will use zstd to compress
      # the cached artifact so we have to have it around for our machines to get both a version match
      # and to decompress it. Most runners include zstd by default but there are exceptions like
      # our Ubuntu 20.04 compatibility runners which do not.
      shell: bash
      run: which zstd || (sudo apt update && sudo apt install -y zstd)
    - uses: ./.github/actions/set-up-go
      with:
        github-token: ${{ inputs.github-token }}
    - if: inputs.vault-edition != 'ce'
      name: Configure Git
      shell: bash
      run: git config --global url."https://${{ inputs.github-token }}:@github.com".insteadOf "https://github.com"
    - name: Restore UI from cache
      uses: actions/cache@e12d46a63a90f2fae62d114769bbf2a179198b5c # v3.3.3
      with:
        # Restore the UI asset from the UI build workflow. Never use a partial restore key.
        enableCrossOsArchive: true
        fail-on-cache-miss: true
        path: http/web_ui
        key: ${{ inputs.web-ui-cache-key }}
    - name: Metadata
      id: metadata
      env:
        # We need these for the artifact basename helper
        GOARCH: ${{ inputs.goarch }}
        GOOS: ${{ inputs.goos }}
        VERSION: ${{ inputs.vault-version }}
        VERSION_METADATA: ${{ inputs.vault-edition != 'ce' && inputs.vault-edition || '' }}
      shell: bash
      run: |
        if [[ '${{ inputs.vault-edition }}' =~ 'ce' ]]; then
          build_step_name='Vault ${{ inputs.goos }} ${{ inputs.goarch }} v${{ inputs.vault-version }}'
          package_version='${{ inputs.vault-version }}'
        else
          build_step_name='Vault ${{ inputs.goos }} ${{ inputs.goarch }} v${{ inputs.vault-version }}+${{ inputs.vault-edition }}'
          package_version='${{ inputs.vault-version }}+ent' # this should always be +ent here regardless of enterprise edition
        fi
        {
          echo "artifact-basename=$(make ci-get-artifact-basename)"
          echo "binary-path=dist/${{ inputs.vault-binary-name }}"
          echo "build-step-name=${build_step_name}"
          echo "package-version=${package_version}"
        } | tee -a "$GITHUB_OUTPUT"
    - name: ${{ steps.metadata.outputs.build-step-name }}
      env:
        CGO_ENABLED: ${{ inputs.cgo-enabled }}
        GO_TAGS: ${{ inputs.go-tags }}
        GOARCH: ${{ inputs.goarch }}
        GOARM: ${{ inputs.goarm }}
        GOOS: ${{ inputs.goos }}
        GOEXPERIMENT: ${{ inputs.goexperiment }}
        GOPRIVATE: github.com/hashicorp
        # This action declares no `version` input, so use `vault-version` as the Metadata step does.
        VERSION: ${{ inputs.vault-version }}
        VERSION_METADATA: ${{ inputs.vault-edition != 'ce' && inputs.vault-edition || '' }}
      shell: bash
      run: make ci-build
    - if: inputs.vault-edition != 'ce'
      shell: bash
      run: make ci-prepare-legal
    - name: Bundle Vault
      env:
        BUNDLE_PATH: out/${{ steps.metadata.outputs.artifact-basename }}.zip
      shell: bash
      run: make ci-bundle
    # Use actions/upload-artifact @3.x until https://hashicorp.atlassian.net/browse/HREL-99 is resolved
    - uses: actions/upload-artifact@a8a3f3ad30e3422c9c7b888a15615d19a852ae32 # v3.1.3
      with:
        name: ${{ steps.metadata.outputs.artifact-basename }}.zip
        path: out/${{ steps.metadata.outputs.artifact-basename }}.zip
        if-no-files-found: error
    - if: inputs.create-packages == 'true'
      uses: hashicorp/actions-packaging-linux@v1
      with:
        name: ${{ inputs.package-name }}
        description: Vault is a tool for secrets management, encryption as a service, and privileged access management.
        arch: ${{ inputs.goarch }}
        version: ${{ steps.metadata.outputs.package-version }}
        maintainer: HashiCorp
        homepage: https://github.com/hashicorp/vault
        license: BUSL-1.1
        binary: ${{ steps.metadata.outputs.binary-path }}
        deb_depends: openssl
        rpm_depends: openssl
        config_dir: .release/linux/package/
        preinstall: .release/linux/preinst
        postinstall: .release/linux/postinst
        postremove: .release/linux/postrm
    - if: inputs.create-packages == 'true'
      id: package-files
      name: Determine package file names
      shell: bash
      run: |
        {
          echo "rpm-files=$(basename out/*.rpm)"
          echo "deb-files=$(basename out/*.deb)"
        } | tee -a "$GITHUB_OUTPUT"
    - if: inputs.create-packages == 'true'
      # Use actions/upload-artifact @3.x until https://hashicorp.atlassian.net/browse/HREL-99 is resolved
      uses: actions/upload-artifact@a8a3f3ad30e3422c9c7b888a15615d19a852ae32 # v3.1.3
      with:
        name: ${{ steps.package-files.outputs.rpm-files }}
        path: out/${{ steps.package-files.outputs.rpm-files }}
        if-no-files-found: error
    - if: inputs.create-packages == 'true'
      # Use actions/upload-artifact @3.x until https://hashicorp.atlassian.net/browse/HREL-99 is resolved
      uses: actions/upload-artifact@a8a3f3ad30e3422c9c7b888a15615d19a852ae32 # v3.1.3
      with:
        name: ${{ steps.package-files.outputs.deb-files }}
        path: out/${{ steps.package-files.outputs.deb-files }}
        if-no-files-found: error
    # Do our containerization last as it will move the binary location if we create containers.
    - uses: ./.github/actions/containerize
      id: containerize
      with:
        docker: ${{ inputs.create-docker-container }}
        redhat: ${{ inputs.create-redhat-container }}
        goarch: ${{ inputs.goarch }}
        goos: ${{ inputs.goos }}
        vault-binary-path: ${{ steps.metadata.outputs.binary-path }}
        vault-edition: ${{ inputs.vault-edition }}
        vault-version: ${{ inputs.vault-version }}
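
The edition handling in the Metadata step of the build-vault action can be
exercised outside of Actions with a small function. This is a sketch, not
the action itself: the function name `package_version`, the version
`1.16.0`, and the edition `ent.hsm` are illustrative assumptions, and it
uses an exact string comparison where the action uses a regex match.

```shell
#!/usr/bin/env bash
# Sketch of the Metadata step's package version logic as a plain function.
# Usage: package_version VERSION EDITION
package_version() {
  local version="$1" edition="$2"
  if [ "$edition" = "ce" ]; then
    # CE packages carry the bare version.
    echo "$version"
  else
    # Every enterprise edition is packaged as +ent, regardless of edition.
    echo "${version}+ent"
  fi
}

package_version 1.16.0 ce        # prints "1.16.0"
package_version 1.16.0 ent.hsm   # prints "1.16.0+ent"
```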
73 changes: 73 additions & 0 deletions .github/actions/changed-files/action.yml
@@ -0,0 +1,73 @@
# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: BUSL-1.1

---
name: Determine what files changed between two git references.
description: |
  Determine what files have changed between two git references. If the github.event_name is
  pull_request we'll compare the github.base_ref (merge target) and pull request head SHA.
  For other event types we'll gather the changed files from the most recent commit. This allows
  us to support PR and merge workflows.
outputs:
  app-changed:
    description: Whether or not the vault Go app was modified.
    value: ${{ steps.changed-files.outputs.app-changed }}
  docs-changed:
    description: Whether or not the documentation was modified.
    value: ${{ steps.changed-files.outputs.docs-changed }}
  ui-changed:
    description: Whether or not the web UI was modified.
    value: ${{ steps.changed-files.outputs.ui-changed }}
  files:
    description: All of the file names that changed.
    value: ${{ steps.changed-files.outputs.files }}

runs:
  using: composite
  steps:
    - id: ref
      shell: bash
      name: ref
      run: |
        # Determine our desired checkout ref.
        #
        # * If the trigger event is pull_request we will default to a magical merge SHA that Github
        #   creates. This SHA is the product of merging our PR into the merge target branch at
        #   the point in time when we created the PR. When you push a change to a PR branch
        #   Github updates this branch if it can. When you rebase a PR it updates this branch.
        #
        # * If the trigger event is pull_request and a `checkout-head` tag is present or the
        #   checkout-head input is set, we'll use HEAD of the PR branch instead of the magical
        #   merge SHA.
        #
        # * If the trigger event is a push (merge) then we'll get the latest commit that was pushed.
        #
        # * For any other event type we'll default to whatever is default in Github.
        if [ '${{ github.event_name }}' = 'pull_request' ]; then
          checkout_ref='${{ github.event.pull_request.head.sha }}'
        elif [ '${{ github.event_name }}' = 'push' ]; then
          # Our checkout ref for any other event type should default to the github ref.
          checkout_ref='${{ github.event.after && github.event.after || github.event.push.after }}'
        else
          checkout_ref='${{ github.ref }}'
        fi
        echo "ref=${checkout_ref}" | tee -a "$GITHUB_OUTPUT"
    - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      with:
        repository: ${{ github.repository }}
        path: "changed-files"
        # The fetch-depth could probably be optimized at some point. It's currently set to zero to
        # ensure that we have a successful diff, regardless of how many commits might be
        # present between the two references we're comparing. It would be nice to change this
        # depending on the number of commits by using the push.commits and/or pull_request.commits
        # payload fields, however, they have different behavior and limitations. For now we'll do
        # the slow but sure thing of getting the whole repository.
        fetch-depth: 0
        ref: ${{ steps.ref.outputs.ref }}
    - id: changed-files
      name: changed-files
      # This script writes output values to $GITHUB_OUTPUT and STDOUT
      shell: bash
      run: ./.github/scripts/changed-files.sh ${{ github.event_name }} ${{ github.ref_name }} ${{ github.base_ref }}
      working-directory: changed-files
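
The `changed-files.sh` script itself is not part of the diff shown here, so
the following is only a hypothetical sketch of how a list of changed paths
could be turned into the `app-changed`, `docs-changed`, and `ui-changed`
outputs. The function name `classify_changes` and the path prefixes
(`website/` for docs, `ui/` for the web UI) are assumptions, not taken from
the real script.

```shell
#!/usr/bin/env bash
# Hypothetical classifier: reads changed file paths on stdin, one per line,
# and prints key=value pairs in the style written to $GITHUB_OUTPUT.
# Assumption: docs live under website/ and the web UI under ui/.
classify_changes() {
  local app=false docs=false ui=false file
  while IFS= read -r file; do
    case "$file" in
      "") ;;                     # ignore blank lines
      website/*) docs=true ;;
      ui/*) ui=true ;;
      *) app=true ;;             # anything else counts as an app change
    esac
  done
  echo "app-changed=${app}"
  echo "docs-changed=${docs}"
  echo "ui-changed=${ui}"
}

printf '%s\n' 'website/content/docs/index.mdx' 'ui/app/app.js' | classify_changes
```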
77 changes: 77 additions & 0 deletions .github/actions/checkout/action.yml
@@ -0,0 +1,77 @@
# Copyright (c) HashiCorp, Inc.
# SPDX-License-Identifier: BUSL-1.1

---
name: Check out the correct git reference.
description: |
  Determine and checkout the correct Git reference depending on the actions event type and tags.
inputs:
  checkout-head:
    description: |
      Whether or not to check out HEAD on a pull request. This can also be triggered with a
      `checkout-head` tag.
    default: 'false'
  path:
    description: Relative path to $GITHUB_WORKSPACE to check out to
    default: ""

outputs:
  ref:
    description: The git reference that was checked out.
    value: ${{ steps.ref.outputs.ref }}
  depth:
    description: The fetch depth that was checked out.
    value: ${{ steps.ref.outputs.depth }}

runs:
  using: composite
  steps:
    - id: ref
      shell: bash
      run: |
        # Determine our desired checkout ref and fetch depth. Depending on our workflow event
        # trigger, inputs, and tags, we'll check out different references at different depths.
        #
        # * If the trigger event is a pull request we will default to a magical merge SHA that Github
        #   creates. Essentially, this SHA is the product of merging our PR into the merge target
        #   branch at some point in time. When you push a change to a PR branch Github updates this
        #   branch if it can.
        # * If the trigger event is a pull request and a `checkout-head` tag is present or the
        #   checkout-head input is set, we'll use HEAD of the PR branch instead of the magical
        #   merge SHA.
        # * If the trigger event is a push (merge) then we'll get the latest commit that was pushed.
        # * For any other event type we'll default to whatever is default in Github.
        #
        # Our fetch depth will vary depending on what our chosen SHA is. We normally want to do
        # the most shallow clone possible for speed, but we also need to support getting history
        # for determining what files have changed, etc. We'll always check out one level deep for
        # merges or standard pull requests. If checking out HEAD is requested we'll fetch a deeper
        # history because we need all commits on the branch.
        #
        if [ '${{ github.event_name }}' = 'pull_request' ]; then
          if [ '${{ contains(github.event.pull_request.labels.*.name, 'checkout-head') || inputs.checkout-head == 'true' }}' = 'true' ]; then
            checkout_ref='${{ github.event.pull_request.head.sha }}'
            fetch_depth=0
          else
            checkout_ref='${{ github.ref }}'
            fetch_depth=1
          fi
        elif [ '${{ github.event_name }}' = 'push' ]; then
          # Our checkout ref for any other event type should default to the github ref.
          checkout_ref='${{ github.event.push.after }}'
          fetch_depth=1
        else
          checkout_ref='${{ github.ref }}'
          fetch_depth=0
        fi
        {
          echo "ref=${checkout_ref}"
          echo "depth=${fetch_depth}"
        } | tee -a "$GITHUB_OUTPUT"
    - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      with:
        path: ${{ inputs.path }}
        fetch-depth: ${{ steps.ref.outputs.depth }}
        ref: ${{ steps.ref.outputs.ref }}
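
The ref and depth selection in the checkout action can be restated as a
plain function that is easy to test locally. This is a sketch only: the
function name `select_checkout` and its positional arguments are
hypothetical stand-ins for the Actions context values, and the sample SHA
and ref below are made up.

```shell
#!/usr/bin/env bash
# Usage: select_checkout EVENT_NAME CHECKOUT_HEAD PR_HEAD_SHA GITHUB_REF PUSH_SHA
# CHECKOUT_HEAD is "true" when the checkout-head label or input is set.
# Prints "<ref> <depth>".
select_checkout() {
  local event_name="$1" checkout_head="$2" pr_head_sha="$3" github_ref="$4" push_sha="$5"
  case "$event_name" in
    pull_request)
      if [ "$checkout_head" = "true" ]; then
        # HEAD of the PR branch needs full history.
        echo "$pr_head_sha 0"
      else
        # The merge ref only needs a shallow clone.
        echo "$github_ref 1"
      fi
      ;;
    push)
      # Latest pushed commit, shallow.
      echo "$push_sha 1"
      ;;
    *)
      # Any other event: default ref, full history.
      echo "$github_ref 0"
      ;;
  esac
}

select_checkout pull_request false abc123 refs/pull/1/merge ""   # prints "refs/pull/1/merge 1"
```
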