Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from XZ to ZSTD and improve release artifact transparency #343

Merged
merged 8 commits into from
Apr 8, 2024

Conversation

alerque
Copy link
Collaborator

@alerque alerque commented Apr 3, 2024

  • Switch from xz to zst compression for official artifacts

    I've always preferred ZSTD do XZ anyway, but the default Ubuntu runner GitHub Actions used to use was so old that it did not have a sufficiently new autotools release to have ZSTD support. Hence XZ was always a compromise to get better compression without compatibility problems. Times have changed, the upstream xz-utils project has been backdoored, and ZSTD is available in Ubuntu LTS releases so we don't have a problem auto-generating releases any more.

    As-yet no evidence has surfaced that XZ compressed artifacts are compromised, but since there are no blockers anyway switching seems like a good plan to me.

  • Generate checksums for release artifacts in CI

    One of the major complains surrounding the recent XZ fiasco is that auto-tools generated source files include so much obtuse code that they are difficult to audit. Also it isn't immediately apparent what sources they have been generated from. In our case we're generating the source dist files in CI anyway and automatically attaching them to releases, but GH does not make it possible to verify this. They could just as well be reposted later by a malicious maintainer.

    This is not a magic bullet to fix all that, but it should help. The CI environment can be verified by looking at the workflow file and the other Git sources so we're not using a modified version of autotools or anything like that. Checksums are now being generated after making the distribution tarballs, and echoed to the output log so it is possible to verify that the files generated in CI are actually still the ones attached to the release. The checksums file is also posted to the release.

  • Guard checksum generation (and the extra dependencies) behind maintainer mode

  • Use more GNU/Autotools idiomatic names and methods

    We're extending GNU/Autotools with our own macros. This is allowed and even expected, but there are certainly more and less idiomatic ways to do so. This normalizes all our extended bits to be as idiomatic as possible to make auditing easier.

    • Filename prefixes are more representative of their source: especially we are not hijacking the ax_ prefix which usually references something from autotools-archive that could be compared with a known source. All files starting with ax_ could be removed from the repository and the project still built provided autotools-archive was available on the host system. We're still vendoring them because it is not as broadly available as autotools, but these bits don't need to be audited as part of this project.

      I used que_ as a prefix to make them easy to find and because I've authored them and share identical extensions across a number of other projects. The configure macros and makefile extensions have matching prefixes so they sort together.

    • The AMINCLUDE system is leveraged instead of hand rolling includes. This is actually probably slightly harder to grok, but again the macros used are standardized in autoconf-archive.) I think the more idiomatic usage has the payoff of not sounding any more alarms than we need to.

      It also has the advantage of a single place to enable/disable a component of the build instead of needing to comment out both the configure macro(s) and makefile include(s) separately.

  • Include macros from autoconf-archive (pristine)

    Reverting this commit should remove the macro code. The project would still be buildable on any system with autoconf-archive, but since that is considerably less than systems with autoconf which is our actual target, we're vendoring a copy of them here. These files should be bit-for-bit the same as what appears in current macro archives.


Also just a reminder if you still don't trust autotools generated tarballs (and let's be honest, there is a lot of obtuse code in them!) you can use your own system instead of ours to generate all the intermediate artifacts. The git archive generated sources or a Git clone are also usable, but you need to run ./botostrap.sh (or even just autoreconf --install) yourself before being able to configure.

I've always preferred ZSTD do XZ anyway, but the default Ubuntu runner
GitHub Actions used to use was so old that it did not have
a sufficiently new autotools release to have ZSTD support. Hence XZ was
always a compromise to get better compression without compatibility
problems. Times have changed, the upstream `xz-utils` project has been
backdoored, and ZSTD is available in Ubuntu LTS releases so we don't
have a problem auto-generating releases any more.

As-yet no evidence has surfaced that XZ compressed artifacts are
compromised, but since there are no blockers anyway switching seems like
a good plan to me.
@alerque alerque force-pushed the burninate-xz branch 3 times, most recently from 7b06630 to 2e2e5a7 Compare April 3, 2024 11:24
@alerque
Copy link
Collaborator Author

alerque commented Apr 3, 2024

I was playing around with reproducibility with this and ran into an interesting issue. I wanted to verify that the standalone file being generated in CI was bit-for-bit identical with one made locally so it could be audited if desired. The upshot is it is reproducible, but only on the default branch. Verifying a PR is somewhat harder because Github is doing a merge action before running the tests and testing the theoretial result of the PR being merged, not just the branch. That means the artifacts generated in the workflows show here will have different VCSH_VERSION strings, e.g. local vs. this pr:

--- vcsh-standalone.sh  2024-04-03 14:58:31.815280296 +0300
+++ /home/caleb/Downloads/vcsh/vcsh-standalone.sh       2024-04-03 11:25:10.000000000 +0300
@@ -15,7 +15,7 @@
 # version built from the main branch HEAD. If "-standalone" is appended you are
 # using a version built explicitly for portability as a standalone script
 # rather than being installed to a specific system.
-VCSH_VERSION='2.0.8.r3-g2e2e5a7-standalone'; export VCSH_VERSION
+VCSH_VERSION='2.0.8.r4-gd416d7d-standalone'; export VCSH_VERSION
 VCSH_SELF='vcsh-standalone.sh'; export VCSH_SELF

 # Ensure all files created are accessible only to the current user.

The same workflow will run on the branch pushed by the the PR contributor, but it will only be accessible by looking at the workflow runs on the branch on their fork.

I think this is reasonable behavior, it just caught be a little off guard. And again the upstream is that artifacts on every commit to main and of course release tags should be reproducible.

One of the major complains surrounding the recent XZ fiasco is that
auto-tools generated source files include so much obtuse code that they
are difficult to audit. Also it isn't immediately apparent what sources
they have been generated from. In our case we're generating the source
dist files in CI anyway and automatically attaching them to releases,
but GH does not make it possible to verify this. They could just as well
be reposted later by a malicious maintainer.

This is not a magic bullet to fix all that, but it should help. The CI
environment can be verified by looking at the workflow file and the
other Git sources so we're not using a modified version of autotools or
anything like that. Checksums are now being generated after making the
distribution tarballs, and *echoed to the output log* so it is possible
to verify that the files generated in CI are actually still the ones
attached to the release. The checksums file is also posted to the
release.
@alerque alerque force-pushed the burninate-xz branch 5 times, most recently from 0433c1e to 735660b Compare April 6, 2024 19:20
We're extending GNU/Autotools with our own macros. This is allowed and
even expected, but there are certainly more and less idiomatic ways to
do so. This normalizes all our extended bits to be as idiomatic as
possible to make auditing easier.

* Filename prefixes are more representative of their source:
  especially we are not hijacking the ax_ prefix which usually
  references something from autotools-archive that could be compared
  with a known source. All files starting with ax_ could be removed from
  this repository and the project still build provided autotools-archive
  is available on the host system. We're still vendoring them because
  it is not as broadly available as autotools, but these bits don't need
  to be audited as part of this project.

  I used que_ as a prefix for custom macros to make them easy to find
  and because I've authored them and share identical extensions across
  a number of other projects. The configure macros and correstponding
  makefile fragments have matching prefixes so they sort together.

* The AMINCLUDE system is leveraged instead of making the maintainer
  hand-rol includes for each fragment. This is actually probably
  slightly *harder* to grok, but again the macros used are standardized
  in autoconf-archive.) I think the more idiomatic usage has the payoff
  of not sounding any more alarms than we need to. The gotcha was that
  we do still want to inline them, which means a bit of extra
  bootstrapping.

  The macro include system also has the advantage of a single place to
  enable/disable a component of the build instead of needing to comment
  out both the configure.ac macro(s) and the matching Makefile.am
  include(s) separately.
@alerque alerque force-pushed the burninate-xz branch 3 times, most recently from 1158bb0 to 9720336 Compare April 6, 2024 23:04
Reverting this commit should remove the macro code. The project would
still be buildable on any system with autoconf-archive, but since that
is considerably less than systems with autoconf which is our actual
target, we're vendoring a copy of them here. These files should be
bit-for-bit the same as what appears in current macro archives.
@alerque alerque merged commit d2e92cd into RichiH:main Apr 8, 2024
5 checks passed
@alerque alerque deleted the burninate-xz branch April 8, 2024 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant