Make the OpenSearch Release Process Accessible and Executable by External Contributors #5171

Open
gaiksaya opened this issue Nov 7, 2024 · 4 comments
Assignees
Labels
enhancement New Enhancement release

Comments

@gaiksaya
Member

gaiksaya commented Nov 7, 2024

Overview

As of the 2.18.0 release of OpenSearch and OpenSearch-Dashboards, most steps in the release process are automated individually. This includes almost all major release steps, such as building, assembling, testing, signing and promoting artifacts. We also have a one-step release workflow that publishes the artifacts to all platforms with one click.
However, what we currently lack is the ability to link these automations end-to-end, to streamline communication between stakeholders throughout the process, and to let an external, non-Amazon community member act as release manager.

This issue describes the current process, its gaps, and an approach to make the release process more efficient and hands-free. It also argues why a 1-click release process is not a feasible option for OpenSearch distribution releases.

What is a 1-click release

The term "1-click release" comes from a universal release process that was introduced for standalone components in the OpenSearch-project. Check the GitHub issue and the onboarding guide for more details.
TL;DR: When a component is ready to be released, a maintainer of the component repository initiates the release by pushing a tag to the repository. This triggers a 2-person review of the release via a GitHub issue. Once approved, a draft release is created on GitHub with the release artifacts attached to it. The draft release triggers the component's Jenkins workflow, which signs and publishes the release to the right platform.
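
To make the sequence above concrete, here is a minimal sketch of the 1-click flow as a small state machine. It is only an illustration, not the actual implementation: the class, stage names and method names are hypothetical, and the only facts taken from the description are the ordering (tag push, review issue, draft release, publish) and the two required approvals.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

REQUIRED_APPROVALS = 2  # the "2-person review" described above


class Stage(Enum):
    TAG_PUSHED = auto()     # maintainer pushed a release tag
    IN_REVIEW = auto()      # review issue is open, waiting for approvals
    DRAFT_RELEASE = auto()  # approved; draft release with artifacts exists
    PUBLISHED = auto()      # signed and published by the Jenkins workflow


@dataclass
class ComponentRelease:
    """Hypothetical model of one standalone component release."""
    tag: str
    stage: Stage = Stage.TAG_PUSHED
    approvers: set = field(default_factory=set)

    def open_review(self) -> None:
        # Pushing the tag opens the review issue.
        self.stage = Stage.IN_REVIEW

    def approve(self, reviewer: str) -> None:
        if self.stage is not Stage.IN_REVIEW:
            raise RuntimeError("release is not under review")
        # A set ignores a second approval from the same reviewer.
        self.approvers.add(reviewer)
        if len(self.approvers) >= REQUIRED_APPROVALS:
            self.stage = Stage.DRAFT_RELEASE

    def publish(self) -> None:
        # Stands in for the Jenkins sign-and-publish workflow.
        if self.stage is not Stage.DRAFT_RELEASE:
            raise RuntimeError("draft release has not been created yet")
        self.stage = Stage.PUBLISHED
```

The point of the sketch is that every transition is triggered by a single event, which is what makes the component flow a good fit for 1-click and what the distribution release lacks.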

Analysis

OpenSearch and OpenSearch-Dashboards consist of multiple components: core plus plugins. As of 2.18.0, we have 24 backend plugins and 15 front-end plugins that are bundled together to form various distributions.
Compared to the existing 1-click release process, which only involves publishing to the right platform after the product has been tested and validated, the release process for OpenSearch is fairly complicated.
The overall release process involves a series of steps, including building, assembling, thorough testing, and meeting various criteria across multiple components. A fully automated, one-click release process may be challenging to achieve, but a more realistic and valuable goal would be to create a streamlined release process that any external community member can manage with minimal effort. This approach ensures accessibility and efficiency while still maintaining control and quality.

What we have

Below is the list of all the automation we have for the release process:

  • A robust build and assembly system for OpenSearch and OpenSearch-Dashboards that supports incremental and continuous builds for the given version, platform, architecture and distribution.
  • A signing system that supports PGP, Windows and macOS signing.
  • End-to-end integration test framework that tests all (most of) the components bundled in the distributions.
  • Workflow that checks the release notes status for all components and posts the status in the release issue as a comment.
  • Release branch creation workflow
  • Release manifest locking workflow
  • Release tag creation workflow
  • 1-click central release promotion workflow that publishes all the artifacts to all the platforms.
  • A build and test failure notification system via GitHub issue.
  • Thorough documentation of the release process, with each step explained in detail.

What we lack

As per the entrance and exit criteria:

Entrance Criteria

| Entrance Criteria | Automation Status |
| --- | --- |
| Each component release issue has an assigned owner | Can be tracked in metrics dashboard but release manager needs to go and get the status and update it manually. |
| Documentation draft PRs are up and in tech review for all component changes | We do not have an automated mechanism to check this today. Need to go and check manually. |
| Sanity testing is done for all components | We rely on component teams |
| Code coverage has not decreased (all new code has tests) | Added recently to metrics opensearch-project/opensearch-metrics#90 but lacks comparison as of now |
| Release notes are ready and available for all components | We do have a release notes checker automation in place but a release manager needs to run the workflow and monitor the status via GH comment. |
| Roadmap is up-to-date (information is available to create release highlights) | Has to be manual. |
| Release ticket is cut, and there's a forum post announcing the start of the window | Release ticket cut is automated. Verification of the same is manual via metrics portal. Any kind of communication, forum or otherwise is manual. |
| Any necessary security reviews are complete | Has to be manual. |

Exit Criteria

| Exit Criteria | Automation Status |
| --- | --- |
| Performance tests are run, results are posted to the release ticket and there are no unexpected regressions | Release manager has to get the performance data from the data-store cluster and post it in the release issue. |
| Documentation has been fully reviewed and signed off by the documentation community | Manual check required |
| All integration tests are passing | We have automated the testing end to end. However, a release manager needs to go and check for all-green status in the metrics dashboards |
| Release blog is ready | @jhmcintyre is looking into the automation for the same |
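
Most of the rows above end with a release manager collecting a status by hand and posting it on the release issue. As a rough sketch of what an automatic status update could look like, the snippet below turns a list of criteria into the text of a single comment. The `Criterion` type, the function name and the comment layout are assumptions made for illustration; how each status is actually fetched (metrics dashboard, workflow result, etc.) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Criterion:
    """One entrance or exit criterion (hypothetical shape)."""
    name: str
    met: bool
    note: str = ""


def render_status_comment(version: str, criteria: List[Criterion]) -> str:
    """Build the text of a status comment for the release issue."""
    lines = [f"Release {version} criteria status:"]
    for c in criteria:
        mark = "[x]" if c.met else "[ ]"
        suffix = f" ({c.note})" if c.note else ""
        lines.append(f"- {mark} {c.name}{suffix}")
    met = sum(1 for c in criteria if c.met)
    lines.append(f"{met}/{len(criteria)} criteria met")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_status_comment("2.18.0", [
        Criterion("All integration tests are passing", True),
        Criterion("Release blog is ready", False, "in progress"),
    ]))
```

Posting the rendered text would then be a single API call from whichever workflow gathers the statuses.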

Overall gaps in the current process:

  • Updating release page on website with release manager and release dates
  • Automatic status updates for above exit and entrance criteria.
  • Automatically merging version increment PRs.
  • BWC testing is not tracked and needs to be fixed/onboarded by multiple components.
  • Automate posting performance test results on the release issue.
  • Re-running flaky integration tests until they pass (keeping some kind of threshold for the number of runs).
  • Automatic way of notifying the developers or community members about a new RC build. This includes commenting on the GH issue, Slack notifications (internal as well as external), and forum posts if required.
  • Release Candidate needs to be validated. We are skipping this step as of now.
  • Access to run Jenkins workflows for external community members.
  • Communication in public channels.
  • Infra flakiness (agent nodes and setup)
  • PGP Signature verification from website
  • Native plugin installation
  • Distribution smoke testing
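
One of the gaps above, re-running flaky integration tests with a threshold on the number of runs, can be sketched in a few lines. This is only an illustration under assumptions: `run_test` is a hypothetical callable standing in for triggering one integration test run and returning whether it passed, and the default of 3 runs is arbitrary.

```python
from typing import Callable, Tuple


def run_with_retries(run_test: Callable[[], bool],
                     max_runs: int = 3) -> Tuple[bool, int]:
    """Re-run a test until it passes or max_runs is reached.

    Returns (passed, number_of_runs_used).
    """
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")
    for attempt in range(1, max_runs + 1):
        if run_test():
            return True, attempt
    # Threshold reached without a passing run: report a real failure.
    return False, max_runs


if __name__ == "__main__":
    # Simulated flaky test: fails twice, then passes.
    outcomes = iter([False, False, True])
    print(run_with_retries(lambda: next(outcomes), max_runs=3))
```

Returning the number of runs used, and not just the final result, keeps flakiness visible so that a test that needs three runs every release can still be flagged to the component team.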

Approach

[Diagram: os-release (UML gist)]

The above diagram represents an overview of the approach. By closing the gaps in the current process, the OpenSearch release process can be made more hands-free and efficient. At a high level, the changes can be made in the following phases:

Phase 1: Automate existing manual steps

This includes even the smallest step, like creating a pull request to update the release manager on the website. The current release documentation is very detailed but contains a lot of information that can easily be missed. The steps involved in the release process are minor but important. By automating them, the risk of skipping those steps due to human error can be minimized.

Phase 2: Link the automations end-to-end

With the smallest automations in place, we need an orchestrator to link them together. It would coordinate and manage the various components and workflows to ensure that they work together in a structured and automated manner to achieve the goal. At a high level, this would include coordination between workflows/tasks, dependency management, error handling and recovery, monitoring and reporting, etc.
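
As a rough illustration of what such an orchestrator does, the sketch below runs steps in dependency order, records a status for each, and skips anything downstream of a failure. It is a toy, not a proposal for the actual design: the function name, the step names in the usage example and the three status strings are all assumptions, and each step is just a Python callable standing in for a real workflow trigger. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, Set


def orchestrate(steps: Dict[str, Callable[[], None]],
                depends_on: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run steps in dependency order.

    Returns a report mapping step name to "ok", "failed" or "skipped".
    """
    unknown = {d for deps in depends_on.values() for d in deps} - set(steps)
    if unknown:
        raise ValueError(f"unknown steps in dependencies: {sorted(unknown)}")

    graph = {name: depends_on.get(name, set()) for name in steps}
    status: Dict[str, str] = {}
    # static_order() yields every step after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        if any(status[dep] != "ok" for dep in graph[name]):
            status[name] = "skipped"  # a prerequisite failed or was skipped
            continue
        try:
            steps[name]()
            status[name] = "ok"
        except Exception:
            status[name] = "failed"   # record the failure and keep reporting
    return status


if __name__ == "__main__":
    def ok() -> None:
        pass

    def failing() -> None:
        raise RuntimeError("integration tests failed")

    report = orchestrate(
        {"build": ok, "assemble": ok, "test": failing, "sign": ok},
        {"assemble": {"build"}, "test": {"assemble"}, "sign": {"test"}},
    )
    print(report)
```

A real orchestrator would add recovery (for example, re-running a failed step) and notifications on top of the report, but the dependency graph and per-step status are the core of it.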

Phase 3: Access control

To facilitate the release process smoothly, the release manager needs to be able to view, run and debug the release workflows. This phase will take care of providing fine-grained access control to release-specific workflows.
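
A minimal sketch of what "fine-grained" could mean here: permissions are granted per role, per workflow name pattern and per action (view, run, debug). Everything in it is hypothetical, including the role names, the `release-*` naming pattern and the grant table; it does not reflect how Jenkins or any existing authorization plugin is configured.

```python
from fnmatch import fnmatchcase
from typing import Dict, Set

# Hypothetical grant table: role -> workflow name pattern -> allowed actions.
GRANTS: Dict[str, Dict[str, Set[str]]] = {
    "release-manager": {"release-*": {"view", "run", "debug"}},
    "contributor": {"release-*": {"view"}},
}


def is_allowed(role: str, workflow: str, action: str,
               grants: Dict[str, Dict[str, Set[str]]] = GRANTS) -> bool:
    """Return True if the role may perform the action on the workflow."""
    for pattern, actions in grants.get(role, {}).items():
        # fnmatchcase gives case-sensitive glob matching on every platform.
        if fnmatchcase(workflow, pattern) and action in actions:
            return True
    return False  # deny by default


if __name__ == "__main__":
    print(is_allowed("release-manager", "release-promotion", "run"))
    print(is_allowed("contributor", "release-promotion", "run"))
```

The deny-by-default return matters for an external release manager: they get exactly the release workflows and nothing else on the shared infrastructure.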

Conclusion

Given the comprehensive nature of the release process, which spans over two weeks, it isn't feasible to have a fully automated, one-click solution from start to finish for OpenSearch and OpenSearch Dashboards. The distribution release process has always been facilitated by the maintainers of this repo. The process can be enhanced by closing the gaps listed above (and more). Keeping in mind OpenSearch’s move to the Linux Foundation, it should be possible for any LF member or OpenSearch maintainer to release OpenSearch and OpenSearch-Dashboards in the future.

Next steps

  • Get feedback for the mentioned approach from the community and members of OpenSearch
  • Analyze if we have better solutions than the one mentioned above
  • Create a detailed implementation plan for each phase with estimates
@gaiksaya
Member Author

gaiksaya commented Nov 7, 2024

Adding @getsaurabh02 @peterzhuamazon @prudhvigodithi @rishabh6788 @dblock @reta @andrross to get some input.
Thank you!

@dblock
Member

dblock commented Nov 7, 2024

Thank you for the very thorough analysis of the release workflow!

I think we may be mixing two problems:

  1. The release can be managed by a non-Amazon, community member.
  2. The release process lacks automation.

Of course, if you do (2), then (1) becomes easier, but you have shown that the gap is fairly wide. Would it be possible to add a short list of the must-haves for (1)? For example, "Access to run jenkins workflows by external community member." would be required, but "Sanity testing is done for all components" can continue being done the same way as today. In my opinion those must-haves should be Phase 1, with a clear goal of having a community member manage an X.Y version release.

@reta
Contributor

reta commented Nov 7, 2024

Thanks a lot for this very detailed description of the current release process. The suggestions to improve it are sound (and the subjects that @dblock brought in are super relevant). It definitely makes sense to me to start with automating the existing manual steps (at least we could start with OpenSearch Core / Dashboards). There are too many manual approvals (PRs, etc.) at the moment, which definitely contribute to the time and friction of the release.

@prudhvigodithi removed the untriaged label Nov 7, 2024
@getsaurabh02
Member

Thanks @gaiksaya, this is a super deep analysis of the release workflow! Excited to see this coming soon.

Projects
Status: 🆕 New
Development

No branches or pull requests

5 participants