Native support for sidecar containers #1644

Open
fnikolai opened this issue May 3, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@fnikolai
Contributor

fnikolai commented May 3, 2023

Is your feature request related to a problem? Please describe.
Many scientific applications and workflows are currently written for the Kubernetes ecosystem. Running these Kubernetes applications on HPC infrastructure is a highly desirable feature that currently remains unsolved. To this end, I'm working on a project for interfacing Kubernetes with Slurm, and Singularity is the enabler for this functionality.

However, many applications depend on the sidecar pattern (e.g., for monitoring or for orchestrating execution).

Describe the solution you'd like
Native support for the sidecar pattern.

Regarding the user interface, I think changing the instance start command is the best way to proceed. I would expect something like:

singularity instance start [start options...] <main container path> <sidecar container path> ... <instance name> [startscript args...]

The sidecar should then be accessible through the usual "exec", "shell", and "run" commands, for example:

singularity shell instance://<instance name>:<container index or ID>
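
To make the proposed addressing scheme concrete, the following is a minimal Python sketch of how a target of the form `instance://<instance name>:<container index or ID>` could be split into its parts. The `:<container>` suffix is only the syntax proposed in this issue, not something Singularity accepts today. `InstanceTarget` and `parse_instance_uri` are hypothetical names, and treating a missing suffix as "the main container" is an assumption.

```python
from typing import NamedTuple, Optional


class InstanceTarget(NamedTuple):
    """Parsed form of the proposed instance://<name>[:<container>] target."""
    instance: str
    container: Optional[str]  # index or ID; None is assumed to mean the main container


def parse_instance_uri(uri: str) -> InstanceTarget:
    """Split a proposed instance URI into instance name and container selector."""
    prefix = "instance://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an instance URI: {uri!r}")
    # Everything after the scheme is "<name>" or "<name>:<container>".
    name, sep, container = uri[len(prefix):].partition(":")
    if not name:
        raise ValueError("missing instance name")
    if sep and not container:
        raise ValueError("empty container selector after ':'")
    return InstanceTarget(name, container or None)


if __name__ == "__main__":
    print(parse_instance_uri("instance://myapp:1"))
    print(parse_instance_uri("instance://myapp"))
```

Keeping the selector as a string leaves open whether it is interpreted as an index or an ID, which the proposal does not yet decide.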

In the same context, the logic for init containers should also be implemented.

Describe alternatives you've considered
My current implementation combines nested containers that run in the namespace of a "parent" container.

The script implements both init containers and sidecar containers required by Kubernetes.
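
The ordering that init and sidecar containers require can be sketched in a few lines of Python. This is not the script referred to above, only an illustration of the semantics under stated assumptions: init commands run one after another and must all succeed, sidecars are started before the main command, and sidecars are stopped once the main command exits. `run_pod` is a hypothetical helper, and each command is an arbitrary argv list; in practice each might be a `singularity exec <image> ...` invocation.

```python
import subprocess
import sys


def run_pod(init_cmds, main_cmd, sidecar_cmds):
    """Run init commands, then the main command alongside sidecars.

    Returns the exit code of the main command.
    """
    # Init phase: strictly sequential; any failure aborts the whole pod.
    for cmd in init_cmds:
        subprocess.run(cmd, check=True)

    sidecars = []
    try:
        # Sidecars run in the background for the lifetime of the main command.
        for cmd in sidecar_cmds:
            sidecars.append(subprocess.Popen(cmd))
        return subprocess.run(main_cmd).returncode
    finally:
        # Assumption: sidecars are terminated as soon as the main command ends.
        for proc in sidecars:
            proc.terminate()
        for proc in sidecars:
            proc.wait()


if __name__ == "__main__":
    py = sys.executable
    code = run_pod(
        init_cmds=[[py, "-c", "print('init done')"]],
        main_cmd=[py, "-c", "print('main running')"],
        sidecar_cmds=[[py, "-c", "import time; time.sleep(60)"]],
    )
    print("main exit code:", code)
```

The sketch deliberately ignores shared namespaces and networking, which are the parts that would need support from the container runtime itself.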

Additional context
I'm aware of singularity-cri, but it does not work in my case.
I need a "serverless" solution that does not require daemons running on the compute nodes.

@fnikolai fnikolai added the enhancement New feature or request label May 3, 2023
@dtrudg
Member

dtrudg commented May 4, 2023

> Many scientific applications and workflows are currently written for the Kubernetes ecosystem. Running these Kubernetes applications on HPC infrastructure is a highly desirable feature that currently remains unsolved.

Could you give some specific examples of applications that fit this statement and need sidecar/init containers? Since this is a significant and complex feature that falls outside the main focus of 4.x (native OCI runtime execution), we'd be looking for a broad need among the community of users.

My initial reaction is also to ask why e.g. podman, which specifically aims to support this type of thing via pods, is not an option? That may reveal technical limitations in your environment that would impact this idea.

@fnikolai
Contributor Author

fnikolai commented May 4, 2023

> Could you give some specific examples of applications that fit this statement and need sidecar/init containers? Since this is a significant and complex feature that falls outside the main focus of 4.x (native OCI runtime execution), we'd be looking for a broad need among the community of users.

One specific case is Argo (https://argoproj.github.io/argo-workflows/). It uses init containers for some initialization, and sidecar containers for monitoring the execution of the main container.

Another case is Prometheus/cAdvisor. cAdvisor runs as a sidecar to an application to collect resource utilization metrics.

> My initial reaction is also to ask why e.g. podman, which specifically aims to support this type of thing via pods, is not an option? That may reveal technical limitations in your environment that would impact this idea.

Podman is indeed a nice fit, but my main constraints are:

  1. Need private container network with routable IP addresses
  2. Need almost native network speed
  3. The containers should run rootless

According to the podman networking tutorial linked below, the rootless network stack is based on slirp4netns, which does not provide routable IP addresses and whose performance is awful.

https://github.com/containers/podman/blob/main/docs/tutorials/basic_networking.md#basic-network-setups

In addition, my architecture is based on parallel filesystems (i.e., Lustre), and from what I have read, podman does not work well with them because parallel filesystems lack xattr support.

@dtrudg
Member

dtrudg commented May 4, 2023

Thanks for the details.

> > Could you give some specific examples of applications that fit this statement, and need sidecar/init containers? As a significant and complex feature that falls outside of the main focus of 4.x (native OCI runtime execution), we'd be looking for a broad need among the community of users.

> One specific case is Argo (https://argoproj.github.io/argo-workflows/). It uses init containers for some initialization, and sidecar containers for monitoring the execution of the main container.

Are there existing important workflows that require Argo, or do you want to start developing for Argo?

I'm trying to get to the underlying use-case here, to understand what exactly this is a blocker for. Given Argo is explicitly "The workflow engine for Kubernetes", what's the reason to use it over a more HPC focused/friendly workflow system?

Thanks for indulging me... I don't have a lot of background knowledge on k8s for workflows.

@fnikolai
Contributor Author

fnikolai commented May 4, 2023

Argo's engine is based on the sidecar pattern -- so sidecar support is needed even for the simplest workflows.

An Argo workflow consists of steps. Each step is a pod that includes two containers: the main (application) container does the job, and a sidecar (instrumentation) container watches the status of the main container and interacts with the Argo controller.
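
The step structure described above can be illustrated with a small Python sketch: a watcher (standing in for the instrumentation sidecar) polls the main process and reports status changes to a callback (standing in for the Argo controller). It does not use any Argo API. `run_step`, `watch`, and the status strings are hypothetical names chosen for illustration, and a thread replaces what would be a separate container.

```python
import subprocess
import sys
import threading
import time


def watch(proc, report, poll_interval=0.05):
    """Poll the main process and report its status, as a sidecar would."""
    report("Running")
    while proc.poll() is None:  # None means the process has not exited yet
        time.sleep(poll_interval)
    report("Succeeded" if proc.returncode == 0 else "Failed")


def run_step(main_cmd, report):
    """Run one workflow step: a main command plus a watcher reporting on it."""
    proc = subprocess.Popen(main_cmd)
    watcher = threading.Thread(target=watch, args=(proc, report))
    watcher.start()
    watcher.join()  # the watcher returns only after the main process has exited
    return proc.returncode


if __name__ == "__main__":
    events = []
    run_step([sys.executable, "-c", "print('step work')"], events.append)
    print(events)
```

The point of the sketch is that the watcher must be able to observe the main process, which is why the two containers of a step need to share a pod-like context.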

That said, Argo is now the standard workflow engine on which several scientific applications are built (I have genome and data analytics in mind). For example:
https://www.alibabacloud.com/blog/introduction-to-kubernetes-native-workflows-based-on-the-argo-project-newly-hosted-by-cncf_596591

The case is two-fold.

  1. To be able to run the existing (scientific) workflows at the scale of an HPC infrastructure.

  2. We are now trying to build Argo workflows for the post-analysis of simulation datasets. The reason to do so is that we find Argo more "mature" and "general-purpose" than other workflow engines for HPC.

2 participants