
EOEPCA+ Infrastructure Deployment Guide should focus on pre-requisites #21

Open
spinto opened this issue Nov 12, 2024 · 5 comments

spinto commented Nov 12, 2024

I appreciate the effort to provide information on how to create a cluster satisfying EOEPCA+ needs, but I think what is in the Infrastructure Deployment guide is a bit too much and risks diverting people's attention from the peculiarities of EOEPCA.

So my fear is twofold: one, it will be hard and a bit pointless to maintain a guide on how to install Kubernetes and set up a k8s cluster when there are several on the internet we can point to (like the Rancher k8s non-production environment installation); two, people may just skip that section and assume their existing Kubernetes cluster is enough, while there are some peculiarities of EOEPCA, like the need to run containers as root, the ReadWriteMany storage, and the specific StorageClass names for persistence, which may get lost.

So my proposal would be to rename "Infrastructure Deployment" to "EOEPCA pre-requisites" and have the following sections there:

  • Kubernetes, where we can still point to how to install Kubernetes (e.g. the Rancher distribution), but mostly explain what we require/recommend from the Kubernetes installation: execution of containers as root (required), an ingress/wildcard DNS (required), a load balancer with incoming internet traffic on ports 80/443 (recommended), cert-manager (recommended), etc.
  • EBS, where we explain that we strongly recommend block storage attached to the K8S containers, i.e. a ReadWriteMany StorageClass provisioner for persistence, and that this is a requirement for some of the BBs like the CWL processing engine. This can be provided with NFS, OpenEBS or Longhorn, for example (see the PVC sketch after this list).
  • Object Storage, where we explain that we need an S3 endpoint, in principle any S3-compatible service, and that if you do not have one you can deploy one on K8S.
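
As a concrete illustration of the EBS point above, here is a minimal sketch, assuming the official `kubernetes` Python client and a hypothetical StorageClass name (`managed-nfs-storage` is a placeholder, not an EOEPCA default), of the kind of ReadWriteMany claim the RWX-dependent BBs need the cluster to be able to bind:

```python
# Sketch: request a ReadWriteMany PVC the way an RWX-dependent BB would.
# Assumes the official `kubernetes` Python client and a working kubeconfig;
# the StorageClass name below is a hypothetical example, not an EOEPCA default.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "eoepca-rwx-check"},
    "spec": {
        "accessModes": ["ReadWriteMany"],              # the EOEPCA peculiarity to verify
        "storageClassName": "managed-nfs-storage",     # hypothetical NFS/OpenEBS/Longhorn provisioner
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc_manifest)
print("PVC created; check it reaches Bound with: kubectl get pvc eoepca-rwx-check")
```

If such a claim never reaches the Bound state, the cluster is missing the RWX provisioner that the pre-requisites page should call out.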

Plus, the check-prerequisites script should be more "invasive" and run some tests in the cluster, e.g. running a pod as root, starting a pod service with an ingress and checking whether the pod is accessible, checking whether the certificate for that pod is correct, etc.
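
As an illustration only (not the actual check-prerequisites script), such "invasive" tests could shell out to kubectl along these lines; pod names and images are placeholders:

```python
# Sketch of an "invasive" prerequisite check: actually run a pod as root and look
# for a StorageClass, instead of only inspecting cluster metadata.
# Pod/image names are illustrative; requires kubectl configured for the target cluster.
import json
import subprocess

def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True)

# 1. Can we run a container as root (UID 0)? Some EOEPCA BBs require it.
overrides = json.dumps({
    "apiVersion": "v1",
    "spec": {"securityContext": {"runAsUser": 0}},
})
result = run([
    "kubectl", "run", "eoepca-root-check",
    "--image=busybox", "--restart=Never", "--rm", "-i",
    f"--overrides={overrides}",
    "--command", "--", "id", "-u",
])
root_ok = result.returncode == 0 and "0" in result.stdout.split()
print("run-as-root:", "OK" if root_ok else "FAILED")

# 2. Is there at least one StorageClass that could back persistence?
result = run(["kubectl", "get", "storageclass", "-o", "name"])
print("storageclasses found:", result.stdout.strip() or "NONE")
```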

@spinto spinto changed the title EOEPCA+ Infrastructure Deployment Guide should be renamed more to pre-requisites EOEPCA+ Infrastructure Deployment Guide should be renamed to pre-requisites Nov 13, 2024
@spinto spinto changed the title EOEPCA+ Infrastructure Deployment Guide should be renamed to pre-requisites EOEPCA+ Infrastructure Deployment Guide should focus on pre-requisites Nov 13, 2024
@james-hinton james-hinton self-assigned this Nov 13, 2024

spinto commented Nov 14, 2024

As a note from the discussions in #23 and #14, in the pre-requisites page we should consider putting info about what is recommended for production and what is recommended for development. This is valid for all three areas: the K8S cluster, the EBS storage and the Object Storage.

I would imagine that, for the K8S cluster, in production we would recommend an external IP address, cert-manager with Let's Encrypt and Rancher (production install), while for development/internal testing/demos we would recommend Rancher (single-node install) and manual TLS.

For the EBS, I have run several solutions in the past; in production, IBM Spectrum Scale (proprietary) and GlusterFS (open source) work quite well, while for development Longhorn and OpenEBS are supposed to be much simpler to set up.

For Object Storage too, the EOEPCA MinIO helm chart is good for development/testing/demos, but a standalone MinIO installation or something like the EMC Object Storage solution (or Amazon S3) is probably a better option.
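
Whichever backend is chosen, the BBs only need a reachable S3-compatible API, so a minimal smoke test along these lines (a sketch assuming boto3 and placeholder endpoint/credentials) can confirm the prerequisite before installation:

```python
# Sketch: verify an S3-compatible endpoint is reachable with the supplied credentials.
# Endpoint URL and credentials are placeholders; works for MinIO, EMC, Amazon S3, etc.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.com",  # omit for Amazon S3
    aws_access_key_id="CHANGEME",
    aws_secret_access_key="CHANGEME",
)

buckets = s3.list_buckets().get("Buckets", [])
print("S3 endpoint reachable, buckets:", [b["Name"] for b in buckets])
```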


jdries commented Nov 14, 2024

So you are saying all operational platforms should operate GlusterFS or else a proprietary solution, right? (Unless RWX volumes are offered by the cloud provider?)


spinto commented Nov 14, 2024

No, I am not saying that. I am saying that there are several solutions which are proven to be operationally ready; GlusterFS is one of them, but there are others. OpenEBS may be one of them: I have not used it personally, but Fabrice was saying that it is used in operations on different platforms.


jdries commented Nov 15, 2024

Thanks for all the explanation, it's already helpful!
Anyway, the main concern for operational platforms is to get an idea of what the operational cost will be, and how complex it is to run something like that on an autoscaling cluster and cloud environment where VMs are ephemeral. From my own experience in running a data storage cluster, it does require significantly more experience and work, but perhaps something modern like OpenEBS solves that. (Even though I hear that cloud providers themselves are also struggling, or have struggled, with providing RWX volumes.)
The other interesting option to explore is CWL runners that avoid the shared storage requirement altogether, but again, I would hope that this has all been researched in the past.


spinto commented Nov 15, 2024

About cost/complexity vs advantages, I think it mostly depends on which kind of applications you want to support. CWL is mostly used in HTC/HPC, so it feels "natural" that a CWL runner would assume, or be configured by default with, shared storage across your nodes... but CWL, even if born in HPC, is just a workflow language and does not per se require distributed storage.

And yes, this was explored in the past: CWL (or OGC Application Package, BTW) does not mean Calrissian. That is what we have in one of the EOEPCA processing BB "engines", but we already have Toil as the CWL runner for another "engine", and Toil, for example, should not require ReadWriteMany if configured with HTCondor as the scheduler. Also, for OpenEO UDFs, as the use case is not really HTC, you could just have a simple execution via cwltool (see the sketch below). We can chat more about what is best.
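
For illustration, a simple cwltool execution without any shared storage could look like the following sketch (the Application Package and inputs file names are hypothetical, assuming cwltool is installed and on PATH):

```python
# Sketch: run an OGC Application Package (CWL) locally with cwltool, no RWX volume needed.
# Paths are hypothetical placeholders.
import subprocess

result = subprocess.run(
    [
        "cwltool",
        "--outdir", "/tmp/cwl-output",  # local scratch, not a shared volume
        "app-package.cwl",              # hypothetical Application Package
        "inputs.yml",                   # hypothetical job inputs
    ],
    capture_output=True,
    text=True,
)
print(result.stdout if result.returncode == 0 else result.stderr)
```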

NOTE: we are digressing outside the scope of this ticket. For that, as I said before, what we need to ensure is that the documentation is also clear in stating that OpenEBS or other ReadWriteMany solutions are required only by some of the EOEPCA BBs (and we should specify which ones).
