Keeping local (python) libraries in synch with those in runtime images #1647
-
For the sake of reproducibility, dynamically extended runtime images as you are proposing are probably something that should be avoided, because they can lead to different package versions across multiple runs. Static container images (built once, registered as a runtime image, and including all prerequisite libraries) are guaranteed to have the same (relevant) packages installed.

Here's the potential problem with "dynamically extended" images. Let's say a "base" image (registered as a runtime image and only including a few of the prerequisite libraries) is "extended" prior to execution using a list of requirements, say package A and package B. Each one of those packages has dependencies (Ax or Bx) of its own, which might not be pinned to a specific version. (Whether or not those packages are pinned to specific versions might be out of your control.) Running the image today, pip install might pull one set of dependency versions; running it a few weeks later, it might pull newer ones, so the same pipeline can execute against different environments. The only way I can think of to avoid this is to freeze/capture all package versions in the user-supplied process - which in essence yields the same result as a static image that has everything pre-installed, but incurs the installation overhead every time a pipeline node is executed.
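As a rough illustration of the freeze/capture idea (not something Elyra does today; the lock-file name below is made up), one could snapshot the exact versions of every installed package and build the static runtime image, or a fully pinned requirements file, from that snapshot:

```python
"""Capture the exact package versions of the current environment so a static
runtime image (or a pinned requirements file) can be built from them.

A minimal sketch; the output file name and the use of `pip freeze` as the
pinning mechanism are illustrative, not part of Elyra itself.
"""
import subprocess
import sys


def write_lock_file(path: str = "requirements-lock.txt") -> None:
    # `pip freeze` emits every installed package pinned to its exact version,
    # including transitive dependencies such as the hypothetical Ax and Bx.
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    with open(path, "w") as f:
        f.write(frozen)


if __name__ == "__main__":
    write_lock_file()
```

A runtime image built from such a fully pinned file gives the same guarantee as a static image with everything pre-installed; installing the pinned list at execution time gives the same versions too, but re-pays the installation cost on every node.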
-
Hi folks.... One possibility could be that Elyra does a `pip freeze` of the user's working environment and has the bootstrapper install the captured versions into the pod. One issue with this approach is that there could be lots of unused modules in the working environment which are not required to actually run the stage - but these would get picked up by the freeze. Another alternative might be to have users maintain an explicit requirements list per stage and have the bootstrapper install from that.

Either way, I don't think installing additional modules by the bootstrapper should be the default setting - so as to minimise the number of cases which introduce unpredictability. It would be nice if it could be a checkbox that has to be checked for each stage which is going to do these "extended pip installs". That way, it's a conscious decision on the part of the user.

Any thoughts? Cheers -- Simon
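To make the opt-in idea a bit more concrete, here is a purely hypothetical sketch of what such a per-stage "extended pip install" step might look like inside a bootstrapper; the `extend_packages` flag, the `user-requirements.txt` file name, and the function itself are all invented for illustration and are not part of Elyra's actual bootstrapper.py:

```python
"""Hypothetical sketch of an opt-in "extended pip install" step that a
bootstrapper could run before starting the kernel for a stage."""
import subprocess
import sys
from pathlib import Path


def install_user_requirements(
    extend_packages: bool, req_file: str = "user-requirements.txt"
) -> None:
    # Do nothing unless the user explicitly opted in for this stage
    # (e.g. via the per-stage checkbox suggested above).
    if not extend_packages:
        return
    path = Path(req_file)
    if not path.is_file():
        return
    # Install the user-supplied requirements into the pod's environment
    # before the kernel starts, so the notebook sees them immediately.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", str(path)],
        check=True,
    )
```

Because the install happens before the kernel process starts, the restart problem described elsewhere in this thread doesn't arise, and stages that don't tick the checkbox stay fully predictable.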
-
If elyra becomes heavily adopted within our org, I can see a potential issue when it comes to trying to keep python libraries in sync between a developer's local notebook environment and those provided within elyra runtime images (for running on kubeflow).
In our org, runtime images will be provided by the data engineering team and will be relatively static compared to a data scientist's notebook environment, which will evolve and change rapidly. So an issue could arise where a data scientist installs a particular version of an ML lib into their notebook kernel and the stages/pipelines run successfully in their local environment, but when submitted to kubeflow, the pipeline fails due to some difference between the libraries.
I thought at first that maybe just a simple `pip install` within the notebook would install the same libraries into the kubeflow pod. The command is indeed executed, but for the changes to take effect the kernel needs to be restarted! Is there a way we can use elyra to `pip install` libraries into a kubeflow pod, and for those changes to take effect immediately?

Elyra installs its own dependencies (contained within https://raw.githubusercontent.com/elyra-ai/kfp-notebook/v0.23.0/etc/requirements-elyra.txt) into a kubeflow pod before the kernel starts, using a bootstrap process (https://raw.githubusercontent.com/elyra-ai/kfp-notebook/v0.23.0/etc/docker-scripts/bootstrapper.py). Perhaps a user-supplied bootstrap process in which we could install user dependencies could be provided as a new feature?
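For what it's worth, here is a minimal sketch of the restart issue described above, assuming the package is freshly installed rather than an upgrade of something the kernel has already imported (`some_ml_lib` is a placeholder, not a real package):

```python
# Minimal sketch of what "pip install inside the notebook" does in the pod.
# "some_ml_lib" is a placeholder name, not a real package.
import importlib
import subprocess
import sys

# The install itself succeeds and writes the new version to site-packages...
subprocess.run(
    [sys.executable, "-m", "pip", "install", "some_ml_lib==1.2.3"],
    check=True,
)

# ...and invalidating the import caches lets a brand-new package be imported
# without a restart. But if the kernel has already imported an older version,
# the module object cached in sys.modules keeps serving the old code, which
# is why a kernel restart (or a pre-kernel bootstrap install) is needed.
importlib.invalidate_caches()
print("some_ml_lib" in sys.modules)
```

That limitation is what makes an install step that runs before the kernel starts (like the existing bootstrapper) seem more attractive than installing from within the notebook itself.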
Many thanks -- Simon