As a user I can pull-through cache container images when remote is defined on distribution #507
In pulp_python, we are streaming artifacts from a remote repository. The artifacts are not attached to any repository and are orphans; the pull-through cache is thus content marked as orphan, and an orphan clean-up task will remove all of the "cached" content.

Proposal 1

In pulp_container, we have more than one content type to preserve in the "cache". I propose to extend the concept introduced in pulp_python to "cached" repositories. It allows us to better track content pulled from a remote. As a result, we will have two types of repositories referenced from a distribution: the original repository and a temporary repository that holds the pulled-through ("cached") content.
The Registry serves the content from either the original repository or the temporary repository, based on the presence of the content. If the content is present in neither the original repository nor the temporary repository, Pulp pulls the missing content (tags, manifests, blobs) from the remote specified in the distribution. Here, we will utilize the workflows implemented in the sync pipeline when downloading the content from a remote (a new async task will be triggered). A PoC is shown in #704. The extended workflow (15 additional lines) for removing temporary repositories via the orphan clean-up machinery needs to be added to pulpcore. The whole concept applies to a distribution that handles pull-through caching only for a specific upstream repository.
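A minimal, self-contained sketch of the lookup order described above; all names here are hypothetical stand-ins (plain dicts instead of real repositories), not the actual pulp_container code:

```python
# A toy illustration of the fallback order: original repo first, then the
# temporary ("cached") repo, and only then a pull from the remote.

def find_in_repository(repository: dict, key: str):
    """Look up a content unit (tag/manifest/blob) in a repository."""
    return repository.get(key)

def pull_from_remote(remote: dict, temporary: dict, key: str):
    """Stand-in for dispatching a sync-pipeline task that downloads the
    missing content from the remote and adds it to the temporary repo."""
    content = remote.get(key)
    if content is not None:
        temporary[key] = content  # "cache" it in the temporary repository
    return content

def get_content(original: dict, temporary: dict, remote: dict, key: str):
    """Serve from the original repo, then the temporary repo, then the remote."""
    return (
        find_in_repository(original, key)
        or find_in_repository(temporary, key)
        or pull_from_remote(remote, temporary, key)
    )

# Example: the blob is absent locally, so it is pulled from the remote and
# lands in the temporary repository for subsequent requests.
original, temporary, remote = {}, {}, {"sha256:abc": b"blob-bytes"}
assert get_content(original, temporary, remote, "sha256:abc") == b"blob-bytes"
assert "sha256:abc" in temporary
```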
Side-note
Proposal 2

The second idea is to make pull-through caching work as a standalone entity where an administrator creates a special type of "distribution". The "distribution" will hold the reference to a remote repository. Content requested by a user will be automatically downloaded (and thus cached) to Pulp on demand. It will work for all repositories hosted on a remote Registry (the option …). A hypothetical API sketch is included at the end of this description.

Such an approach will result in having a couple of temporary repositories referenced by a distribution, or a couple of distribution-repository pairs that will be created from the special "distribution". The latter can benefit from the implementation of Proposal 1 (if caching is enabled for a sub-distribution).

Side-note

Again, one of the problems is the time required for downloading content from a remote: downloading large artifacts/blobs to Pulp and then forwarding them to a user will take some time (on_demand downloading + querying existing content units (as we do in a standard sync pipeline) + creating new repositories/namespaces/distributions on the fly). This might not be tolerated by container CLI clients due to pre-defined timeouts.

Permissions Handling

To be decided.

Any suggestions or ideas on how to improve or adjust the logic?
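A hypothetical REST sketch of Proposal 2, assuming a Pulp instance at pulp.example.com with admin credentials; the "pull-through" endpoint paths and field names are illustrative assumptions, not an existing API:

```python
# Hypothetical sketch of Proposal 2: one special "distribution" that points
# at a whole upstream registry. The paths under "pull-through/" and the
# field names are assumptions for illustration only.
import requests

BASE = "https://pulp.example.com"
AUTH = ("admin", "password")

# A remote referencing the upstream registry as a whole (no single
# upstream repository), e.g. Docker Hub.
remote = requests.post(
    f"{BASE}/pulp/api/v3/remotes/container/pull-through/",  # assumed path
    json={"name": "dockerhub-cache", "url": "https://registry-1.docker.io"},
    auth=AUTH,
).json()

# The special "distribution" holding the remote reference; repositories and
# content would then be created on demand, per client pull. (Distribution
# creation is asynchronous in Pulp; task polling is omitted here.)
requests.post(
    f"{BASE}/pulp/api/v3/distributions/container/pull-through/",  # assumed path
    json={
        "name": "dockerhub-cache",
        "base_path": "cache",
        "remote": remote["pulp_href"],
    },
    auth=AUTH,
)
```

A client could then run something like `podman pull pulp.example.com/cache/library/ubuntu`, and Pulp would fetch and cache the image on the fly.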
In Proposal 2, we will enable a user to download/forward/stream content from a remote registry no matter which repository they are talking to. Considering that, users could (intentionally) flood Pulp and make it unavailable for some time. It might be convenient to take a look at whitelisting upstreams, as proposed in #459.
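For illustration, a minimal sketch of what such upstream whitelisting could look like; the allowlist patterns, config value, and function name are hypothetical:

```python
# Hypothetical upstream whitelisting as suggested in #459: before pulling
# through, check the requested upstream repository against an
# administrator-maintained allowlist.
from fnmatch import fnmatch

ALLOWED_UPSTREAMS = ["library/*", "pulp/*"]  # assumed config value

def is_upstream_allowed(upstream_name: str) -> bool:
    """Permit pull-through only for whitelisted upstream repositories."""
    return any(fnmatch(upstream_name, pattern) for pattern in ALLOWED_UPSTREAMS)

assert is_upstream_allowed("library/ubuntu")
assert not is_upstream_allowed("someone/huge-image")
```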
We will revisit this once it becomes a higher priority.
This would be awesome for easily preventing Docker Hub rate limits. Is there anything the community could do to code/test/sponsor this?
@benedikt-bartscher thank you for showing your interest in this feature. There is already a mechanism in place in pulp-container to prevent Docker Hub rate limits. It consists of creating and mirroring the external repo locally with the on_demand policy before the client pull (this does not download blobs, just manifests). Pulp would download all the blob data from the remote source on the first client request; however, all subsequent requests would be served directly from Pulp. A sketch of this workaround follows below.
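A sketch of the existing workaround against the pulp_container REST API, assuming a Pulp instance at pulp.example.com with admin credentials; task polling is omitted for brevity:

```python
# Mirror an upstream repo with the on_demand policy: the sync fetches only
# metadata/manifests, and blobs are downloaded on the first client pull.
import requests

BASE = "https://pulp.example.com"
AUTH = ("admin", "password")

remote = requests.post(
    f"{BASE}/pulp/api/v3/remotes/container/container/",
    json={
        "name": "ubuntu",
        "url": "https://registry-1.docker.io",
        "upstream_name": "library/ubuntu",
        "policy": "on_demand",  # download blobs lazily, on first request
    },
    auth=AUTH,
).json()

repository = requests.post(
    f"{BASE}/pulp/api/v3/repositories/container/container/",
    json={"name": "ubuntu"},
    auth=AUTH,
).json()

# Under the on_demand policy this sync downloads manifests, not blobs.
requests.post(
    f"{BASE}{repository['pulp_href']}sync/",
    json={"remote": remote["pulp_href"]},
    auth=AUTH,
)

# Serve the repository so clients can pull pulp.example.com/ubuntu.
# (Distribution creation is asynchronous; task polling omitted.)
requests.post(
    f"{BASE}/pulp/api/v3/distributions/container/container/",
    json={"name": "ubuntu", "base_path": "ubuntu", "repository": repository["pulp_href"]},
    auth=AUTH,
)
```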
Hey @ipanova, thanks for your reply. I know about that mechanism; I am currently using it. However, it's not very convenient to set up every repo manually, and that's why I pushed this issue.
@benedikt-bartscher gotcha, yeah, that's the inconvenience compared to the pull-through cache. We will try to resume the work on this; there are some design challenges we need to wrap our heads around ;)
Braindump of an idea:
Summary of today's meeting: a few phases will be outlined, where the first one will follow the KISS rule and further ones will gradually add improvements on a per-need basis. Phase 1:
Not sure the repository needs to know that it is used in a pull-through manner. We will always be accessing it from the PullThroughDistribution. We probably do not want to create yet another repository type.
It won't be another repo type. mirror=true describes the behavior of how the repo stores content: it won't be additive, but a mirror. This option is usually passed in the sync task (see https://github.com/pulp/pulpcore/blob/main/pulpcore/plugin/stages/declarative_version.py#L21); we will, however, need to adopt similar logic in the pull-through workflow so that we do not store unnecessary content which is no longer available on the remote source. See the sketch below.
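For reference, a short sketch of where mirror=True enters a regular sync via the pulpcore plugin API linked above; the synchronize() wrapper and its arguments are simplified stand-ins for plugin code:

```python
# mirror=True makes the new repository version contain exactly what the
# remote offers: content no longer available upstream is dropped instead of
# being kept additively. The pull-through workflow would need analogous
# semantics for the temporary repositories.
from pulpcore.plugin.stages import DeclarativeVersion

def synchronize(first_stage, repository):
    # first_stage is the plugin-specific first stage that declares the
    # remote's content (e.g. pulp_container's sync first stage).
    dv = DeclarativeVersion(first_stage, repository, mirror=True)
    return dv.create()
```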
Author: @ipanova ([email protected])
Redmine Issue: 9560, https://pulp.plan.io/issues/9560
See the pulp_python PR for more details: https://github.com/pulp/pulp_python/pull/384/files
pulp_container won't have it as easy, because multiple content types need to be downloaded and relations need to be created between them.