[WIP] Auto import images for embedded registry #10973
base: master
## Conversation
## Context

Since the embedded registry feature was introduced, users have been asking about having to manually import images, especially in edge environments.

As a result, a folder should be created where every image will be watched by a controller (a child process that will run when the embedded registry is created) for changes or new images; these new or changed images will be added to the node registry, meaning that other nodes will have access to the image.
It'll be much better if we could reuse the existing folder for images instead of creating a new one
Could possibly be convinced to just change the behavior of the existing agent/images folder.
I was thinking of creating the controller for agent/images; the controller would import those images as the default behavior, but other images could be added by the controller separately.
Users can now add images in the embedded registry auto import folder while k3s is running
To avoid confusion, remove any references to importing things into the embedded registry - the embedded registry does not have a discrete image store to import into. We import images into the containerd image store, and images in the containerd image store can be used by Kubernetes pods without needing to be pulled from a remote registry, and they can also be shared between nodes by the embedded registry.
Since the embedded registry feature was introduced, users have been asking about having to manually import images, especially in edge environments.

As a result, there is a need for a folder that can handle this action, where every image will be watched by a controller (a child process that will run when the embedded registry is created) for changes or new images; these new or changed images will be added to the containerd node registry, meaning that other nodes will have access to the image.
- Controllers are not necessarily child processes, they are usually a goroutine within the main k3s supervisor process. Can we remove the expectation that this will be handled by a dedicated child process?
- Instead of "containerd node registry", refer to it as the "containerd image store" as that is the correct name for where images are imported into.
- Remove any mention of sharing access with other nodes. That is a function of the embedded registry and not related to the auto-import functionality discussed here.
This folder could be the agent/images folder itself, but that would require a change in the way images were previously loaded; the new controller could handle the first load of k3s images.
Propose something specific. If we do decide to periodically re-scan the existing images folder, how would we do that? A periodic scan would be easiest, but inotify and other APIs for realtime change detection are available. Processing image tarballs can be an io-intensive operation, how should we determine if the whole file should be imported without having to reprocess the entire contents - compare mtime+size perhaps?
Codecov Report — Attention: Patch coverage is

```
@@            Coverage Diff             @@
##           master   #10973      +/-   ##
==========================================
- Coverage   49.93%   43.66%    -6.28%
==========================================
  Files         178      178
  Lines       14816    14907      +91
==========================================
- Hits         7399     6509     -890
- Misses       6069     7182    +1113
+ Partials     1348     1216     -132
```

Flags with carried forward coverage won't be shown.
Signed-off-by: Vitor Savian <[email protected]>
pkg/agent/containerd/containerd.go (Outdated)

```go
	}

	switch event.Op {
	case fsnotify.Write:
```
I would probably recommend against doing the work of importing image directly in the fsnotify event loop. Depending on how the file is written, you may get multiple events for the same file in short sequence. I would probably just wire the notify events up to a keyed work queue, to ensure that the fsnotify event queue is not blocked, and to merge multiple events concerning the same file.
You can take the service change code as a reference:

k3s/pkg/cloudprovider/servicelb.go (lines 177 to 227 in 430a7dc):

```go
// runWorker dequeues Service changes from the work queue
// We run a lightweight work queue to handle service updates. We don't need the full overhead
// of a wrangler service controller and shared informer cache, but we do want to run changes
// through a keyed queue to reduce thrashing when pods are updated. Much of this is cribbed from
// https://github.com/rancher/lasso/blob/release/v2.5/pkg/controller/controller.go#L173-L215
func (k *k3s) runWorker() {
	for k.processNextWorkItem() {
	}
}

// processNextWorkItem does work for a single item in the queue,
// returning a boolean that indicates if the queue should continue
// to be serviced.
func (k *k3s) processNextWorkItem() bool {
	obj, shutdown := k.workqueue.Get()
	if shutdown {
		return false
	}
	if err := k.processSingleItem(obj); err != nil && !apierrors.IsConflict(err) {
		logrus.Errorf("%s: %v", controllerName, err)
	}
	return true
}

// processSingleItem processes a single item from the work queue,
// requeueing it if the handler fails.
func (k *k3s) processSingleItem(obj interface{}) error {
	var (
		key string
		ok  bool
	)
	defer k.workqueue.Done(obj)
	if key, ok = obj.(string); !ok {
		logrus.Errorf("expected string in workqueue but got %#v", obj)
		k.workqueue.Forget(obj)
		return nil
	}
	keyParts := strings.SplitN(key, "/", 2)
	if err := k.updateStatus(keyParts[0], keyParts[1]); err != nil {
		k.workqueue.AddRateLimited(key)
		return fmt.Errorf("error updating LoadBalancer Status for %s: %v, requeueing", key, err)
	}
	k.workqueue.Forget(obj)
	return nil
}
```

k3s/pkg/cloudprovider/cloudprovider.go (line 50 in 430a7dc):

```go
	workqueue workqueue.RateLimitingInterface
```

k3s/pkg/cloudprovider/cloudprovider.go (line 107 in 430a7dc):

```go
	k.workqueue = workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
```
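The core of the reviewer's suggestion, merging a burst of fsnotify events for the same file into a single import, can be shown without the client-go workqueue dependency. This is a stdlib-only sketch of the keyed-dedup idea; `keyedQueue` and its methods are illustrative, not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedQueue coalesces repeated events for the same key, so a burst of
// fsnotify Write events for one tarball results in a single import.
type keyedQueue struct {
	mu      sync.Mutex
	pending map[string]bool
	keys    chan string
}

func newKeyedQueue(size int) *keyedQueue {
	return &keyedQueue{pending: map[string]bool{}, keys: make(chan string, size)}
}

// Add enqueues key unless an event for it is already waiting, keeping
// the event producer (the fsnotify loop) from ever blocking on imports.
func (q *keyedQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.pending[key] {
		return // merge with the event already queued for this file
	}
	q.pending[key] = true
	q.keys <- key
}

// Get returns the next key to process, clearing its pending mark so
// that later events for the same file re-queue it.
func (q *keyedQueue) Get() string {
	key := <-q.keys
	q.mu.Lock()
	delete(q.pending, key)
	q.mu.Unlock()
	return key
}

func main() {
	q := newKeyedQueue(10)
	q.Add("/agent/images/app.tar")
	q.Add("/agent/images/app.tar") // duplicate burst: merged away
	q.Add("/agent/images/db.tar")
	fmt.Println(q.Get()) // prints /agent/images/app.tar
	fmt.Println(q.Get()) // prints /agent/images/db.tar
}
```

In the real implementation the client-go `workqueue` package referenced above would add rate limiting and retry on top of this dedup behavior.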
pkg/agent/containerd/containerd.go (Outdated)

```go
	// At startup all leases from k3s are cleared; we no longer use leases to lock content
	if err := clearLeases(ctx, client); err != nil {
		logrus.Errorf("Error while clearing leases: %s", err.Error())
		return
	}

	// Clear the pinned labels on all images previously pinned by k3s
	if err := clearLabels(ctx, client); err != nil {
		logrus.Errorf("Error while clearing labes: %s", err.Error())
		return
	}
```
This is already done in PreloadImages before `go Watcher(ctx, cfg)` is called; don't do it again here. Doing so would un-label all the images that we just finished importing.
pkg/agent/containerd/containerd.go (Outdated)

```go
// Watcher is a controller that watch the agent/images folder
// to ensure that every new file is added to the watcher state
func Watcher(ctx context.Context, cfg *config.Node) {
```
this doesn't appear to be called outside this package, why does it need to be exported?
```suggestion
// watcher is a controller that uses fsnotify to watch the agent/images folder
// and import any tarballs placed in that directory whenever they are created or modified.
func watcher(ctx context.Context, cfg *config.Node) {
```
pkg/agent/containerd/containerd.go (Outdated)

```go
		// get the file info to add to the state map
		fileInfo, err := dirEntry.Info()
		if err != nil {
			logrus.Errorf("Error while getting the info from file: %s", err.Error())
```
Convention is to use %v for errors, and just pass in the error itself.
```suggestion
logrus.Errorf("Error while getting the info from file: %v", err)
```
pkg/agent/containerd/containerd.go (Outdated)

```go
		case fsnotify.Write:
			newStateFile, err := os.Stat(event.Name)
			if err != nil {
				logrus.Errorf("Error encountered while getting file %s info for event write: %s", event.Name, err.Error())
```
```suggestion
logrus.Errorf("Error encountered while getting file %s info for event write: %v", event.Name, err)
```
pkg/agent/containerd/containerd.go (Outdated)

```go
		case fsnotify.Create:
			info, err := os.Stat(event.Name)
			if err != nil {
				logrus.Errorf("Error encountered while getting file %s info for event Create: %s", event.Name, err.Error())
```
```suggestion
logrus.Errorf("Error encountered while getting file %s info for event Create: %v", event.Name, err)
```
pkg/agent/containerd/containerd.go (Outdated)

```go
			if !ok {
				return
			}
			logrus.Errorf("error in watcher controller: %s", err.Error())
```
%v
pkg/agent/containerd/containerd.go (Outdated)

```go
			delete(stateFileInfos, event.Name)
			logrus.Infof("Removed file from the watcher controller: %s", event.Name)
		case fsnotify.Remove:
			delete(stateFileInfos, event.Name)
```
Do we want to un-label the images that came from this file when it is deleted? We'd probably need to use another label to track what file (or files, in case it is in multiple) an image came from to do this.
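The bookkeeping the reviewer describes, tracking which tarball(s) each image came from so a delete can un-pin only truly orphaned images, can be sketched with a plain map. The `imageSources` type and its methods are hypothetical; the real implementation would persist this as containerd image labels rather than in memory.

```go
package main

import "fmt"

// imageSources tracks which tarball(s) each image was imported from,
// so that when a tarball is removed we know which images no longer
// have any source and can be un-labeled.
type imageSources map[string]map[string]bool // image ref -> set of source files

// record notes that image was imported from file.
func (s imageSources) record(image, file string) {
	if s[image] == nil {
		s[image] = map[string]bool{}
	}
	s[image][file] = true
}

// removeFile drops file as a source and returns the images that now
// have no remaining source tarball (candidates for un-labeling).
func (s imageSources) removeFile(file string) []string {
	var orphaned []string
	for image, files := range s {
		if files[file] {
			delete(files, file)
			if len(files) == 0 {
				orphaned = append(orphaned, image)
				delete(s, image)
			}
		}
	}
	return orphaned
}

func main() {
	s := imageSources{}
	s.record("docker.io/library/nginx:latest", "a.tar")
	s.record("docker.io/library/nginx:latest", "b.tar")
	s.record("docker.io/library/redis:7", "a.tar")
	// nginx survives via b.tar; only redis is orphaned.
	fmt.Println(s.removeFile("a.tar"))
}
```

Handling the multi-source case matters: un-labeling eagerly on every delete would unpin images that are still provided by another tarball.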
pkg/agent/containerd/containerd.go (Outdated)

```go
			delete(stateFileInfos, event.Name)
			logrus.Infof("Removed file from the watcher controller: %s", event.Name)
		}
	case err, ok := <-watcher.Errors:
```
What happens if there is a not-ok error here? Do we just return from the goroutine and stop watching forever, until the next time K3s is restarted? Should this be a fatal error so that k3s is restarted, or should we restart the watch again?
pkg/agent/containerd/containerd.go (Outdated)

```go
		logrus.Errorf("Error to create a watcher: %s", err.Error())
		return
	}

	// Add agent/images path to the watcher.
	err = watcher.Add(cfg.Images)
	if err != nil {
		logrus.Errorf("Error when creating the watcher controller: %s", err.Error())
		return
	}

	client, err := Client(cfg.Containerd.Address)
	if err != nil {
		logrus.Errorf("Error to create containerd client: %s", err.Error())
		return
	}

	criConn, err := cri.Connection(ctx, cfg.Containerd.Address)
	if err != nil {
		logrus.Errorf("Error to create CRI connection: %s", err.Error())
```
Error handling here is to just print a message and return, leaving K3s with no watch on the images. Should we have this instead return an error, and wrap it in a retry loop so that it is restarted if it fails?

Also note that the idiomatic error message should be in the rough form of:

```go
logrus.Errorf("Failed to SOMETHING with %s: %v", thing.Name, err)
```

"Error to" is not proper grammar, and we already know it's an error because we are using `Errorf`, which prints `level=error` as part of the log entry.
We probably shouldn't do it in this PR, but it would be nice if whatever we used for watching image file changes was generic, and could be reused by the deploy controller; right now it just re-scans everything once a minute (lines 85 to 100 in 8ce04d3).
## Proposed Changes

## Types of Changes

## Verification

## Testing

## Linked Issues

## User-Facing Change

## Further Comments