Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend precompile to support a DAG #792

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fibonacci1729
Copy link

@fibonacci1729 fibonacci1729 commented Jan 9, 2025

Still WIP while I address some lingering TODOs but wanted to get this up for 👀 before too long.

This PR effectively breaks the assumption that layers input to precompile map 1:1 with layers returned. This is necessary to support things like component dependencies in the spin shim via composition where there is not a clear mapping of original to precompiled layers.

I tried to evolve the precompile API in the most straightforward way without requiring shim developers to have to manage a DAG themselves.

NOTE: have to update the other shims to conform to the new API which will be straightforward but i'd like to get feedback on this approach here first.

cc/ @Mossaka @jsturtevant @kate-goldenring

@jsturtevant
Copy link
Contributor

/cc @jprendes as well who is doing some larger changes to the engine

linking a few issues for context:
#504
spinkube/containerd-shim-spin#238

@@ -62,7 +65,7 @@ pub trait Engine: Clone + Send + Sync + 'static {
/// The cached, precompiled layers will be reloaded on subsequent runs.
/// The runtime is expected to return the same number of layers passed in, if the layer cannot be precompiled it should return `None` for that layer.
/// In some edge cases it is possible that the layers may already be precompiled and None should be returned in this case.
fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<Option<Vec<u8>>>> {
async fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this now provides a way to do more than just pre-compiling such as composing I wonder if we would want to rename it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that definitely seems reasonable! Happy to call this process_layers or whatever makes sense.

@jsturtevant
Copy link
Contributor

a quick look and it is about what I was expecting for changes. It was helpful to see how it was used in a non trivial example with spinkube/containerd-shim-spin#259

Copy link
Collaborator

@jprendes jprendes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns around the GC references, and how we link layers with precompiled artifacts, but I'm no expect on any of that, so feel free to correct me.

I would eventually like to split the precompilation from the Engine, which would be a significant breaking change. If you think there's a more drastic change that could help with this work, I'd like to hear it :-)

@@ -62,7 +65,7 @@ pub trait Engine: Clone + Send + Sync + 'static {
/// The cached, precompiled layers will be reloaded on subsequent runs.
/// The runtime is expected to return the same number of layers passed in, if the layer cannot be precompiled it should return `None` for that layer.
/// In some edge cases it is possible that the layers may already be precompiled and None should be returned in this case.
fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<Option<Vec<u8>>>> {
async fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return impl Future and remove the dependency on async_trait

Suggested change
async fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
fn precompile(&self, _layers: &[WasmLayer]) -> impl Future<Output = Result<Vec<PrecompiledLayer>>> + Send { async move {

}

let layers = all_layers.values().cloned().collect::<Vec<_>>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let layers = all_layers.values().cloned().collect::<Vec<_>>();
let layers = all_layers.into_values().collect::<Vec<_>>();


let gc_label =
format!("containerd.io/gc.ref.content.precompile.{child_digest}");
parent_layer.labels.insert(gc_label, child_digest.clone());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this means that as long as at least one parent is present, the precompilation won't be GCd.
It seems to be that should be that as soon as one parent is GCd, the precompilation should be GCd as well.

Could this lead to the situation where we have a very popular layer that parents many components (e.g., a virtual FS layer), and prevents all the precompilations from being GCd?

@jsturtevant, you know more about this than me :-)

self.update_info(original_layer).await?;
// Update the original layers with a gc label which associates the original digests that
// were used to process and produce the new layer with the digest of the precompiled content.
for parent_idx in parents {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the future: As an optimization, I guess all of these could be done in parallel.

.labels
.into_iter()
.filter_map(|(key, child_digest)| {
if key.starts_with(&format!("{precompile_id}/child")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this label gets added to all parents of a precompiled layer.
If a popular layer is shared by many images (e.g., a layer for a virtual FS) and parents many precompilations, will this result in the precompiled artifact for all those images to be loaded here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants