Extend precompile to support a DAG #792
base: main
Conversation
Signed-off-by: Brian H <[email protected]>
/cc @jprendes as well, who is doing some larger changes to the engine. Linking a few issues for context:
```diff
@@ -62,7 +65,7 @@ pub trait Engine: Clone + Send + Sync + 'static {
     /// The cached, precompiled layers will be reloaded on subsequent runs.
     /// The runtime is expected to return the same number of layers passed in, if the layer cannot be precompiled it should return `None` for that layer.
     /// In some edge cases it is possible that the layers may already be precompiled and None should be returned in this case.
-    fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<Option<Vec<u8>>>> {
+    async fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
```
Since this now provides a way to do more than just pre-compiling, such as composing, I wonder if we would want to rename it.
Yeah, that definitely seems reasonable! Happy to call this `process_layers` or whatever makes sense.
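For illustration, a rough sketch of how the renamed method could look; the name `process_layers` is just the suggestion from this thread, and the stand-in types below are placeholders, not the real runwasi definitions:

```rust
// Hedged sketch only: `process_layers` is the name floated above, and
// `WasmLayer` / `PrecompiledLayer` are stand-ins for the real runwasi types.
use anyhow::Result;

pub struct WasmLayer;
pub struct PrecompiledLayer;

pub trait Engine: Clone + Send + Sync + 'static {
    /// Same shape as the PR's new `precompile`, under a more general name
    /// that also covers composition and other layer transformations.
    async fn process_layers(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
        // A default of "no processed layers" is assumed here for illustration.
        Ok(Vec::new())
    }
}
```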
Took a quick look and it is about what I was expecting for changes. It was helpful to see how it was used in a non-trivial example with spinkube/containerd-shim-spin#259.
I have some concerns around the GC references, and how we link layers with precompiled artifacts, but I'm no expert on any of that, so feel free to correct me.
I would eventually like to split the precompilation from the `Engine`, which would be a significant breaking change. If you think there's a more drastic change that could help with this work, I'd like to hear it :-)
```diff
@@ -62,7 +65,7 @@ pub trait Engine: Clone + Send + Sync + 'static {
     /// The cached, precompiled layers will be reloaded on subsequent runs.
     /// The runtime is expected to return the same number of layers passed in, if the layer cannot be precompiled it should return `None` for that layer.
     /// In some edge cases it is possible that the layers may already be precompiled and None should be returned in this case.
-    fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<Option<Vec<u8>>>> {
+    async fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
```
Return `impl Future` and remove the dependency on `async_trait`:
```diff
-    async fn precompile(&self, _layers: &[WasmLayer]) -> Result<Vec<PrecompiledLayer>> {
+    fn precompile(&self, _layers: &[WasmLayer]) -> impl Future<Output = Result<Vec<PrecompiledLayer>>> + Send { async move {
```
```rust
}

let layers = all_layers.values().cloned().collect::<Vec<_>>();
```
```diff
-let layers = all_layers.values().cloned().collect::<Vec<_>>();
+let layers = all_layers.into_values().collect::<Vec<_>>();
```
```rust
let gc_label =
    format!("containerd.io/gc.ref.content.precompile.{child_digest}");
parent_layer.labels.insert(gc_label, child_digest.clone());
```
IIUC, this means that as long as at least one parent is present, the precompilation won't be GCd.
It seems to me that as soon as one parent is GCd, the precompilation should be GCd as well.
Could this lead to a situation where a very popular layer that parents many components (e.g., a virtual FS layer) prevents all the precompilations from being GCd?
@jsturtevant, you know more about this than me :-)
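A toy model of that reading, just to make the fan-in concrete; this is not containerd code, only an illustration of the reachability rule implied by the gc.ref labels:

```rust
use std::collections::HashMap;

// Toy model only: with one gc.ref label per parent, the precompiled child is
// considered reachable while *any* labelled parent content still exists, so it
// is only collected once every parent has been GCd.
fn child_is_reachable(remaining_parents: &[HashMap<String, String>], child_digest: &str) -> bool {
    remaining_parents
        .iter()
        .any(|labels| labels.values().any(|v| v == child_digest))
}

fn main() {
    let popular_parent = HashMap::from([(
        "containerd.io/gc.ref.content.precompile.sha256:abc".to_string(),
        "sha256:abc".to_string(),
    )]);
    // Even after every other parent is removed, this one keeps the child alive.
    assert!(child_is_reachable(&[popular_parent], "sha256:abc"));
}
```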
```rust
self.update_info(original_layer).await?;
// Update the original layers with a gc label which associates the original digests that
// were used to process and produce the new layer with the digest of the precompiled content.
for parent_idx in parents {
```
For the future: as an optimization, I guess all of these could be done in parallel.
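A hedged sketch of what that could look like, assuming `update_info` is async and returns a `Result`; the `Store` and `LayerInfo` types below are placeholders, not the PR's actual types:

```rust
use anyhow::Result;
use futures::future::try_join_all;

// Placeholder types standing in for the content-store handle and layer info;
// only the concurrency pattern is the point here.
struct Store;
struct LayerInfo;

impl Store {
    async fn update_info(&self, _layer: LayerInfo) -> Result<()> {
        Ok(())
    }
}

// Instead of awaiting `update_info` once per iteration of the
// `for parent_idx in parents` loop, issue all the updates and await them together.
async fn update_parents_concurrently(store: &Store, parents: Vec<LayerInfo>) -> Result<()> {
    try_join_all(parents.into_iter().map(|layer| store.update_info(layer))).await?;
    Ok(())
}
```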
```rust
.labels
.into_iter()
.filter_map(|(key, child_digest)| {
    if key.starts_with(&format!("{precompile_id}/child")) {
```
IIUC, this label gets added to all parents of a precompiled layer.
If a popular layer is shared by many images (e.g., a layer for a virtual FS) and parents many precompilations, will this result in the precompiled artifacts for all those images being loaded here?
Still WIP while I address some lingering TODOs, but I wanted to get this up for 👀 before too long.
This PR effectively breaks the assumption that layers input to `precompile` map 1:1 with layers returned. This is necessary to support things like component dependencies in the spin shim via composition, where there is not a clear mapping of original to precompiled layers.
I tried to evolve the precompile API in the most straightforward way without requiring shim developers to manage a DAG themselves.
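To make the shape of that change concrete, here is an illustrative sketch of a return type that carries the DAG edges back to the caller; the field names are assumptions for illustration, not necessarily the definition in this PR:

```rust
// Illustrative only: a processed layer that records which input layers it was
// derived from, so one output (e.g. a composed component) can reference several
// original layers instead of mapping 1:1 to a single input.
pub struct PrecompiledLayer {
    /// Media type of the produced artifact.
    pub media_type: String,
    /// The processed (precompiled and/or composed) bytes.
    pub bytes: Vec<u8>,
    /// Digests of the original input layers this artifact was derived from.
    pub parents: Vec<String>,
}
```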
NOTE: I still have to update the other shims to conform to the new API, which will be straightforward, but I'd like to get feedback on this approach here first.
cc/ @Mossaka @jsturtevant @kate-goldenring