Add manifest arch, os, and compressed layers size fields #1782

git-hyagi · 2024-09-25T11:24:46Z

closes: #1767

lubosmj

A couple of remarks:

1. If we need to store the "architecture" and "os" fields on the Manifest model, we should try to fetch the values from a manifest list first. An OCI Index has the "architecture" and "os" fields required. There is no need to read the data from a config blob if a manifest is listed within an index (https://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions).
2. Currently, there are three separate migrations. I would like to squash them. Every migration takes time to run some boilerplate which is transparent to us.
3. Currently, there are three commits referencing the same issue. I would like to squash them. Such a separation is not necessary.
4. If we decided to create another django-admin command, we should align with Katello to see if it makes sense for them to run a management command reading data from the storage.

lubosmj · 2024-09-26T07:31:35Z

CHANGES/1767.feature

+The Manifest model has been enhanced with a new:
+    * `architecture` field, which specifies the CPU architecture for which the binaries in the
+    image are designed to run.
+    * `os` field, which specifies the operating system which the image is built to run on.
+    * `compressed_layers_size` field, which specifies the sum of the sizes of all compressed layers.


This is quite wordy. We should stick to shorter changelog messages (see https://pulpproject.org/pulpcore/changes/ for reference). It is not necessary to describe every manifest field in detail.

pulp_container/app/management/commands/container-repair-manifest-metadatas.py

lubosmj · 2024-09-26T08:17:00Z

pulp_container/app/models.py

@@ -103,6 +108,9 @@ class Manifest(Content):

    annotations = models.JSONField(default=dict)
    labels = models.JSONField(default=dict)
+    architecture = models.TextField(null=True)
+    os = models.TextField(null=True)
+    compressed_layers_size = models.TextField(null=True)


Is it better to use IntegerField here?
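
For context on why an integer type fits, here is a minimal sketch of how such a value could be derived. It assumes an image manifest whose `layers` entries each carry a numeric `size`. The helper name `sum_compressed_layer_sizes` and the sample manifest are made up for illustration and are not taken from this PR.

```python
from typing import Optional


def sum_compressed_layer_sizes(manifest: dict) -> Optional[int]:
    """Return the total size in bytes of all layers listed in a manifest.

    Hypothetical helper: returns None when the manifest has no "layers"
    key (for example a manifest list), mirroring a nullable column.
    """
    layers = manifest.get("layers")
    if layers is None:
        return None
    # Each layer descriptor carries the size in bytes of the blob it references.
    return sum(layer["size"] for layer in layers)


# Illustrative manifest; digests and sizes are placeholders.
sample_manifest = {
    "schemaVersion": 2,
    "layers": [
        {"digest": "sha256:aaa", "size": 100},
        {"digest": "sha256:bbb", "size": 250},
        {"digest": "sha256:ccc", "size": 50},
    ],
}

total = sum_compressed_layer_sizes(sample_manifest)
print(total)
```

Since the result is a plain sum of byte counts, storing it as text would force a cast on every comparison or aggregation.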

ianballou · 2024-09-30T20:05:48Z

4. If we decided to create another django-admin command, we should align with Katello to see if it makes sense for them to run a management command reading data from the storage.

From the Katello side, we're okay with the command being the same as before or a new one. Choose whatever is most efficient from the Pulp side.

ianballou · 2024-09-30T20:17:12Z

Also I wanted to clarify to be super sure -- OCI Image Index manifests will always have a null arch, OS, and size? I'm not super clear since, according to https://specs.opencontainers.org/image-spec/image-index/?v=v1.0.1, the optional platform property can have an arch and an OS.

git-hyagi · 2024-10-01T12:55:40Z

Also I wanted to clarify to be super sure -- OCI Image Index manifests will always have a null arch, OS, and size? I'm not super clear since, according to https://specs.opencontainers.org/image-spec/image-index/?v=v1.0.1, the optional platform property can have an arch and an OS.

Thank you for bringing this up!
We reviewed and discussed the os and architecture fields from the manifest list, and we will work on it in this PR.
If the manifest list (or OCI index) contains the optional platform field, we will populate the Pulp manifest with arch and os (whenever platform is defined, os and arch are required).

git-hyagi · 2024-10-01T14:34:36Z

After re-reading the specs, I realized that a manifest list or OCI index does not have a top-level platform field. Actually, platform is a field of the entries in manifests. This link shows it more clearly: https://github.com/opencontainers/image-spec/blob/main/image-index.md (platform is a bullet inside manifests).

So, please, ignore my last comment and yes, "OCI Image Index manifests will always have a null arch, OS, and size".
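
To illustrate where `platform` lives, here is a minimal sketch with a hand-written index document. The digests are placeholders and `platforms_by_digest` is a hypothetical helper, not code from pulp_container.

```python
from typing import Dict, Optional, Tuple

# Hand-written OCI image index; note that "platform" appears only inside
# the entries of "manifests", never at the top level of the index itself.
sample_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:aaa",
            "size": 7143,
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:bbb",
            "size": 7682,
            # "platform" is optional, so this entry omits it.
        },
    ],
}


def platforms_by_digest(index: dict) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each listed manifest digest to its (architecture, os) pair."""
    result = {}
    for entry in index.get("manifests", []):
        platform = entry.get("platform") or {}
        result[entry["digest"]] = (platform.get("architecture"), platform.get("os"))
    return result


mapping = platforms_by_digest(sample_index)
print(mapping)
```

In this sketch the values describe the listed image manifests, which is why the index object itself would keep null arch and OS.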

ianballou · 2024-10-01T15:45:54Z

After re-reading the specs, I realized that manifestlist or oci-index do not have a platform field. Actually, platform is a field from manifests. In this link, we can see it better: https://github.com/opencontainers/image-spec/blob/main/image-index.md (platform is a bullet inside manifests).

So, please, ignore my last comment and yes, "OCI Image Index manifests will always have a null arch, OS, and size".

Nice catch! I didn't notice it was under manifests, so that's perfect.

git-hyagi · 2024-10-02T17:47:31Z

If we need to store the "architecture" and "os" fields on the Manifest model, we should try to fetch the values from a manifest list first. An OCI Index has the "architecture" and "os" fields required. There is no need to read the data from a config blob if a manifest is listed within an index (https://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions).

I did some investigation in each workflow, and here are my findings:

For the sync and pull-through tasks, the first declarative content that we pass to the next pipeline stage is the blobs and config blobs:

pulp_container/pulp_container/app/tasks/sync_stages.py

Lines 330 to 345 in 1a8fe63

    
    async def handle_blobs(self, manifest_dc, content_data):
        """
        Handle blobs.
        """
        manifest_dc.extra_data["blob_dcs"] = []
        for layer in content_data.get("layers") or content_data.get("fsLayers"):
            if not self._include_layer(layer):
                continue
            blob_dc = self.create_blob(layer)
            manifest_dc.extra_data["blob_dcs"].append(blob_dc)
            await self.put(blob_dc)
        layer = content_data.get("config", None)
        if layer:
            blob_dc = self.create_blob(layer, deferred_download=False)
            manifest_dc.extra_data["config_blob_dc"] = blob_dc
            await self.put(blob_dc)

After that, in the resolve_flush method, we send the manifests and manifest lists in this order:

pulp_container/pulp_container/app/tasks/sync_stages.py

Lines 274 to 306 in 1a8fe63

    
    async def resolve_flush(self):
        """Resolve pending contents dependencies and put in the pipeline."""
        # Order matters! Things depended on must be resolved first.
        for manifest_dc in self.manifest_dcs:
            config_blob_dc = manifest_dc.extra_data.get("config_blob_dc")
            if config_blob_dc:
                manifest_dc.content.config_blob = await config_blob_dc.resolution()
                await sync_to_async(manifest_dc.content.init_labels)()
                manifest_dc.content.init_image_nature()
            for blob_dc in manifest_dc.extra_data["blob_dcs"]:
                # Just await here. They will be associated in the post_save hook.
                await blob_dc.resolution()
            await self.put(manifest_dc)
        self.manifest_dcs.clear()

        for manifest_list_dc in self.manifest_list_dcs:
            for listed_manifest in manifest_list_dc.extra_data["listed_manifests"]:
                # Just await here. They will be associated in the post_save hook.
                await listed_manifest["manifest_dc"].resolution()
            await self.put(manifest_list_dc)
        self.manifest_list_dcs.clear()

        for tag_dc in self.tag_dcs:
            tagged_manifest_dc = tag_dc.extra_data["tagged_manifest_dc"]
            tag_dc.content.tagged_manifest = await tagged_manifest_dc.resolution()
            await self.put(tag_dc)
        self.tag_dcs.clear()

        for signature_dc in self.signature_dcs:
            signed_manifest_dc = signature_dc.extra_data["signed_manifest_dc"]
            signature_dc.content.signed_manifest = await signed_manifest_dc.resolution()
            await self.put(signature_dc)
        self.signature_dcs.clear()

For the push workflow, we first push the blobs and the corresponding manifest for each image, and only after all images are pushed is the manifest list/OCI index pushed.

I couldn't find how to fetch these values from the manifest list in advance. I'm not sure if I overlooked something or misunderstood the code workflow.

lubosmj · 2024-10-03T08:27:10Z

Mirroring:
I believe you can extract the information about os/architecture inside the create_listed_manifest method:

pulp_container/pulp_container/app/tasks/sync_stages.py

Line 480 in 1a8fe63

    man_dc = DeclarativeContent(content=manifest)
Pushing:
When it comes to pushing, we may want to get this information from the already uploaded config blob:

pulp_container/pulp_container/app/registry_api.py

Line 1295 in 1a8fe63

    config_blob = found_config_blobs.first()

If we are fine (we are mostly not) with updating existing manifests, we could potentially update the objects here:

pulp_container/pulp_container/app/registry_api.py

Line 1221 in 1a8fe63

    manifest_to_list = models.ManifestListManifest(
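
For the push case, here is a minimal sketch of reading the two values from a config blob. It assumes the blob is an OCI image configuration JSON document with top-level `architecture` and `os` keys. `arch_and_os_from_config` and the sample bytes are hypothetical, and how the blob is actually read from storage is not shown here.

```python
import json
from typing import Optional, Tuple


def arch_and_os_from_config(raw_config: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Extract (architecture, os) from the raw bytes of an image config blob.

    Hypothetical helper: missing keys yield None so the result can be
    stored in nullable fields without special handling.
    """
    config = json.loads(raw_config)
    return config.get("architecture"), config.get("os")


# Illustrative config blob content; real blobs carry many more keys.
sample_config = json.dumps(
    {"architecture": "arm64", "os": "linux", "rootfs": {"type": "layers", "diff_ids": []}}
).encode("utf-8")

arch, os_name = arch_and_os_from_config(sample_config)
print(arch, os_name)
```

Returning None for absent keys keeps the sketch compatible with `null=True` fields on the model.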

git-hyagi · 2024-10-03T18:51:07Z

Thank you for the help and the suggestions/optimizations!

ianballou · 2024-10-23T13:18:12Z

cc @sjha4 @qcjames53 we should keep an eye on this and integrate with it in Katello sooner rather than later to avoid excess reindexing of container manifests.

ipanova · 2024-10-24T10:56:12Z

pulp_container/app/models.py

@@ -103,6 +108,9 @@ class Manifest(Content):

    annotations = models.JSONField(default=dict)
    labels = models.JSONField(default=dict)
+    architecture = models.TextField(null=True)
+    os = models.TextField(null=True)
+    compressed_layers_size = models.IntegerField(null=True)


How about compressed_image_size?

github-actions bot added multi-commit no-changelog labels Sep 25, 2024

lubosmj requested changes Sep 26, 2024

git-hyagi force-pushed the add-manifest-size-arch-fields branch from c6cc26f to 27cfe78 Compare October 2, 2024 17:32

github-actions bot removed the multi-commit label Oct 2, 2024

git-hyagi force-pushed the add-manifest-size-arch-fields branch from 27cfe78 to c7816f8 Compare October 3, 2024 18:49

git-hyagi marked this pull request as draft October 3, 2024 19:08

git-hyagi marked this pull request as ready for review October 3, 2024 19:37

lubosmj marked this pull request as draft October 11, 2024 13:14

git-hyagi force-pushed the add-manifest-size-arch-fields branch from c7816f8 to 8b53a1e Compare October 22, 2024 18:54

github-actions bot removed the no-changelog label Oct 22, 2024

git-hyagi force-pushed the add-manifest-size-arch-fields branch 2 times, most recently from ef816f0 to c152cf4 Compare October 22, 2024 20:05

Add os/arch/layers_size fields to manifest model

8f390fa

closes: pulp#1767

git-hyagi force-pushed the add-manifest-size-arch-fields branch from c152cf4 to 8f390fa Compare October 23, 2024 14:45

git-hyagi marked this pull request as ready for review October 23, 2024 15:32

ipanova reviewed Oct 24, 2024
