Added support for monolithic upload. #1274

Merged: 1 commit merged into pulp:main from i1219 on May 27, 2023

Conversation

@ipanova (Member) commented May 3, 2023

closes #1219

"""
_, repository = self.get_dr_push(request, path, create=True)

if self.tries_to_mount_blob(request):
response = self.mount_blob(request, path, repository)
return response
elif digest := request.query_params.get("digest"):
ipanova (Member, Author):

This is the theoretical part, based only on the docs spec: https://docs.docker.com/registry/spec/api/
I am not aware of any client that would be using this.

Member:

I am quite puzzled by the specs. We can complete a monolithic upload in either a PUT or POST request. This is what you meant by the theoretical part, correct?

https://stackoverflow.com/a/57068286

ipanova (Member, Author):

You're not the only one ;)

A monolithic upload can be done in 2 ways:

  1. Make a POST request with the digest as a request param and complete the blob upload in one request (this is the theoretical part, because I have no idea which clients use this, so it is basically implemented only based on the docs, without a way to prove it works fine). Also note that I am returning a 201 BlobResponse and not an UploadResponse.
  2. Make a POST request without the digest request param, then make a PUT request and send the chunk in the body (both flows are sketched below).
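
For concreteness, here is a minimal sketch of both flows. This is a hedged illustration using the Python requests library; base_url, blob, and the myrepo repository name are placeholders, not identifiers from this PR:

```python
# Illustrative sketch of the two monolithic upload flows (not code from this PR).
import hashlib

import requests

base_url = "http://localhost:5000"  # placeholder registry URL
blob = b"example layer content"     # placeholder blob bytes
digest = "sha256:" + hashlib.sha256(blob).hexdigest()
headers = {"Content-Type": "application/octet-stream"}

# Flow 1 (single request, the "theoretical" one): POST the entire blob with
# the digest as a query parameter; the registry answers 201 immediately.
requests.post(
    f"{base_url}/v2/myrepo/blobs/uploads/",
    params={"digest": digest},
    data=blob,
    headers=headers,
)

# Flow 2 (two requests): POST without a digest to open an upload session,
# then PUT the whole blob (the only "chunk") to the returned Location.
session_url = requests.post(
    f"{base_url}/v2/myrepo/blobs/uploads/"
).headers["Location"]  # may be relative; join with base_url if needed
requests.put(session_url, params={"digest": digest}, data=blob, headers=headers)
```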

```diff
@@ -684,11 +747,15 @@ def put(self, request, path, pk=None):
     """
     Create a blob from uploaded chunks.

-    Note: We do not support monolithic upload.
+    This request makes the upload complete. It can either carry a zero-length
```
ipanova (Member, Author):

Here, this can be tested with podman (which sends an empty body in the PUT request) and the helm client, which sends the last (in its case, the only) chunk in the PUT request.

Member:

And the last chunk can also be the "main" chunk in the monolithic upload, right?

To carry out a “monolithic” upload, one can simply put the entire content blob to the provided URL: PUT /v2/<name>/blobs/uploads/<uuid>?digest=<digest>

https://docs.docker.com/registry/spec/api/#monolithic-upload

ipanova (Member, Author), May 4, 2023:

In a monolithic upload it is always the last and main chunk; that's why it is called monolithic. It never issues a PATCH request where chunks are sent (as in a chunked upload).

To make it even more confusing, podman claims to use chunked upload, but in reality it sends only one chunk in the PATCH request and no body in the PUT. It also does not send the Content-Range header in the PATCH request that would indicate a legal chunked upload ;) So that if branch in the PATCH code with the Content-Range header is also theoretical ;)
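
To illustrate the difference, here is a hedged sketch of a spec-compliant chunked upload next to podman's actual sequence (placeholder names throughout; this is not code from this PR or from podman):

```python
# Illustrative sketch of a spec-compliant chunked upload (not code from this PR).
import hashlib

import requests

base_url = "http://localhost:5000"          # placeholder registry URL
chunks = [b"first chunk", b"second chunk"]  # placeholder chunks
digest = "sha256:" + hashlib.sha256(b"".join(chunks)).hexdigest()

# Open an upload session; the registry answers 202 with a Location header.
location = requests.post(f"{base_url}/v2/myrepo/blobs/uploads/").headers["Location"]

# A "legal" chunked upload: every PATCH carries a Content-Range header.
offset = 0
for chunk in chunks:
    response = requests.patch(
        location,
        data=chunk,
        headers={
            "Content-Range": f"{offset}-{offset + len(chunk) - 1}",
            "Content-Type": "application/octet-stream",
        },
    )
    location = response.headers["Location"]  # may be relative; join with base_url if needed
    offset += len(chunk)

# Complete the upload with a zero-length PUT carrying the digest.
requests.put(location, params={"digest": digest})

# podman, by contrast, sends a single PATCH with no Content-Range header and
# then the empty PUT, so the Content-Range branch in the PATCH code never runs.
```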

@lubosmj (Member) left a comment:

I added a few comments. It might be beneficial to merge this PR after merging pending blobs first, as discussed before.


```python
        return response

    def create_single_chunk_artifact(self, chunk):
```
Member:

I miss basic error handling in this method (e.g., invalid digest, omitted content-type, etc.).

Member:

Digest validation is done in init_and_validate.

Member:

Oh, I mixed up the method I wanted to reference. I meant to place this comment in single_request_upload. I rather wanted to validate the format of the digest.

ipanova (Member, Author):

There are a few places that would benefit from digest format validation, at least in the PUT request.
I would not bother with that; it would just give a 404. Or at least it is not in the scope of this PR.
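
For illustration, such a format check could be as small as a regex match on the digest string (a hedged sketch; the pattern and helper name are illustrative, not from this PR):

```python
import re

# Illustrative only: registry digests have the form "<algorithm>:<hex>";
# sha256 (64 lowercase hex characters) is the common case.
DIGEST_PATTERN = re.compile(r"^sha256:[0-9a-f]{64}$")

def is_valid_digest(digest):
    """Return True if the digest string is a well-formed sha256 digest."""
    return bool(DIGEST_PATTERN.match(digest or ""))
```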


"""
_, repository = self.get_dr_push(request, path, create=True)

if self.tries_to_mount_blob(request):
response = self.mount_blob(request, path, repository)
return response
elif digest := request.query_params.get("digest"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am quite puzzled by the specs. We can complete a monolithic upload in either a PUT or POST request. This is what you meant by the theoretical part, correct?

https://stackoverflow.com/a/57068286

@lubosmj (Member) commented May 3, 2023

I think we need to handle overloading too. This is, however, not specified in the documentation. Might a helm client decide to flood and freeze our API with a huge monolithic upload once we support immediate tasks?

```python
try:
    blob_artifact = ContentArtifact(
        artifact=artifact, content=blob, relative_path=digest
    )
    blob_artifact.save()
```
Member:

I think this should be part of the first try. The blob cannot exist without its content artifact, so finding the blob in 608 means the ContentArtifact must already be in place.

ipanova (Member, Author), May 4, 2023:

You're right, I will change this.
I mostly moved around code that was repeated in a few places without looking too thoroughly.

ipanova (Member, Author):

Actually, I will not. Here is the reason for it! c06c025
Should I rather leave a comment in the code?

Comment on lines 614 to 615
```python
except IntegrityError:
    # A ContentArtifact for this blob already exists (e.g., created by a
    # concurrent request); reuse it and attach the artifact if it is missing.
    ca = ContentArtifact.objects.get(content=blob, relative_path=digest)
    if not ca.artifact:
        ca.artifact = artifact
        ca.save(update_fields=["artifact"])
```
Member:

Which means we would need none of this.

Member:

Am I missing not-yet-downloaded remote artifacts here?

ipanova (Member, Author):

There are no remote artifacts in the uploaded content.

@ipanova (Member, Author) commented May 4, 2023

> I think we need to handle overloading too. This is, however, not specified in the documentation. Might a helm client decide to flood and freeze our API with a huge monolithic upload once we support immediate tasks?

There is an issue that can address your concern by limiting the max body size: #532
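
For reference, the kind of guard such a limit could introduce is a simple Content-Length check before the body is read. This is a hypothetical sketch; the cap, helper name, and error type are illustrative and not taken from #532:

```python
# Illustrative only: reject oversized monolithic uploads up front.
from rest_framework.exceptions import ParseError

MAX_BLOB_UPLOAD_SIZE = 1024 * 1024 * 1024  # hypothetical 1 GiB cap

def reject_oversized_upload(request):
    """Raise before reading the body if the declared size exceeds the cap."""
    declared_size = int(request.META.get("CONTENT_LENGTH") or 0)
    if declared_size > MAX_BLOB_UPLOAD_SIZE:
        raise ParseError("Blob exceeds the maximum allowed upload size.")
```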

@ipanova force-pushed the i1219 branch 2 times, most recently from 35aa23a to 4d5b05d on May 26, 2023 13:10
@lubosmj merged commit cd37507 into pulp:main on May 27, 2023
Development

Successfully merging this pull request may close these issues.

Add support for monolithic upload
3 participants