Compression support for inlined data (v2-compatible) #201
Heya Mostyn,
We'd be interested in compressed inlined blobs too, yep! We have
encountered some builds where it's relevant ourselves, with the batch APIs
moving a non-trivial percent of the total bytes. (Builds particularly heavy
in small files).
For implementation details, what's the benefit of introducing a Blob
message to wrap these fields vs just adding compressor where necessary?
Specifically, I could see three main variants:
1. Just stick a `compressor` field beside `bytes` where relevant.
(BatchUpdateBlobsRequest, BatchReadBlobsResponse, OutputFile, ActionResult
x2 for std*_raw).
2. Use `Blob` to wrap `bytes` and `compressor`, but *not* the Digest
fields.
3. Encapsulate Blob as a replacement for Digest, as you've described.
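To make variant (1) concrete, here is a rough sketch of what one of the batch messages might look like with a compressor field beside the existing bytes field — field names and numbers below are placeholders for illustration, not a settled proposal:

```proto
// Sketch of option (1): a compressor field sits directly beside each
// inlined bytes field. The new field number is hypothetical.
message BatchReadBlobsResponse {
  message Response {
    Digest digest = 1;
    bytes data = 2;
    google.rpc.Status status = 3;
    // New: how `data` is encoded. IDENTITY (the enum default) means
    // uncompressed, so messages from old senders keep their meaning.
    Compressor.Value compressor = 4;
  }
  repeated Response responses = 1;
}
```

Because the enum default is the identity encoding, parties that never set the field continue to interoperate unchanged.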
Looking at the API as it currently stands, I think I'd be in favour of (1) for
V2 - that way we don't introduce any deprecated fields or branch points,
and "clean" support simply looks like supporting the compressor value for
each bytes field, even if the opposite party will never set it to anything
non-passthrough. The use of Capabilities + request fields should I think
make it easy to do backwards compatibility safely (so no `bytes` fields
risk being misinterpreted).
(2) or (3) seems nicest for V3, so the API can clearly express that anywhere
inlined bytes are present, they might be compressed. I guess (3) has the
benefit of expressing an "optionally inlined blob" in a single field,
rather than needing two fields for Digest + optional Blob? I could get behind
that, though I think then I'd want to see Digest nested *inside* Blob instead
of `hash` + `size_bytes`, so there are no places in the API where a raw `hash`
field may show up except the Digest message itself.
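As a hypothetical sketch of that nesting idea (names and numbers illustrative only, not part of any proposal), a V3 shape could look like:

```proto
// Sketch of the nested variant: Blob carries a Digest rather than
// duplicating hash/size_bytes, so a raw `hash` field only ever
// appears inside the Digest message itself.
message Blob {
  Digest digest = 1;                // Reference to the content.
  bytes data = 2;                   // Optional inlined (possibly compressed) data.
  Compressor.Value compressor = 3;  // Encoding of `data`, if set.
}
```

The trade-off versus the field-number-overlap approach is that this shape is not wire-compatible with a bare Digest, so it only works where the API can change freely.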
On Mon, Jul 12, 2021 at 3:50 PM Mostyn Bramley-Moore wrote:
We added support for compressed blobs via the bytestream API a while back
(#147), and this has been working well so far. Given that we don't seem to be
gearing up for a non-backwards compatible REAPIv3 release, I think we
should start investigating adding backwards-compatible support for
compression of inlined data, aiming to minimize roundtrips while still
using compressed data.
One idea that I have been thinking about is to create a new Blob message
which has the same fields as Digest, but with two additional optional
fields: bytes data to hold the inlined data and Compressor.Value
compressor to describe the encoding. Then, we would replace each usage of
Digest in a message that currently also has a separate field for inlined
(uncompressed) data with a Blob field, and deprecate the previous data
field.
For example:
message Blob {
  // The first two field types + numbers must match those in Digest.
  string hash = 1;
  int64 size_bytes = 2; // This refers to the uncompressed size.
  reserved 3, 4; // Leave some room in case we want to extend Digest later(?)
  bytes data = 5; // Possibly compressed data, if set.
  Compressor.Value compressor = 6; // Encoding used, if the data field is set.
}
For "request" messages that can be responded to with inlined data, we
would add a repeated field that specifies the compression types that the
client accepts. The server would pick one of those encodings (or
identity/uncompressed) for each Blob that it chooses to inline data for. We
would also add a new field in the Capabilities API for servers to advertise
support for this.
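As a rough sketch of that negotiation — every new field name and number here is a placeholder, not a concrete proposal:

```proto
// The client lists the encodings it can decode for inlined data.
message BatchReadBlobsRequest {
  string instance_name = 1;
  repeated Digest digests = 2;
  // New: the server may return each inlined blob in any of these
  // encodings, or fall back to identity (uncompressed) if the list
  // is empty or nothing in it is supported.
  repeated Compressor.Value acceptable_compressors = 3;
}

// The server advertises support via the Capabilities API.
message CacheCapabilities {
  // ... existing fields elided ...
  // New: encodings the server may use for inlined data. The field
  // number is a placeholder.
  repeated Compressor.Value supported_inlined_compressors = 10;
}
```

An empty list on either side degrades gracefully to today's uncompressed behaviour, which is what makes the scheme backwards-compatible.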
I believe this would be backwards-compatible: old clients would not
request compressed inlined data, and would receive Blobs that are
decodable as Digests in response, and servers would have a clear signal
for when this feature can be used in responses.
Is anyone else interested in this feature?
I have mostly been considering this from the "cleaner v3 design, retrofitted to v2" angle (3), rather than the "minimal required changes to v2" point of view (1). (2) doesn't sound like a great compromise to me, but I'd need to read up more on protobuf v3 to understand the implications well. There are benefits to each of (1) and (3), of course, and I'd be happy to sketch out solutions to both, or either one if there's a clear preference in the group.
For the path to adding it to V2, I think (1) stands to be cleaner and not
lean on reinterpreting messages as other messages, so I'd be interested in
getting your take on it.
I think that can be independent of what exact pattern we'd pick for V3, so
I'd suggest treating them as independent decisions. I don't feel terribly
strongly about what pattern V3 *should* use, beyond being skeptical of
inlining Digest fields into anything :).
No need to evaluate (2) if nothing points to it being best for either.
Thanks Mostyn!
We'd be interested in this too; we definitely observe a lot of builds with plenty of small but compressible artifacts (Go tends to be pretty heavy on this). Agreed that both (1) and (3) have benefits; worth mentioning (2) for completeness, but I agree that it's unlikely to be appealing as a solution. I think (1) is my favourite; (3) is a little weird from the perspective of the generated code, given that you have a thing that is like a Digest but is not actually one (so it'd be fine in languages like Python but not in, say, Go or Java). I like the idea in general of encapsulating all blobs/digests with something like this Blob message as a V3 change; that seems more elegant than lots of individual sites re-inventing the same thing. But that doesn't seem practical for V2, and the embedded Request/Response messages seem like reasonable things to extend with the compressor field.
Created a couple of draft PRs to gather opinions:
- This is a small API change which allows for inlined data to be compressed. Refers to bazelbuild#201.
- By defining a Blob message that overlaps with Digest, we can repurpose Digest fields to allow data inlining with optional compression in more places. The Digest message is then only used in cases where the receiver wants a reference rather than the data itself. In this example I have changed most of the Digest fields to Blobs, which may lead to over-inlining of data. Each case needs to be considered separately before this change is ready for consideration. Refers to bazelbuild#201.
This is a small API change which allows inlined data to be sent in compressed form in BatchReadBlobs and BatchUpdateBlobs calls. Refers to bazelbuild#201.
* Add support for inlined compressed data in batch CAS operations. This is a small API change which allows inlined data to be sent in compressed form in BatchReadBlobs and BatchUpdateBlobs calls. Refers to #201.
* Remove some stray parentheses