-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for inlined compressed data (larger refactoring) #203
Add support for inlined compressed data (larger refactoring) #203
Conversation
By defining a Blob message that overlaps with Digest, we can repurpose Digest fields to allow data inlining with optional compression in more places. The Digest message is then only used in cases where the receiver wants a reference rather than data. In this example I have changed most of the Digest fields to Blobs, which may lead to over-inlining of data. Each case needs to be considered separately before this change is ready for consideration. Refers to bazelbuild#201.
// to run, which MUST be present in the | ||
// [ContentAddressableStorage][build.bazel.remote.execution.v2.ContentAddressableStorage]. | ||
Digest command_digest = 1; | ||
Blob command = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly object to this change.
With the platform properties being promoted from Command to Action, scheduler processes may nowadays be implemented in such a way that they don't need to load the Command message from the CAS. This effectively reverts that change.
The Command message can get big really quickly, especially in the case of projects where people use (strip_)include_prefix too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add to that: is it even likely that inlining Command into Action is advantageous? You only upload Action and Command into the CAS in case of remote execution; not remote caching. Most remote execution services hopefully have their workers located close to storage. Do we really care about reducing latency/roundtrips there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable- this is the "over-inlining" problem I mentioned in the description (this is only intended as an early draft for discussion).
// [Directory][build.bazel.remote.execution.v2.Directory] object | ||
// represented. See [Digest][build.bazel.remote.execution.v2.Digest] | ||
// for information about how to take the digest of a proto message. | ||
Digest digest = 2; | ||
Blob blob = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it permitted to also use this construct inside of Tree objects? If not, should it be stated? If so, any guidelines on canonicalisation?
string hash = 1; | ||
int64 size_bytes = 2; // This refers to the uncompressed size | ||
|
||
reserved 3, 4; // Leave some room in case we want to extend Digest later(?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add more fields to Digest, then clients/servers will already update to a newer version of the .proto. In that case they will already switch from Digest to Blob.
In other words, I don't see how reserving numbers here helps us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be right. If we update Digest later, we could reserve fields in Digest to "jump over" the extra fields in Blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we update Digest later, it's completely safe to use any numbers that overlap with Blob.
I'm going to close this draft PR since there is more interest in #202. |
By defining a Blob message that overlaps with Digest, we can repurpose Digest fields to allow data inlining with optional compression in more places. The Digest message is then only used in cases where the receiver wants a reference rather than data.
In this example I have changed most of the Digest fields to Blobs, which may lead to over-inlining of data. Each case needs to be considered separately before this change is ready for consideration.
Refers to #201.