(More) General spec comments #36
Hi Jeremy, if you're amenable, I'd like to identify which items in this issue remain open questions, and split them off into more focused discussions or issues accordingly. Based on my reading, here's the enumeration of everything raised above:
I think that's a thorough accounting of the issues raised above. With your agreement, I can open separate issues for each of the unchecked items in my list. If there are others that ought to be opened, we can open those too.
Scope
I generally find it cleaner to have a spec which contains all the mandatory aspects, and a second related document describing how to use the spec in practice.
Definitions
In "Derived artifacts"
The wording here strikes me as odd; it uses the noun "build tool" as a stand-in for the action, i.e. the "build action" or "build process" performed by the build tool. We want to be very clear that the "inputs" here are consumed by the build tool as part of the action (verb), and are not inputs used to create the build tool (noun) itself. (Especially since, in principle, the build tool could itself be a derived artifact.)
In "Leaf artifacts", do we generally want to define these as any artifact that is created outside the scope of the omnibor-instrumented build process? Hand-written source files fall into this class. But what about things like prebuilt binaries and system libraries? Or source files and headers provided by the development environment (which could be machine-generated)?
In "Artifact Identifiers" (definitions), should it clarify that each identifier has a discrete type? For "canonical" to work, the parties have to be generating identifiers of the same type. Are artifact identifiers required to be fixed-size? Could you, as an extreme, use the literal artifact content as an identifier? It would meet the requirements.
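To make the typing question concrete, here is a minimal sketch, assuming identifiers carry an explicit type tag and that each type has a fixed digest size. The type names, the DIGEST_SIZES table, and the ArtifactId and identify helpers are all hypothetical and not taken from the spec. Comparing identifiers of different types is treated as an error rather than as "different artifacts", and the size check is what would rule out using the raw content as its own identifier.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical identifier types and their fixed digest sizes, in bytes.
DIGEST_SIZES = {"sha1": 20, "sha256": 32}


@dataclass(frozen=True)
class ArtifactId:
    """Hypothetical typed identifier: the type travels with the digest."""
    id_type: str
    digest: bytes

    def __post_init__(self) -> None:
        # Enforce a fixed size per type; this is what rules out using the
        # literal artifact content as its own "identifier".
        expected = DIGEST_SIZES.get(self.id_type)
        if expected is None:
            raise ValueError(f"unknown identifier type: {self.id_type}")
        if len(self.digest) != expected:
            raise ValueError(f"{self.id_type} id must be {expected} bytes")

    def same_artifact(self, other: "ArtifactId") -> bool:
        # Identifiers are only canonical within one type: a sha1 id and a
        # sha256 id for the same content are simply not comparable.
        if self.id_type != other.id_type:
            raise TypeError(f"cannot compare {self.id_type} with {other.id_type}")
        return self.digest == other.digest


def identify(content: bytes, id_type: str = "sha256") -> ArtifactId:
    """Hash the content with the named algorithm and wrap it as a typed id."""
    return ArtifactId(id_type, hashlib.new(id_type, content).digest())
```

With this model, two parties agree on an artifact only if they chose the same identifier type up front, which is the constraint "canonical" seems to depend on.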
In "Artifact Dependency Graph (ADG)" is "recursive" needed? What would a non-recursive DAG be?
I'm not sure what this sentence is saying. It might be better to describe this inductively: each node in the graph represents an artifact and references the inputs used to create it, and the graph as a whole is composed of such nodes, forming the complete set of dependencies needed to produce some final artifact. A key property is that, given a graph, you can extend it by adding new nodes, or conversely, remove nodes that are not inputs to any other node (i.e., subgraphs represent the build dependencies of intermediate derived artifacts).
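A small sketch of that inductive description, assuming artifacts are named by plain string identifiers; the ADG class and its methods are illustrative only, not anything the spec defines. A node may only reference inputs that already exist, so the graph stays acyclic by construction. A node may only be removed when nothing consumes it, and nodes added with no inputs play the role of leaf artifacts.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class ADG:
    """Hypothetical inductive model: artifact id -> ids of its immediate inputs."""
    inputs: dict[str, frozenset[str]] = field(default_factory=dict)

    def add(self, artifact: str, inputs: Iterable[str] = ()) -> None:
        # Extension step: every input must already be a node, so a new node
        # can never close a cycle.
        deps = frozenset(inputs)
        missing = deps - self.inputs.keys()
        if missing:
            raise KeyError(f"unknown inputs: {sorted(missing)}")
        if artifact in self.inputs:
            raise ValueError(f"{artifact} is already in the graph")
        self.inputs[artifact] = deps

    def removable(self, artifact: str) -> bool:
        # Only a node that is not an input to any other node can be dropped.
        return all(artifact not in deps for deps in self.inputs.values())

    def remove(self, artifact: str) -> None:
        if not self.removable(artifact):
            raise ValueError(f"{artifact} is still an input to another node")
        del self.inputs[artifact]

    def leaves(self) -> set[str]:
        # Nodes with no recorded inputs: artifacts produced outside the build.
        return {a for a, deps in self.inputs.items() if not deps}
```

For example, adding main.c and util.h, then main.o from both, then app from main.o, yields a graph whose subgraph rooted at main.o is exactly the build dependencies of that intermediate artifact.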
I'm not sure about the word "singularity" here; something like "convergence" might express this idea better.
Should we point out that this requires the artifacts to embed the omnibor document, since otherwise there's no way to guarantee this? As a consequence, every file format involved in a build must have a way of embedding some kind of non-functional content.
"Build tools": following up on the comment above, the tool does nothing on its own; it implements a build action that takes a specific set of inputs to create a specific output. Is it required to be deterministic, i.e., will the same tool with the same inputs always generate the same artifact? That seems ideal, but tools have a distressing habit of including things like timestamps and commit IDs in their outputs.
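To illustrate why determinism matters for identifiers, here is a toy sketch in which a hypothetical build action optionally stamps its output with a build time. Both build and is_reproducible are made up for illustration. Hashing the outputs of repeated runs shows that a single embedded timestamp is enough to give "the same" build a different artifact identifier every time.

```python
import hashlib
from typing import Optional


def build(inputs: list[bytes], timestamp: Optional[str] = None) -> bytes:
    """Hypothetical build action: concatenates its inputs and, like many
    real tools, may stamp the output with the current build time."""
    out = b"".join(inputs)
    if timestamp is not None:
        out += b"\nbuilt-at: " + timestamp.encode()
    return out


def is_reproducible(outputs: list[bytes]) -> bool:
    # Deterministic means every run over the same inputs yields the same
    # bytes, and hence the same content-derived identifier.
    return len({hashlib.sha256(o).digest() for o in outputs}) == 1
```

Two runs without a timestamp hash identically; two runs stamped one second apart do not, even though the inputs are unchanged.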
Specifications
"Artifact ID": is this the same as "Artifact Identifier"? If so, we should avoid the abbreviation. Also, should the same entity be both "defined" and "specified"? I would assume that "definitions" just give meanings to terms that are later used in the specification.
This never specifies how identifiers are represented in general. Do they include their type? Are they binary? ASCII? Hex-encoded? uuencoded?
In "Artifact Identifier Types" it talks about git OIDs, and has a list of OID prefixes that represent types. Does this mean that all artifact identifier types are git OIDs? Could there be non-git-OID identifier types? If there are mandatory identifier types, are they the only ones allowed, or can the list be extended? I.e., is it an open or a closed list?
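For reference, a git blob object ID is the hash of a short header (the word "blob", a space, the decimal content length, and a NUL byte) followed by the content. The sketch below computes it with either SHA-1 or SHA-256. The GITOID_HASHES table is a hypothetical stand-in for the spec's list of identifier types; whether that list is open (extendable, like this dict) or closed is exactly the question above.

```python
import hashlib

# Hypothetical registry of identifier types. An "open" list would let
# implementations add entries here; a "closed" list would freeze it.
GITOID_HASHES = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


def gitoid(content: bytes, hash_name: str = "sha1") -> str:
    """Git-style blob object ID: hash of the header 'blob <len>' plus a NUL
    byte, followed by the raw content, returned as lowercase hex."""
    h = GITOID_HASHES[hash_name]()
    h.update(b"blob %d\x00" % len(content))
    h.update(content)
    return h.hexdigest()
```

With SHA-1 this matches what `git hash-object` reports for the same bytes; the SHA-256 variant has the same header and a different hash.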
This seems very much like an open question.
Perhaps "OmniBOR Document Identifier" would be clearer, since all identifiers within the OmniBOR spec could be interpreted as "OmniBOR Identifiers".
What does "children" mean in the context of a graph? Since the graph only represents inputs, it implies that children are the inputs, but typically children are the outputs of a process. I think something like "immediate inputs of the build process" would be clearer.
Perhaps: "All identifiers within a single omnibor document are of the same type. The document itself may only be referenced with identifiers of that same type." I.e., it's not valid to reference a sha1 document with a sha256 ID (which I think is a stronger constraint than what "omnibor document identifier" covers).
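The proposed rule is easy to check mechanically. Here is a minimal validator sketch, assuming each identifier is a (type, hex digest) pair; TypedId and check_uniform_type are hypothetical names. It rejects a document that mixes identifier types internally, and it also rejects a reference to the document that uses a different type than the document's own.

```python
from typing import NamedTuple


class TypedId(NamedTuple):
    id_type: str     # hypothetical type tag, e.g. "sha1" or "sha256"
    hex_digest: str


def check_uniform_type(doc_type: str, input_ids: list[TypedId],
                       doc_ref: TypedId) -> None:
    """Sketch of the proposed rule: every ID inside the document, and the ID
    used to reference the document itself, share the document's type."""
    for ident in input_ids:
        if ident.id_type != doc_type:
            raise ValueError(
                f"{doc_type} document contains a {ident.id_type} ID")
    if doc_ref.id_type != doc_type:
        raise ValueError(
            f"{doc_type} document referenced with a {doc_ref.id_type} ID")
```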
As above, should use "input" not "child".
Is this technically redundant, since the child artifact already has its omnibor document embedded within it? Is the "bom" field just there to save tooling from working out how to extract the document? Should this be "bor" rather than "bom"?
I think this should come first, before we start talking about any specific layouts. Also mention that all whitespace is rigid (i.e., a literal space, not a tab, and no runs of whitespace).
Does this mean that just the document identifier is embedded, or the whole document? What is the "new line terminated, lexically ordered list" referring to? If it's multiple document IDs, does that constrain the embedding mechanism to allow an arbitrarily sized list of document identifiers to be embedded contiguously? Wouldn't that, for example, put the Annex B ELF note embedding in conflict with this?
What is the identifier representation being ordered? Does it include the identifier type, or is it just in the order of the raw hash? Are identifiers hex-encoded in ascii or raw binary?
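One data point on the ordering question: lowercase hex preserves raw byte order, because '0'-'9' sort before 'a'-'f' in ASCII and each hex character therefore sorts in the same order as the nibble it encodes. So "sorted as hex text" and "sorted as raw hash" agree, but only if the case is fixed, and prefixing a type tag changes the order again. The helpers below are illustrative, and the "type:hex" textual form is hypothetical.

```python
import hashlib


def hex_order_matches_raw(digests: list[bytes]) -> bool:
    """True if sorting by lowercase hex gives the same order as sorting
    the raw digest bytes."""
    return sorted(digests) == sorted(digests, key=lambda d: d.hex())


def with_type_prefix(id_type: str, digest: bytes) -> str:
    # Hypothetical textual form including the type; sorting these strings
    # groups by type name first and only then by digest.
    return f"{id_type}:{digest.hex()}"


digests = [hashlib.sha256(bytes([i])).digest() for i in range(64)]
print(hex_order_matches_raw(digests))
```

Mixed case breaks the equivalence: as text, "B0" sorts before "a0", even though the byte 0xa0 is less than 0xb0. This suggests the spec should pin down the case as well as the encoding.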
Is embedding the entire document also OK?
Is it required that the embedded info also be extractable, or is it enough to embed just enough to meet the uniqueness requirements of "Artifact Dependency Graph (ADG) singularity". For example, could it be any other sufficiently strong hash of the bomdocs?
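To make the trade-off concrete, here is a sketch of the "other sufficiently strong hash" option: embed one fixed-size SHA-256 over the sorted, newline-terminated list of document IDs instead of the list itself. The ids_fingerprint helper and this canonicalization are assumptions, not anything the spec defines. The result always has the same size, which suits a fixed-size slot such as an ELF note, and it is independent of input order. However, it is not extractable: a tool can only confirm a candidate list against it, not recover the IDs.

```python
import hashlib


def ids_fingerprint(doc_ids: list[str]) -> str:
    """Hypothetical fixed-size summary of a set of document IDs: SHA-256 over
    the sorted, newline-terminated list, returned as lowercase hex."""
    canonical = "".join(i + "\n" for i in sorted(doc_ids))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()
```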
Implies that the build tools need to know how to extract identifiers from any input document.