(More) General spec comments #36

jsgf · 2023-01-25T19:44:49Z

Scope

manner of embedding identifiers for ADGs, and other related data structures in artifacts of various types
This is not covered here, other than a couple of mentions of its necessity. I think the specific details of this are out of scope for the main part of the spec. It should be covered in the annexes or (eventually) related specs.

guidance
guidance

I generally find it cleaner to have a spec which contains all the mandatory aspects, and a second related document describing how to use the spec in practice.

Definitions

In "Derived artifacts"

An artifact output from a build tool based on artifiacts input into the build tool. Such an artifact is said to be 'derived' from the input artifacts to the build tool.

The wording here strikes me as odd; it uses the noun "build tool" as a stand-in for the verb "build action" or "build process" which is performed by the build tool. We want to be very clear that the "inputs" here are consumed by the build tool as part of the action (verb), and are not inputs used to create the build tool (noun) itself. (Esp since in principle the build tool could itself be a derived artifact.)

In "Leaf artifacts", do we generally want to define these as any artifact which are created out of the scope of the omnibom-instrumented build process? Hand-written source files fall into this class. But what about things like prebuilt binaries, system libraries, etc? Source files/headers provided by the development environment (which could be machine generated)?

In "Artifact Identifiers" (definitions), should it clarify each identifier has a discrete type? So for "canonical" to work, the parties have to be generating identifiers of the same type. Are artifact identifiers required to be fixed size? Could you, as an extreme, use the literal artifact content as an identifier? It would meet the requirements.

In "Artifact Dependency Graph (ADG)" is "recursive" needed? What would a non-recursive DAG be?

It includes the direct input artifacts, and the recursive set of input artifacts to each input artifact, all the way down the graph.

I'm not sure what this sentence is saying. I wonder if it might be better to describe this inductively - each node in the graph represents an artifact and the reference to the inputs used to create it, and the whole graph is composed of such nodes to form the whole set of dependencies to produce some final artifact. A key property is that given a graph, you can extend it by adding new nodes, or conversely, you can remove nodes which are not the inputs to any other node (ie subgraphs represent the build dependencies of intermediate derived artifacts).

Artifact Dependency Graph (ADG) singularity

I wonder about the use of the word "singularity" here - I wonder if something like "convergence" might express this idea better.

An artifact should have precisely one Artifact Dependency Graph (ADG). All equivalent artifacts should have the same Artifact Dependency Graph (ADG).

Should we point out that this requires the artifacts to embed the omnibor document, otherwise there's no way to guarantee this? And as a consequence, all file formats involved in a build must have a way of embedded some kind of non-functional content within them.

"Build tools" - followup from comment above, the tool does nothing on its own, but implements a build action with a specific set of inputs to create a specific output. Is it required to be deterministic? Ie, the same tool with the same inputs will always generate the same artifact. Seems like it would be idea, but tools have a distressing habit of including things like timestamps and commit ids into their outputs.

Specifications

"Artifact ID" - is this the same as "Artifact Identifier"? If so, we should avoid the use of abbreviations. Also should we have the same entity both "defined" and "specified"? I would assume that "definitions" are just to give defined meanings to terms which are later used in the specification.

This doesn't ever specify how identifiers are represented in general. Do they include their type? Are they binary? Ascii? Hex encoded? uuencode?

In "Artifact Identifier Types" it talks about git oids, and has a list of oid prefixes which represent types. Does this mean that all "artifact identifier types" are gid oids? Could there be non-gid-oid identifier types? If there are mandatory identifier types, does mean they're the only ones allowed or it can be extended? Ie, is that an open or closed list?

This approach is anticipated to extend gracefully as git adopts new hash types in the future.

This seems very much like an open question.

OmniBOR Identifier

Perhaps "OmniBOR Document Identifier" would be clearer, since all identifiers within the OmniBOR spec could be interpreted as "OmniBOR Identifiers".

A OmniBOR Document describes the immediate children of an artifact in the Artifact Dependency Graph (ADG).

What does "children" mean in the context of a graph. Since the graph only represents inputs, it implies that children are the inputs, but typically children are the outputs of a process. I think something like "immediate inputs of the build process" would be clearer.

A OmniBOR Document utilizes precisely one Artifact Identifier Type.

Perhaps "All identifiers within a single omnibor document are of the same type. The document itself may only be referenced with identifiers of that same type". Ie, it's not valid to reference a sha1 document with a sha256 id (which I think is stronger than the constraint that "omnibor document identifier" covers).

OmniBOR Document Child Records

As above, should use "input" not "child".

blob⎵${artifact id of child}⎵bom⎵${OmniBOR identifier of child's OmniBOR Document}\n

Is this technically redundant, since the child artifact already has its omnibor document embedded within it? Is the bom field just there to save tooling from working out how to extract the document. Should this be bor rather than bom?

OmniBOR Document Character Encoding

I think this should come first, before we start talking about any specific layouts. Also mention that all whitespace is rigid (ie, literal space not tab, no runs of whitespace).

OmniBOR Identifier Embedding

Each build tool should embed into the output artifact a new line terminated, lexically ordered, list of OmniBOR identifiers for each mandatory Artifact Identifier Type in a manner:

Does this mean that just the document identifier is embedded, or the whole document? What is the "new line terminated, lexically ordered list" referring to? If its multiple document ids, then is that constraining whatever embedding mechanism to allow an arbitrarily sized list of document identifiers to be embedded contiguously? Doesn't this, for example, make the Annex B ELF note embedding in conflict with this?

What is the identifier representation being ordered? Does it include the identifier type, or is it just in the order of the raw hash? Are identifiers hex-encoded in ascii or raw binary?

Is embedding the entire document also OK?

Is it required that the embedded info also be extractable, or is it enough to embed just enough to meet the uniqueness requirements of "Artifact Dependency Graph (ADG) singularity". For example, could it be any other sufficiently strong hash of the bomdocs?

OmniBOR Document Construction by a Build Tool
Examine the input for an embedded OmniBOR Document Identifier - ${omnibor document identifier}

Implies that the build tools need to know how to extract identifiers from any input document.

The text was updated successfully, but these errors were encountered:

alilleybrinker · 2023-09-20T22:44:36Z

Hi Jeremy, if you're amenable, I'd like to identify which items in this issue remain open questions, and split them off into more focused discussions or issues accordingly. Based on my reading, here's the enumeration of everything raised above:

I think that's a thorough accounting of the issues presented above. With your agreement, I can go and open fresh distinct issues for each of the unchecked items in my list. If you have any others which ought to be made, we can open those too.

alilleybrinker added the s-needs-info Status: Further information is requested label Sep 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(More) General spec comments #36

(More) General spec comments #36

jsgf commented Jan 25, 2023 •

edited

Loading

alilleybrinker commented Sep 20, 2023

(More) General spec comments #36

(More) General spec comments #36

Comments

jsgf commented Jan 25, 2023 • edited Loading

Scope

Definitions

Specifications

alilleybrinker commented Sep 20, 2023

jsgf commented Jan 25, 2023 •

edited

Loading