UCO should import the Collections Ontology to handle ordered lists #389
A follow-on patch will generate Make-managed files. References: * #389 Signed-off-by: Alex Nelson <[email protected]>
References: * #389 Signed-off-by: Alex Nelson <[email protected]>
Discussion in today's OC meeting only made it through Risk 4. We will discuss the remaining risks at the next OC meeting. Feedback is encouraged in advance of the meeting.
After reviewing this proposal alongside #393 and the referenced CO2 paper, I agree that UCO should import the Collections Ontology to handle ordered lists. The need for ordered lists and the solution provided by CO2 are compelling. The risks do not block this proposal. Although CO2 does not support non-linear message threads, UCO still requires representation of ordered lists (Risk 1). SHACL coverage, tooling support, inferencing, and maintaining documentation automatically are beyond the scope of this proposal (Risks 2, 4, 5, and 9), and are possible areas of future development. Regarding Risk 6, conflict with Facet strategy, any existing examples of MessageThread in UCO or CASE are notional and should be replaced with updated representations (including the message.json example). To address Risk 7, could we establish a usage convention to use List and ListItem for ordered lists, and not the co:element equivalent property?
…r directories A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#389 Signed-off-by: Alex Nelson <[email protected]>
References: * ucoProject/UCO#389 Signed-off-by: Alex Nelson <[email protected]>
…r directories No effects were observed on Make-managed files. References: * ucoProject/UCO#389 Signed-off-by: Alex Nelson <[email protected]>
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#389 Signed-off-by: Alex Nelson <[email protected]>
References: * ucoProject/UCO#387 * ucoProject/UCO#389 Signed-off-by: Alex Nelson <[email protected]>
Documentation impact
Resolving shape IRIs as URLs
There is a CASE/UCO-developed Python script that generates symbolic links to map IRIs to generated documentation pages. That script has not yet been adapted to link shape files that are not simultaneously OWL classes. Hence, shape-only IRIs will not resolve as expected, though they will once someone takes the time to update that script. This turns out to be a second blocker on IRI resolution; the first is an unresolved error in the configuration of the documentation hosting server for UCO's IRI resolution. A ticket is being resolved with the provider to address this.
SHACL shape pages for shapes applied without `targetClass`
These appear to work. For instance, this is the generated Turtle snippet for `uco-co:index-subjects-shape`:
@prefix co: <http://purl.org/co/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix uco-co: <https://ontology.unifiedcyberontology.org/co/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
uco-co:index-subjects-shape a sh:PropertyShape ;
    sh:datatype xsd:positiveInteger ;
    sh:nodeKind sh:Literal ;
    sh:path co:index ;
    sh:targetSubjectsOf co:index .
SHACL shapes using
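To make concrete what `uco-co:index-subjects-shape` asks of data, here is a minimal Python sketch (standard library only, not UCO tooling) that mimics its three constraints on a toy list of triples. Focus nodes are the subjects of `co:index` (`sh:targetSubjectsOf`), each value must be a literal (`sh:nodeKind sh:Literal`), and that literal must be a well-formed `xsd:positiveInteger` (`sh:datatype`). The `Literal` class, the `kb:` node names, and the triple layout are hypothetical illustrations; real validation would run a SHACL engine over RDF.

```python
import re
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"
CO_INDEX = "http://purl.org/co/index"  # co: prefix as declared in the snippet above


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


# xsd:integer lexical space: optional sign followed by decimal digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_positive_integer(lexical: str) -> bool:
    """True if lexical is a valid xsd:positiveInteger lexical form (value >= 1)."""
    return _INTEGER_LEXICAL.fullmatch(lexical) is not None and int(lexical) > 0


def index_subjects_violations(triples):
    """Mimic uco-co:index-subjects-shape over (subject, predicate, object) triples.

    sh:targetSubjectsOf co:index: every subject of a co:index triple is checked,
    regardless of its class. Returns (focus node, message) pairs.
    """
    violations = []
    for subject, predicate, obj in triples:
        if predicate != CO_INDEX:
            continue
        if not isinstance(obj, Literal):  # sh:nodeKind sh:Literal
            violations.append((subject, "value is not a literal"))
        elif obj.datatype != XSD + "positiveInteger":  # sh:datatype: wrong type
            violations.append((subject, f"datatype is {obj.datatype}"))
        elif not is_positive_integer(obj.lexical):  # sh:datatype: ill-formed literal
            violations.append((subject, f"{obj.lexical!r} is ill-formed for xsd:positiveInteger"))
    return violations


data = [
    ("kb:item-1", CO_INDEX, Literal("1", XSD + "positiveInteger")),  # conforms
    ("kb:item-2", CO_INDEX, Literal("0", XSD + "positiveInteger")),  # 0 is not positive
    ("kb:item-3", CO_INDEX, Literal("3", XSD + "integer")),  # wrong datatype
    ("kb:item-4", CO_INDEX, "kb:not-a-literal"),  # an IRI, not a literal
]

for focus, message in index_subjects_violations(data):
    print(focus, message)
```

Only `kb:item-1` conforms. Because the shape targets subjects of `co:index` rather than a class, none of these nodes needs an `rdf:type` to be checked.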
The PR has been updated. The "root"
This will assist with the review of the transitive closure of the Collections Ontology. References: * #389 Signed-off-by: Alex Nelson <[email protected]>
The UCO OC has had a standing, vaguely-specified question about risk assessments with respect to importing ontologies. Issue 406 has just been posted to at least provide an answer to part of the risk assessments: Would importing this ontology cause UCO to become non-conformant with OWL 2 DL any more than it knows itself to be? (The remainder of the larger question, about modeling risks, remains out of scope of this comment and the linked issue.) Issue 406 starts the answer to that question by defining SHACL shapes that review OWL 2 DL conformance. It is highly likely to be an incomplete review, but at least hits some of the significant issues UCO tries to prevent exposing its users to, including but not limited to:
There are two results of applying this review to the Collections Ontology:
A few comments on various portions of the CP:
Risk 4: I think we need to be very careful in any attempts to apply the change in SHACL property shape definition style proposed under Risk 4 in any broad way.
Risk 6: I do not see any conflict with Facet strategy. The description for this risk implies that there is inconsistency or error in how Facets are currently used. I would disagree with this assertion. Facets are classes characterizing some aspect of a UcoObject through the properties associated with them. They are a special form of structured concept classes (as described in the UCO design document, https://unifiedcyberontology.org/resources/uco_design_document.html) that are only ever used as the range of the core:hasFacet property on UcoObjects. This is used to convey characterization of particular aspects of UcoObjects in UCO currently, and is intended to serve as a clean extension point for third-party users, outside of the currently defined UCO spec, to specify custom structured concept class characterizations of particular aspects of UcoObjects. The apparent inconsistency asserted in the risk writeup is easily explained: the second example (observable:MessageThread) is in the observable namespace, where the community has had a longstanding explicit consensus to support duck typing for observables, and this imparted the requirement that properties of ObservableObjects are always conveyed via relevant Facets.
Risk 7: I agree that we should avoid trying to change Compilation to align with CO, due to some of the semantic complexities explicitly outlined in Risk 7 and referenced more generally at the end of my comment above on Risk 6.
Risk 8: I do not have a confident answer to this either, though I suspect we would want to compile it into our monolithic build, given that the purpose of that monolithic build is to convey the complete set of UCO such that it can be processed, analyzed and/or used without worrying about whether all parts are present and in the correct form.
From voting today, we will include the error ontology patch.
While updating example JSON-LD, I found that an error I was somewhat expecting was not triggering. For everyone's awareness: the Collections Ontology makes some requirements on certain more-stringent integer types (e.g. Effectively, on resolution of this PR, some data updates may need to be made to assign types according to CO requirements. Meanwhile, there may be another patch added on top of the already-merged solution for UCO Issue 389, to add a
A follow-on patch will regenerate Make-managed files. * ucoProject/UCO#387 * ucoProject/UCO#389 Signed-off-by: Alex Nelson <[email protected]>
This follows the general pattern of recent UCO import-review shapes files, for the Collections Ontology (389) and OWL (406). References: * ucoProject/UCO#389 * ucoProject/UCO#406 Signed-off-by: Alex Nelson <[email protected]>
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#387 * ucoProject/UCO#389 * ucoProject/UCO#391 * ucoProject/UCO#396 Signed-off-by: Alex Nelson <[email protected]>
References: * ucoProject/UCO#387 * ucoProject/UCO#389 * ucoProject/UCO#391 * ucoProject/UCO#396 Signed-off-by: Alex Nelson <[email protected]>
Effects that were observed on Make-managed files from the prior patch were removed by this patch. References: ucoProject/UCO#389 Signed-off-by: Alex Nelson <[email protected]>
A draft version of this patch series assisted in reviewing Issue 389. References: * #389 * #406 Signed-off-by: Alex Nelson <[email protected]>
RDFS and OWL are receiving aliases for in-common spelling in adopters' code. OWL also specifically got further support in some UCO issues. This patch also adds a `Namespace` for the import of the Collections Ontology, and the new UCO namespace `configuration`. References: * ucoProject/UCO#389 * ucoProject/UCO#406 * ucoProject/UCO#432 * ucoProject/UCO#437 Signed-off-by: Alex Nelson <[email protected]>
Disclaimer
Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.
Background
The Collections Ontology provides implementations of several Set- and Set-adjacent concepts, as an OWL 2 DL ontology.
UCO has a need to provide the ability to represent ordered lists, where the order is not necessarily determinable or recordable by some "keying" ordering property (e.g. a timestamp, or an incrementing ID number). This need exists for file fragmentation in file system analysis (especially for reporting from where a file was recovered and in what order pieces were put together), message threading, and other applications.
UCO should not invest effort in designing an independent ordered list representation. In RDF, especially OWL 2 DL-based RDF, ordered lists are non-trivial to represent due to requirements imposed on RDF lists. In particular, OWL 2 DL requires that RDF list nodes be blank nodes and that lists never fork. These requirements pose challenges for some UCO applications, necessitating class-defining work to implement linked lists. This proposal suggests importing an ontology that has already carried out a significant review of list implementation through the lens of OWL 2 DL requirements.
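The two OWL 2 DL list requirements named above can be made concrete with a small checker. This is a hedged, simplified Python sketch, not UCO or CO tooling: it walks a single `rdf:first`/`rdf:rest` chain in a toy set of triples, treats identifiers beginning with `_:` as blank nodes, and reports a node with more than one `rdf:rest` as a fork. The `ex:` and `_:l…` identifiers are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_problems(head, triples):
    """Walk the rdf:first/rdf:rest chain starting at head and report problems.

    Simplified check: every list node must be a blank node (written "_:..."),
    each node needs exactly one rdf:first and one rdf:rest (two rdf:rest values
    is a fork), and the chain must reach rdf:nil without revisiting a node.
    """
    problems = []
    seen = set()
    node = head
    while node != NIL:
        if node in seen:
            problems.append(f"{node}: cycle detected")
            break
        seen.add(node)
        if not node.startswith("_:"):
            problems.append(f"{node}: list node is not a blank node")
        firsts = [o for s, p, o in triples if s == node and p == FIRST]
        rests = [o for s, p, o in triples if s == node and p == REST]
        if len(firsts) != 1:
            problems.append(f"{node}: expected 1 rdf:first, found {len(firsts)}")
        if len(rests) != 1:
            problems.append(f"{node}: expected 1 rdf:rest, found {len(rests)}")
            break  # a forked or truncated chain cannot be followed further
        node = rests[0]
    return problems


# A well-formed two-member list (ex:a ex:b).
good = {
    ("_:l1", FIRST, "ex:a"), ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "ex:b"), ("_:l2", REST, NIL),
}
# The same list with a second continuation from _:l1: a fork.
forked = good | {("_:l1", REST, "_:l3"), ("_:l3", FIRST, "ex:c"), ("_:l3", REST, NIL)}
# A list whose node is a named IRI rather than a blank node.
named = {("ex:l1", FIRST, "ex:a"), ("ex:l1", REST, NIL)}

print(list_problems("_:l1", good))    # []
print(list_problems("_:l1", forked))  # ['_:l1: expected 1 rdf:rest, found 2']
print(list_problems("ex:l1", named))  # ['ex:l1: list node is not a blank node']
```

A message thread that branches, or a list item that must be referenced by IRI, fails one of these checks. This is why an ontology with its own linked-list classes is attractive.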
The source code of the Collections Ontology is trackable here:
https://github.com/collections-ontology/collections-ontology
A research article documenting and evaluating the ontology is here:
https://doi.org/10.3233/SW-130121
Requirements
Requirement 1
UCO must be able to provide an ability to represent an ordered list.
Requirement 2
UCO users must be able to validate usage of UCO's adopted and/or implemented ordered list concepts.
Requirement 3
UCO must be able to demonstrate compatibility with classes and properties of other independently-developed ontologies.
Requirement 4
The version of the Collections Ontology against which UCO develops its SHACL shapes must be known to the UCO user.
Risk / Benefit analysis
Benefits
`DataRangeFacet`
saying how far a chunk within the reconstructed file is from the beginning of the file), but this might not work in cases where pieces of the geometry information are missing.
`core:Compilation`. While the first goals in adopting CO classes are deeper in the UCO class hierarchy, there is an opportunity to align more top-level classes with the heavily reviewed classes of CO.
Risks
With this being potentially UCO's first import of an external ontology, there are several nontrivial points to consider.
Risk 1 - Linearity of CO List
The first class of UCO interest in CO, `co:List`, is linear only. (This can be confirmed from some of the list-member linking properties being `owl:FunctionalProperty`s, implying that after OWL inferencing, any `co:ListItem` would have only one next `co:ListItem` after `owl:sameAs` is applied.) Forking a list is not supported, which falls short of the needs of one of the intended first adopters of ordering, `observable:MessageThread`.
Fortunately, some of the superclasses and superproperties of `co:List` and its properties provide a sufficient basis to build a forking variant similar to `co:List`. That variant is provided in a separate proposal scoped to `observable:MessageThread` adopting CO.
Risk 2 - Intentionally incomplete coverage of SHACL
When importing CO, there is a question of how much of this ontology should be as usable and testable to UCO users as UCO ontology concepts. That is, how much validation capability should UCO provide (or at least, incubate)?
This proposal's accompanying PR implements the minimal SHACL shapes needed to get a set of unit tests to pass. Those tests demonstrate expected correct and incorrect usage of concepts that will be needed to support (1) UCO's needs for `MessageThread` (coming in a separate PR) and (2) the estimated needs of some concept that uses the linear `co:List` (a yet-unnamed file fragmentation representation, and/or disk partition systems).
The PR for this adoption of CO goes no further in defining SHACL shapes. Interested community members should feel free to expand the coverage if they wish.
Risk 2.1 - Other integer types
CO employs specific integer types on some properties: `xsd:nonNegativeInteger` and `xsd:positiveInteger`. A community member provided early feedback on this ontology, suggesting these be relaxed in SHACL review because JSON-LD output size grew considerably.
Although SHACL coverage of CO is otherwise incomplete, the proposed SHACL enforcement respects these datatype designations, and works to ensure that data validated with SHACL is consistent with non-UCO usage of CO concepts.
The extra JSON-LD file weight can be offset by using JSON-LD context dictionaries.
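As a hedged illustration of that last point, the Python sketch below (standard library only) builds the same 50 `co:ListItem` nodes twice. The first version gives each `co:index` value an inline `@type`. The second relies on a `co:index` term definition in the `@context` to coerce string values to `xsd:positiveInteger`. The `kb:` namespace and node names are hypothetical. The sketch only compares serialized sizes; it assumes, rather than checks, that a conforming JSON-LD processor would produce the same RDF from both documents.

```python
import json

BASE_CONTEXT = {
    "co": "http://purl.org/co/",
    "kb": "http://example.org/kb/",  # hypothetical instance namespace
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def list_items(n: int, compact: bool) -> list:
    """Build n co:ListItem nodes, typing co:index inline or via the context."""
    items = []
    for i in range(1, n + 1):
        if compact:
            # The datatype comes from the "co:index" term definition instead.
            index = str(i)
        else:
            # Every value carries its own datatype annotation.
            index = {"@type": "xsd:positiveInteger", "@value": str(i)}
        items.append({"@id": f"kb:item-{i}", "@type": "co:ListItem", "co:index": index})
    return items


def document(n: int, compact: bool) -> str:
    context = dict(BASE_CONTEXT)
    if compact:
        # Type coercion: string values of co:index are read as xsd:positiveInteger.
        context["co:index"] = {"@type": "xsd:positiveInteger"}
    return json.dumps({"@context": context, "@graph": list_items(n, compact)})


expanded_doc = document(50, compact=False)
compact_doc = document(50, compact=True)
print(len(expanded_doc), len(compact_doc))
```

The context entry costs a fixed number of bytes once per document, while the inline form repeats the datatype for every value. The savings therefore grow with list length.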
Risk 3 - Transitive closure - "error" ontology
CO imports a utility ontology, the Error Ontology. To import CO is, through transitive closure, to import the Error Ontology and its sole property, an annotation property named "error" with a maximum cardinality of 0. Its usage model is: if the property appears, the graph is declared OWL-inconsistent.
Whether to implement a SHACL restriction for this property is left out of scope of this proposal. The risk of importing the Error Ontology is believed to be 0.
Risk 4 - Revisions to SHACL coding style may exceed documentation capabilities
(This risk pertains to the Solutions Approval phase of the Change Proposal process, but since a solution is being provided, we should feel free to discuss it earlier if beneficial.)
The PR accompanying this proposal changes how validation of individual properties occurs. To date, SHACL properties in UCO have been inlined in class definitions as anonymous `sh:PropertyShape` individuals. E.g. this excerpt of UCO's core Turtle defines a property shape, which establishes requirements for `core:name`, but only in the context of the class `core:UcoObject`:
core:UcoObject
    a owl:Class , sh:NodeShape ;
    sh:property [
        sh:datatype xsd:string ;
        sh:maxCount "1"^^xsd:integer ;
        sh:nodeKind sh:Literal ;
        sh:path core:name ;
    ] ;
    sh:targetClass core:UcoObject ;
    .
This means `core:name` can be used with no restrictions on any class that is not a `core:UcoObject` subclass. This is a programming flaw, and an interested community member should consider stepping in to correct it.
The PR uses a different coding style, making universal constraints universally applicable. The above would be written instead in this manner:
Any usage of `core:name` should adhere to `core:name-subjects-shape`. The anonymous property shape in `UcoObject` implements two things: (1) an association of `UcoObject` with `core:name`, and (2) a more stringent constraint-set than the universal constraint-set. (This example happens to require (1) more than (2).)
The reason for using this IRI-named-shape coding style is that CO does not have directly encoded class-property associations for several of the relevant properties. Some of the class-property associations are inferrable via `rdfs:domain` statements and RDFS inferencing (or, in some cases, OWL inferencing).
Other new-to-UCO SHACL coding styles were found necessary to include. One property is defined with a range of the complement of a named class, necessitating a `sh:not`. One property restricts a value with one level of path-indirection (`firstItem` must refer to an object with no `previousItem`), necessitating a two-member `sh:path` list (see `uco-co:firstItem-subjects-previousItem-shape`).
All of the above may cause challenges with CASE and UCO's current selection of a documentation generator, and possibly with any documentation generator currently available. Some of the above code styles can be rolled back to use UCO's current style (even if the coding ends up redundant), but others have no more elementary forms that meet the same level of expressivity.
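The difference between the inlined, class-targeted style and the named, subjects-of-targeted style can be sketched in a few lines of Python. This is a hypothetical illustration over toy triples, not how any SHACL engine is implemented. Only the `sh:datatype xsd:string` constraint is modeled. Which constraints belong in the universal shape and which stay on `core:UcoObject` is an assumption here. No subclass inference is performed, so only asserted `rdf:type` values count.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal."""
    lexical: str
    datatype: str


def name_values(triples, predicate):
    """(subject, value) pairs for every use of predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]


def check_class_targeted(triples, target_class, predicate, datatype):
    """Inlined style (sh:targetClass): only asserted instances of target_class are checked."""
    focus = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    return sorted({s for s, v in name_values(triples, predicate)
                   if s in focus and v.datatype != datatype})


def check_subjects_of(triples, predicate, datatype):
    """Named-shape style (sh:targetSubjectsOf): every subject using predicate is checked."""
    return sorted({s for s, v in name_values(triples, predicate) if v.datatype != datatype})


data = [
    ("kb:object-1", "rdf:type", "core:UcoObject"),
    ("kb:object-1", "core:name", Literal("Alice", "xsd:string")),
    ("kb:object-2", "rdf:type", "core:UcoObject"),
    ("kb:object-2", "core:name", Literal("7", "xsd:integer")),
    # Not typed as core:UcoObject, so the inlined shape never examines it.
    ("kb:thing-1", "rdf:type", "ex:Thing"),
    ("kb:thing-1", "core:name", Literal("42", "xsd:integer")),
]

print(check_class_targeted(data, "core:UcoObject", "core:name", "xsd:string"))  # ['kb:object-2']
print(check_subjects_of(data, "core:name", "xsd:string"))  # ['kb:object-2', 'kb:thing-1']
```

The class-targeted check never examines `kb:thing-1`. This is the "no restrictions on any class that is not a `core:UcoObject` subclass" gap described above.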
Risk 5 - Non-support of OWL features
The accompanying PR intentionally does not implement support for some OWL features pertaining to inferencing. Primarily, this concerns identity resolution and some properties that are designated `owl:FunctionalProperty`s. (If a property `P` is functional, then a graph with `S P T1` and `S P T2` would cause an inference that `T1 owl:sameAs T2`.) It is likely that the fully correct test for SHACL validation of an `owl:FunctionalProperty`, after OWL inferencing is applied, would need to rely on SHACL-SPARQL. Such a shape for a property `ex:p` would be:
(Reminder: Select queries in SHACL-SPARQL find all violations of a shape.)
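The SHACL-SPARQL shape itself is not reproduced here. As a hedged stand-in, this Python sketch shows the logic such a check would need to implement, compared against what `sh:maxCount 1` reports. It assumes a reasoner has already materialized the `owl:sameAs` triples, and it counts a subject's values only up to `owl:sameAs` equivalence, grouped with a small union-find. `ex:p` and the `kb:` nodes are hypothetical.

```python
from collections import defaultdict


def _find(parent, x):
    """Union-find lookup with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def values_by_subject(triples, prop):
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return values


def max_count_violations(triples, prop):
    """What sh:maxCount 1 reports: subjects with more than one distinct value node."""
    return sorted(s for s, vs in values_by_subject(triples, prop).items() if len(vs) > 1)


def same_as_aware_violations(triples, prop):
    """Count values only up to owl:sameAs equivalence (symmetric and transitive)."""
    parent = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[_find(parent, s)] = _find(parent, o)  # merge the two classes
    return sorted(
        s for s, vs in values_by_subject(triples, prop).items()
        if len({_find(parent, v) for v in vs}) > 1
    )


data = [
    ("kb:s1", "ex:p", "kb:t1"),
    ("kb:s1", "ex:p", "kb:t2"),
    ("kb:t1", "owl:sameAs", "kb:t2"),  # as a reasoner would infer for functional ex:p
    ("kb:s2", "ex:p", '"1"'),
    ("kb:s2", "ex:p", '"2"'),  # two distinct literals can never be owl:sameAs
]

print(max_count_violations(data, "ex:p"))      # ['kb:s1', 'kb:s2']
print(same_as_aware_violations(data, "ex:p"))  # ['kb:s2']
```

On inferred data, `sh:maxCount 1` flags `kb:s1` even though its two values denote one individual. The equivalence-aware check does not flag it, at the cost of an extra pass over `owl:sameAs` links.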
The accompanying PR chooses to assume OWL inferencing is not in use, on the (untested) assumption that such a query would be expensive for end users to run in their SHACL validation. Instead, `sh:maxCount 1` is constrained on all `owl:FunctionalProperty`s. Any community members interested in OWL inference evaluation should feel encouraged to propose implementing the SHACL-SPARQL pattern in the future. Alternatively, such shapes could be included in the proposal PR, with `sh:deactivated` applied to keep the tests disabled unless the `sh:deactivated` statement were deleted by a review process willing to pay the analysis time cost.
Risk 6 - Conflict with Facet strategy
There is significant potential for confusion, due to UCO's usage of `Facet`s, when reviewing what should be the subclass of a `co:List` (or any externally-developed list). The proposer believes this is best viewed as an opportunity to review elementary UCO design that has to date remained unchanged, and unchallenged, since the prototype days. In particular, why is this the pattern to attach a "Set member" to a `core:ContextualCompilation`:
while this is the pattern to attach a "List member" (without ordering) to the current implementation of `observable:MessageThread`?
Risk 7 - co:element and property chain axioms
Consider the node `kb:contextual-compilation-1` defined in Risk 6. There is an equivalent property in CO, `co:element`, that could be used to define a similar structure:
This is consistent with CO in terms of OWL constraints, and consistent with UCO in the form of JSON-LD data. However, `co:element` is defined as a property chain axiom:
That `owl:propertyChainAxiom` statement means, if `a :element b`, then there exists some `c` such that `a :item c` and `c :itemContent b`, and the domains and ranges of `:item` and `:itemContent` would infer additional characteristics about `c`.
An OWL inferencing application might take instances of `:element` and use them to infer and/or require the existence of a node satisfying the `c` form. It's possible (the proposer is uncertain) that if such a `c` already existed in the graph as a named node, no new node would be generated; it is also possible a blank node would always be generated, and later resolved as `owl:sameAs` the named node. If the latter is the case, it is unclear whether SHACL-SPARQL would be needed, as with the `owl:FunctionalProperty` discussion noted in Risk 5.
Due to needing to understand some OWL-SHACL interactions better, this proposal leaves validating `co:element` with SHACL out of scope. It should be revisited if UCO's `core:ContextualCompilation` (or some superclass) is somehow aligned with `co:Collection`.
Risk 8 - Open question on redistribution of imported ontologies
It is not yet decided in the accompanying Pull Request whether the Collections Ontology, in whole or in part, would be "compiled" into the monolithic UCO ontology. One axiom needed for some SHACL functionality was copied, and cited as copied, within the CO SHACL implementation. Should the entirety of CO (as tracked in a Git submodule) be copied into the monolithic build?
Risk 9 - Increased reliance on tooling support
A substantial amount of CASE example data has been writable by hand (that is, by a person rather than a program), at scales sufficient to illustrate concepts.
`co:List`, as a doubly-linked list in RDF, is cumbersome enough to write that programming support becomes more necessary to generate examples that would otherwise be hand-written, which implies a need for library functions for developers.
Competencies demonstrated
Competency 1
A set of `UcoObject`s needs to behave as an ordered list, which is known to be complete, and has no ordering key other than insertion order within this list.
Competency Question 1.1
Can UCO represent this list?
Result 1.1
With CO, yes. See `tests/examples/co_PASS.json`, node `kb:list-1`.
Competency Question 1.2
Can one of the `UcoObject`s be in the list twice? (One could use this for, say, representing a known handoff sequence of some object where someone ferries the object multiple times.)
Result 1.2
Yes, this is a capability of `co:List`s, as `co:List` is a subclass of `co:Bag` (aka a multiset).
Competency Question 1.3
Suppose the beginning and end of the list are known, but an item is missing from the middle. Can the order of known items still be queried?
Result 1.3
Yes. The total-ordering property `co:nextItem` records direct links between list members. The partial-ordering property `x co:followedBy y` indicates that the `co:ListItem` `y` follows `x` after one or more `co:nextItem` links, exact count unknown. See `tests/examples/co_PASS.json`, node `kb:list-2`.
Competency 2
The Collections Ontology is provided as a Git repository on GitHub.
Competency Question 2.1
What is the current version of the CO? Is this the version that UCO is tracking as a Git submodule?
Result 2.1
The current version can be seen by visiting the CO GitHub page. The current version, identified by Git SHA-1, is `619e7b02646321174635fd04be658e338bf7d1d7`.
The version tracked by UCO can be seen with this command:
Solution suggestion
`/ontology/co/co.ttl`.
`co.ttl` define an `owl:Ontology` with IRI `https://ontology.unifiedcyberontology.org/co`.
`co.ttl`'s ontology import CO.
`sh:NodeShape`s and named `sh:PropertyShape`s in `co.ttl`, using the ontology prefix `https://ontology.unifiedcyberontology.org/co/`.
`co:List` characteristics, including expected errors with errors described in inlined test comments.
`sh:path` being a blank node.)
Coordination
develop