Authoritative IRI prefixes #457

Open
3 of 11 tasks
ajnelson-nist opened this issue Aug 18, 2022 · 3 comments
ajnelson-nist commented Aug 18, 2022

Background

RDF graphs tend to include a mechanism for declaring namespace prefixes. JSON-LD context dictionaries are one way of specifying a set of prefixes to be used within individual graph files.

@kfairbanks, in developing context dictionaries for UCO, has had to make an assumption about how to pick the prefixes for namespaces (e.g. uco-action for https://ontology.unifiedcyberontology.org/uco/action/). This means a decision implemented in the scope of JSON-LD context construction becomes the authoritative set of prefixes for UCO JSON and JSON-LD users. This feels (to the submitter) like not quite the right scope; UCO should instead encode the authoritative prefix on each ontology itself.
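As a rough illustration of what a context dictionary's prefix choice controls, here is a minimal Python sketch that compacts a full IRI into prefix:local form. The `CONTEXT` contents, the `uco-core` entry, the `Action` local name, and the helper name `compact_iri` are assumptions made for illustration, not UCO's published context and not a JSON-LD processor.

```python
# Hypothetical context fragment. Only the uco-action mapping is taken from this issue;
# the uco-core entry is an assumed example of a second namespace.
CONTEXT = {
    "uco-action": "https://ontology.unifiedcyberontology.org/uco/action/",
    "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
}


def compact_iri(iri, context):
    """Return prefix:local using the longest matching namespace, else the IRI unchanged."""
    best_prefix = None
    best_namespace = ""
    for prefix, namespace in context.items():
        if iri.startswith(namespace) and len(namespace) > len(best_namespace):
            best_prefix, best_namespace = prefix, namespace
    if best_prefix is None:
        return iri
    return best_prefix + ":" + iri[len(best_namespace):]


print(compact_iri("https://ontology.unifiedcyberontology.org/uco/action/Action", CONTEXT))
```

Whatever string sits on the left side of each `CONTEXT` entry is what JSON and JSON-LD users end up seeing, which is why the choice of prefix carries weight beyond the context file.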

By luck, each of the ontologies where this is relevant (i.e. supplies a class, property, or datatype) already kind of has that encoding, in an rdfs:label. However, rdfs:label doesn't quite fit the needs:

  • It is optionally an xsd:string (without a language tag) or an rdf:langString (bearing a language tag).
    • UCO currently has its ontology labels as rdf:langStrings.
  • The RDF spec provides no upper cardinality or uniqueness limit, so it could be used any number of times and would be inappropriate to constrain.
  • Nothing formally constrains whether rdfs:label should be written for people or written for automated consumption. "UCO Action Ontology"@en is a legitimate label style.

To support at least the context dictionaries, UCO should add one or both of these features to each ontology in UCO (and likewise for CASE):

  1. A skos:notation that contains the authoritative prefix, datatyped xsd:string.
  2. A SHACL sh:declare is another mechanism that enables a prefix declaration.
    However, by a consequence of the SHACL spec, this appears to be ... "virally" authoritative.

The scope of a SHACL sh:PrefixDeclaration expands beyond the ontology file in which it's declared, and is instead the entirety of importers' OWL transitive closure. SHACL Spec, 5.2.1:

"A SHACL processor collects a set of prefix mappings as the union of all individual prefix mappings that are values of the SPARQL property path sh:prefixes/owl:imports*/sh:declare of the SPARQL-based constraint or validator. If such a collection of prefix declarations contains multiple namespaces for the same value of sh:prefix, then the shapes graph is ill-formed."
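The collection rule quoted above can be sketched in plain Python. This is a simplified model under stated assumptions, not a SHACL processor: ontologies are plain strings, `imports` and `declarations` are hypothetical dictionaries standing in for owl:imports and sh:declare statements, and the UCO ontology IRI used below is assumed.

```python
def collect_prefix_mappings(start, imports, declarations):
    """Union the (prefix, namespace) pairs declared by `start` and everything it imports.

    start: IRI of the ontology that sh:prefixes points at.
    imports: dict of ontology IRI -> list of imported ontology IRIs (owl:imports).
    declarations: dict of ontology IRI -> list of (prefix, namespace) pairs (sh:declare).
    Raises ValueError when one prefix is bound to more than one namespace,
    which the SHACL spec treats as an ill-formed shapes graph.
    """
    mappings = {}
    seen = set()
    stack = [start]
    while stack:
        ontology = stack.pop()
        if ontology in seen:
            continue  # guard against import cycles
        seen.add(ontology)
        for prefix, namespace in declarations.get(ontology, []):
            existing = mappings.setdefault(prefix, namespace)
            if existing != namespace:
                raise ValueError(
                    "prefix %r bound to both %s and %s" % (prefix, existing, namespace)
                )
        stack.extend(imports.get(ontology, []))
    return mappings


KB = "http://example.org/kb"
CASE = "https://ontology.caseontology.org/case/case"
UCO = "https://ontology.unifiedcyberontology.org/uco/uco"  # assumed ontology IRI

IMPORTS = {KB: [CASE], CASE: [UCO]}
DECLARATIONS = {
    KB: [("kb", "http://example.org/kb/")],
    UCO: [("uco-action", "https://ontology.unifiedcyberontology.org/uco/action/")],
}

print(collect_prefix_mappings(KB, IMPORTS, DECLARATIONS))
```

In this model, if the knowledge base also declared `uco-action` for a different namespace, the call would raise `ValueError`, even though the conflicting declaration lives two imports away. That is the "viral" behavior described above.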

sh:PrefixDeclarations are optional mechanisms that assist with SHACL-SPARQL. It is possible to avoid their use, and instead use PREFIX SPARQL statements in any SHACL-SPARQL constraints.

The proposed UCO MIME Taxonomy includes SHACL shapes that enact constraints on skos:notation. Interpretations of SHACL severity are based on informal language from the SKOS Reference document. From [Section 6.5.2], it appears there should only be one skos:notation on a concept per datatype on the notation's literal, but the descriptive language falls short of making this a "MUST"-level requirement.

Requirements

Requirement 1

CDO ontologies (that is, each owl:Ontology) must each declare a namespace prefix for its usage.

Requirement 2

CDO ontologies must not create sh:PrefixDeclarations for any namespaces over which they do not have authority (i.e. not having domain *.unifiedcyberontology.org, *.caseontology.org, etc.).
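One possible way to check Requirement 2 in CI is a host-name test on each declared namespace. The sketch below is an assumption about how such a check might look; the domain tuple, the function name `is_authoritative`, and the decision to accept the bare domain as well as its subdomains are illustrative choices, not project policy.

```python
from urllib.parse import urlsplit

# Domains named in Requirement 2; the "etc." in the requirement is not filled in here.
AUTHORITATIVE_DOMAINS = ("unifiedcyberontology.org", "caseontology.org")


def is_authoritative(namespace, domains=AUTHORITATIVE_DOMAINS):
    """Return True when the namespace's host is one of `domains` or a subdomain of one."""
    host = urlsplit(namespace).hostname or ""
    return any(host == domain or host.endswith("." + domain) for domain in domains)


for ns in (
    "https://ontology.unifiedcyberontology.org/uco/action/",
    "http://www.w3.org/2002/07/owl#",
):
    print(ns, is_authoritative(ns))
```

Under this check, a sh:PrefixDeclaration for the OWL namespace inside a UCO file would be flagged, because UCO has no authority over www.w3.org.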

Risk / Benefit analysis

Benefits

  1. skos:notation is meant to bear only one value per datatype.
    A. However, this is not a "MUST"-level requirement.
  2. sh:PrefixDeclaration makes an expansive stake to a prefix among downstream adopters of UCO. The expansiveness of this greatly assists with UCO settling the question of whether its prefixes could eschew the "uco-" pre-prefix: No, they shouldn't, because of conflicts with existing ontologies (such as UCO's time vs. OWL time).
  3. sh:PrefixDeclaration mimics the effect of needing to do chaining imports of JSON-LD context dictionaries.
  4. sh:PrefixDeclaration and skos:notation embrace mechanisms from existing standards, saving on design. Combining the two leads to a stronger UCO review mechanism (illustrated in Competency Question 1.2 below).

Risks

  1. Usage of skos:notation could entail UCO needing a skos:ConceptScheme in uco.ttl. This ConceptScheme would have each UCO-developed ontology as a member.
    A. While this would give the benefit of a SKOS-based uniqueness test, it would be an additional piece of technical debt passed to downstream ontologies (e.g. CASE) that would need their own ConceptScheme, largely duplicative, due to scope of authority.
  2. Adoption of a single SKOS concept does not necessarily entail that we need to import all of SKOS. Instead, the SHACL shape pertaining to skos:notation, currently housed in the UCO MIME Taxonomy, can be brought into UCO, without necessitating an owl:imports of all of SKOS.
  3. Per SHACL Spec section 5.2.1, adoption of sh:declare would make any importer (using owl:imports) of UCO need to not use any prefix UCO uses for a different namespace. This led to Requirement 2.

A risk not necessarily scoped to this proposal is that there is now another review step for owl:imports statements: the OWL transitive closure needs to be reviewed for sh:declare statements that cause conflicting prefixes. Theoretically, SHACL-SHACL (SHACL used to review a SHACL graph) would handle this review on a monolithic ontology build.

Competencies demonstrated

Competency 1

A knowledge base has imported an ontology that imported CASE (which imports UCO). The knowledge base includes these statements:

<http://example.org/kb>
	a owl:Ontology ;
	owl:imports <https://ontology.caseontology.org/case/case> ;
	sh:declare [
		sh:prefix "kb" ;
		sh:namespace "http://example.org/kb/"^^xsd:anyURI ;
	] ;
	skos:notation "kb" ;
	.

Competency Question 1.1

A user is interested in knowing what ontology prefixes are in the knowledge base.

PREFIX sh: <http://www.w3.org/ns/shacl#>
SELECT DISTINCT ?lOntologyPrefix ?lOntologyNamespace
WHERE {
  ?nPrefixDeclaration
    sh:prefix ?lOntologyPrefix ;
    sh:namespace ?lOntologyNamespace ;
    .
}
ORDER BY ?lOntologyPrefix ?lOntologyNamespace

Result 1.1

With the current state of UCO's develop branch, these would be the results:

| ?lOntologyPrefix | ?lOntologyNamespace |
|---|---|
| kb | http://example.org/kb/ |
| owl | http://www.w3.org/2002/07/owl# |
| sh | http://www.w3.org/ns/shacl# |

As a development aid, sh:prefix was used in the OWL SHACL review mechanism. This proposal would nix those shapes.

Competency Question 1.2

A user is interested in knowing what ontology prefixes are defined authoritatively in the knowledge base, by merit of having a skos:notation matching a prefix.

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?lOntologyPrefix ?nOntology ?lOntologyNamespace
WHERE {
  ?nOntology
    skos:notation ?lOntologyPrefix ;
    .
  ?nPrefixDeclaration
    sh:prefix ?lOntologyPrefix ;
    sh:namespace ?lOntologyNamespace ;
    .
}
ORDER BY ?lOntologyPrefix ?nOntology ?lOntologyNamespace

Result 1.2

With the current state of UCO's develop branch, these would be the results:

| ?lOntologyPrefix | ?nOntology | ?lOntologyNamespace |
|---|---|---|
| kb | http://example.org/kb | http://example.org/kb/ |

(Note also the difference in trailing slash.)

On adoption of this proposal, these could be the results:

| ?lOntologyPrefix | ?nOntology | ?lOntologyNamespace |
|---|---|---|
| kb | http://example.org/kb | http://example.org/kb/ |
| uco-action | https://ontology.unifiedcyberontology.org/uco/action | https://ontology.unifiedcyberontology.org/uco/action/ |
| uco-co | https://ontology.unifiedcyberontology.org/co | https://ontology.unifiedcyberontology.org/co/ |
| ... | ... | ... |
| uco-owl | https://ontology.unifiedcyberontology.org/owl | https://ontology.unifiedcyberontology.org/owl/ |

Note that namespaces that do not provide concepts (classes, properties, or datatypes) currently do not seem like they would need a prefix declared. Hence, uco-master would not be given a sh:PrefixDeclaration. uco-co and uco-owl also do not provide concepts, but instead provide only shapes for existing concepts, so it's debatable whether they should be given a sh:PrefixDeclaration.
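For readers without a SPARQL engine at hand, the join in Competency Question 1.2 can be approximated in plain Python over a list of triples. This is a simplified sketch: triples are tuples of strings, literals carry no datatype or language tag, and the blank node labels `_:b0` and `_:b1` and the function name `authoritative_prefixes` are made up for illustration.

```python
SH = "http://www.w3.org/ns/shacl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def authoritative_prefixes(triples):
    """Return sorted (prefix, ontology, namespace) rows where a skos:notation equals a sh:prefix."""
    notations = {}   # notation literal -> set of subjects carrying it
    prefixes = {}    # declaration node -> set of prefix literals
    namespaces = {}  # declaration node -> set of namespace literals
    for subject, predicate, obj in triples:
        if predicate == SKOS + "notation":
            notations.setdefault(obj, set()).add(subject)
        elif predicate == SH + "prefix":
            prefixes.setdefault(subject, set()).add(obj)
        elif predicate == SH + "namespace":
            namespaces.setdefault(subject, set()).add(obj)
    rows = set()
    for node, node_prefixes in prefixes.items():
        for prefix in node_prefixes:
            for namespace in namespaces.get(node, ()):
                for ontology in notations.get(prefix, ()):
                    rows.add((prefix, ontology, namespace))
    return sorted(rows)


TRIPLES = [
    ("http://example.org/kb", SKOS + "notation", "kb"),
    ("_:b0", SH + "prefix", "kb"),
    ("_:b0", SH + "namespace", "http://example.org/kb/"),
    # A declaration with no matching skos:notation, so it drops out of the join.
    ("_:b1", SH + "prefix", "owl"),
    ("_:b1", SH + "namespace", "http://www.w3.org/2002/07/owl#"),
]

print(authoritative_prefixes(TRIPLES))
```

Like the query, the sketch does not require the declaration node to be attached to the ontology through sh:declare; it only matches on the prefix string. A stricter review might add that link.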

Solution suggestion

  • Remove the sh:declare block from /ontology/owl/owl.ttl.
  • For each UCO owl:Ontology that defines a class, property, or datatype (this would not include uco-co, uco-master, or uco-owl):
    • Add a sh:PrefixDeclaration, binding the prefix currently housed in rdfs:label but without the language tag.
    • (If also adopting skos:notation) Add a skos:notation, bearing the contents currently housed in rdfs:label but without the language tag.
    • Retain rdfs:label.
  • (If also adopting skos:notation) Import the skos:notation shape from the UCO MIME Taxonomy.
  • (If also adopting skos:notation) Add to UCO CI a test that sh:PrefixDeclarations within UCO are only used in "authoritative" scope.

Coordination

  • Tracking in Jira ticket OC-267
  • Administrative review completed, proposal announced to Ontology Committees (OCs) on 2022-08-18
  • Requirements to be discussed in OC meeting, 2022-08-25
  • Requirements Review vote has not occurred
  • Requirements development phase completed.
  • Solution announced to OCs on (TODO-date)
  • Solutions Approval to be discussed in OC meeting, date TBD
  • Solutions Approval vote has not occurred
  • Solutions development phase completed.
  • Implementation has not been merged into develop
  • Milestone linked
  • Documentation logged in pending release page

sbarnum commented Aug 18, 2022

I disagree with the premise of this proposed change.
I do not believe that UCO should assert and enforce specific prefixes from within the ontology.
I believe this would add significant unnecessary complexity (as evidenced in the technical detail outlined by the submitter) and would actually cause a detrimental limitation to serializations of UCO.
Prefixes in instantial content are relevant to serializations and not to the actual ontology.
Any given serialization uses prefixes in an attempt to simplify the representation of the serialized content while enabling full expansion of those prefixes to their non-compacted deserialized form using the defined prefix mapping.
Different serializations may benefit from different prefix forms or values.
As long as the serialization defines an explicit mapping for the prefixes such that each prefixed entry can be expanded to its full IRI form there should be no issues and this expansion should always occur upon deserialization.
When merging disparate graphs or transforming from one serialization to another, the content should always be expanded to its full IRI form. This allows lossless utility while allowing individual serializations to choose prefixes most appropriate to their context.
I do not believe we should be attempting to arbitrarily choose a single prefix for all contexts and force it on all serializations.
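The expansion argument can be illustrated with a short Python sketch. The two prefix maps, the `Action` local name, and the helper `expand_curie` are hypothetical, and the helper handles only the simple prefix:local form rather than full JSON-LD or Turtle expansion rules.

```python
NAMESPACE = "https://ontology.unifiedcyberontology.org/uco/action/"

# Two hypothetical serializations that chose different prefixes for the same namespace.
PREFIXES_A = {"uco-action": NAMESPACE}
PREFIXES_B = {"action": NAMESPACE}


def expand_curie(curie, prefix_map):
    """Expand prefix:local to a full IRI using the serialization's own prefix map."""
    prefix, separator, local = curie.partition(":")
    if not separator or prefix not in prefix_map:
        raise KeyError("no prefix mapping for %r" % curie)
    return prefix_map[prefix] + local


iri_a = expand_curie("uco-action:Action", PREFIXES_A)
iri_b = expand_curie("action:Action", PREFIXES_B)
print(iri_a == iri_b)
```

Both names expand to the same IRI, so merging the two graphs after expansion loses nothing, whichever prefix each serialization picked.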

In short, I believe that this proposed change is:

  1. functionally detrimental to the UCO community ecosystem
  2. significantly and unnecessarily complex

ajnelson-nist commented

@sbarnum This was a pretty surprising proposal for me to work through. I think at least it identifies a potential behavioral bug stemming from usage of sh:declare. In fact, I think I'll just take care of that small part now with a bugfix PR.

I won't argue hard for or against this proposal, as I see technical benefits paired with non-trivial maintenance cost.

ajnelson-nist added a commit that referenced this issue Aug 19, 2022
SHACL Specification Section 5.2.1 specifies a "viral" behavior of
`sh:declare` throughout an OWL transitive closure.  This patch removes
usage of `sh:declare` as a matter of lack of authority for non-UCO
prefixes.  It just so happens the only place this was used was in the
introduction of the OWL SHACL review mechanisms of Issue 406.

A follow-on patch will regenerate Make-managed files.

References:
* #406
* #457

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit that referenced this issue Aug 19, 2022
References:
* #406
* #457

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist commented

I'm pulling this off of today's agenda. I think the JSON-LD context can survive without this proposal.

ajnelson-nist added a commit to casework/CASE-Corpora that referenced this issue Nov 22, 2022
The new typo checking feature of `case_validate` flags usage of the IRI
serving as the `uco-core:` prefix as an unrecognized concept.

Rather than determine how to encode a definition for the prefix IRI
within the ontology, this patch removes usage of `sh:declare`, due to
downstream side-effects noted during development of UCO Issue 457.

No effects were observed on Make-managed files.

References:
* casework/CASE-Utilities-Python#77
* ucoProject/UCO#457

Signed-off-by: Alex Nelson <[email protected]>