Introducing disjointedness between Information and Non-Information Resources #619
Comments
A follow-on patch will regenerate Make-managed files. References: * #619 Signed-off-by: Alex Nelson <[email protected]>
References: * #619 Signed-off-by: Alex Nelson <[email protected]>
One point I found interesting in doing the alignments is that I think on a closer review, much of UCO could end up being considered as I have one potential subclass, which I haven't proposed separately because I've had trouble pinning where in UCO it would be. I'm providing it here just for discussion, not to augment the proposal. I think an S3 Object would be another subclass of
@prefix drafting: <http://example.org/ontology/drafting/> .
<s3://digitalcorpora/corpora/dfrws/challenge-2021/1_Skimmer_mSD.zip>
    a drafting:S3Object ;
    .
This can be seen in Digital Corpora's object browser here, which offers an alternative (downloadable) view of the S3 object via an HTTP portal. As a graph-individual, how would that S3 object be classified? I think the UCO definition, including this IR/NIR proposal, would go at least this far:
drafting:S3Object
    a owl:Class , sh:NodeShape ;
    rdfs:subClassOf
        core:InformationResource ,
        observable:ObservableObject
        ;
    .
I don't think it's a subclass of
I still do not see the justification for this complexity here, and I still see a fundamental invalidity, at least in the example if maybe(?) not in the underlying proposed change. The validity issue is that throughout the examples, and I believe implied in the proposed solution, IRI/URI identifiers are used for UcoObjects that are not necessarily (and in at least one case apparently intentionally) globally unique, which is an absolute requirement for CDO object identifiers. On the broader issues, I am a bit confused. I would strongly object to progressing this any further until such a working session could occur.
@sbarnum I welcome a discussion to further explore motivations of this and some of CDO's fundamental objectives. In particular, we should discuss whether CDO can be used to extend existing concepts of other vocabularies and knowledge bases, which I believe (and I think some others believe) to be a key objective of ontological interoperability. I believe there is a significant risk if CDO avoids this objective - a risk of inducing an information silo.
Note
(Submitted by @plbt5 and @ajnelson-nist.)
This proposal is split off from Issue 606. This proposal does not address any of the UUID discussion from 606.
Background
UCO does not currently account for explicitly representing the distinction between a physical resource that extends in time and space, like a device or a person, and a digital (web) resource that only lives in the cyber-domain, e.g., <https://caseontology.org/index.html>. Since both are clearly disjoint from each other, and because there are many objects in UCO that are either one or the other, such disjointedness must currently be specified case by case. See for instance Issue #536, which partially addressed a question around a graph-individual representing a downloadable file (e.g., <http://example.org/file.zip>).
We recall the distinction that RDF has already identified between information and non-information resources. This ended up in RFC9110 HTTP Semantics. We have depicted their application and distinctions in Figure 1 below:
Figure 1 - Information and non-information resources: their relationship and differences
(Note: For the purposes of this proposal, please consider URI and IRI as synonymous.)
The distinction between an Information Resource (IR) and Non-Information Resource (NIR) cannot be determined from the URI itself, only from the response that one gets from the server. If the URI concerns a NIR, the server cannot respond with data because there does not yet exist something like Elephant-Over-IP or Paul-Over-IP, a.k.a. "Beam me up, Scotty", in the protocols. Instead, the server will respond with an HTTP 303 status, redirecting to a URI that is an Information Resource. Visiting the NIR thus discloses information about the NIR as opposed to the real thing itself.
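The response-based distinction described above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: a 200 response is read as "served as an information resource", a 303 response as "non-information resource", and every other status is left undetermined. The function name `classify_observation` is hypothetical and not part of UCO or of any HTTP library.

```python
from http import HTTPStatus


def classify_observation(status: int) -> str:
    """Classify one observed response to a GET on a URI.

    Assumption for this sketch: 200 means the URI was served as an
    information resource, 303 means the server pointed elsewhere for a
    description, and any other status is left undetermined.
    """
    if status == HTTPStatus.OK:
        return "InformationResource"
    if status == HTTPStatus.SEE_OTHER:
        return "NonInformationResource"
    return "undetermined"


print(classify_observation(200))
print(classify_observation(303))
```

The result describes one client's observation, not a property of the URI, which is why two clients can reach different classifications for the same resource.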
This kind of behavior of the webserver leaves the determination of whether a resource is a NIR or an IR as a matter of perception by the client. For instance, some services may differentially serve a page to some users but not others, as with an international hotel that gives its home page to in-country visitors but a language-specific page to visitors who appear to be outside the country. This is a case where the home page is perceived as an IR by in-country visitors, and as a NIR by out-of-country visitors. One graph holding perspectives from multiple geographies must be able to tolerate a resource being both IR and NIR.
Meanwhile, other RDF resources encoded in a graph belong to a set of concepts that will never be information resources, such as people or devices. Hence, we find a need to specialize non-information resource further with a class of things that will never be information resources.
This distinction is instrumental for a lot of things that are built with RDF(S) and OWL, and it is something that UCO should at least recognize as current practice.
Requirements
Requirement 1
Allow UCO to unequivocally determine in a graph whether a resource is either never an information resource or possibly an information resource.
Requirement 2
A single web resource MUST be able to be represented as an IR and/or an NIR as appropriate in different situations, e.g., due to perception about authorization, location, specific targeting and more.
A resource can be both an IR and a NIR because it can be perceived as an IR or NIR depending on constraints or business rules as implemented by the server, e.g., serving pages in different languages when requested from different geographical locations.
(Proposal flow note: This proposal suggests a solution and competencies before providing a risk/benefit analysis.)
Solution suggestion
The implementation would first need to introduce the distinction between Non-Information and Information Resources. This would become two additional near-top-level classes, under core:UcoThing. This would be a nod to the concepts really being RDFS concepts, but not defined with RDFS IRIs. We should also avoid entailing RDFS semantics of rdfs:Resource being the top-level class, because of the tension such would create with OWL and owl:Thing being the top-level class.
Next, another distinction should be introduced to acknowledge Never-Information Resources, and these being disjoint with the IR and NIR. This allows UCO to follow the reality where an IR can change into an NIR, as explained in Competency 1.
To that end, we suggest introducing the following concepts in UCO:
We also introduce observable:WebResource as a parent to observable:WebPage, to acknowledge web resources that are not yet known to be an IR or NIR, and to acknowledge web pages that are always to be considered an IR:
Visually, this renders as follows, with green nodes new classes, and the red link a new disjointedness:
Apart from the above additions to UCO, we suggest performing an initial alignment. The Risks section should make clear the benefit of such alignment, particularly pertaining to some existing practices (outside of UCO) on designating graph nodes with RDF types analogous to UCO's identity:Person and observable:WebPage. The rationale followed is: can this owl:Thing ever be downloaded with some browser or command-line tool?
Competencies demonstrated
Competency 1
Say the webpage of a multilingual company (MC) is being accessed by two market analysts in a multinational organization, who routinely contribute to a shared knowledge base in the organization. Their offices are in different countries that happen to use languages MC supports, Japan and France. MC's default language is Japanese.
The Japanese analyst visits the home page, https://mc.example.co.jp/, and is served content from that URL. The French analyst visits the home page, https://mc.example.co.jp/, and is 303-redirected to https://mc.example.co.jp/lang-fr/ by server-side client-geolocation rules.
Neither analyst knows the other is trying to access https://mc.example.co.jp/.
Competency Question 1.1
What are the representations of the Japanese analyst and the French analyst, using InformationResource, NonInformationResource, NeverInformationResource, WebResource, and/or WebPage?
Result 1.1
The Japanese analyst:
The French analyst:
Even if pooled in the shared knowledge base, this total knowledge view remains consistent (i.e. does not raise SHACL validation errors).
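Why the pooled view stays consistent can be sketched with a small in-memory model. Everything in this block is an assumption made for illustration: the dictionaries stand in for the proposed axioms as read from this proposal (NeverInformationResource specializing NonInformationResource, and disjoint with InformationResource), the two views use only the IR/NIR typing described in the Background, and the check is a plain disjointness test rather than SHACL validation.

```python
# Hypothetical stand-in for the proposed class axioms (not the ontology itself).
SUBCLASS_OF = {
    "core:InformationResource": {"core:UcoThing"},
    "core:NonInformationResource": {"core:UcoThing"},
    "core:NeverInformationResource": {"core:NonInformationResource"},
    "observable:WebResource": set(),  # parent class left unspecified in this sketch
    "observable:WebPage": {"observable:WebResource", "core:InformationResource"},
}
DISJOINT = [("core:InformationResource", "core:NeverInformationResource")]


def superclasses(cls):
    """Return cls plus every class reachable through SUBCLASS_OF."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def disjointness_violations(types_by_node):
    """List (node, class_a, class_b) where a node falls in two disjoint classes."""
    violations = []
    for node, asserted in types_by_node.items():
        inferred = set()
        for cls in asserted:
            inferred |= superclasses(cls)
        for a, b in DISJOINT:
            if a in inferred and b in inferred:
                violations.append((node, a, b))
    return violations


HOME = "https://mc.example.co.jp/"
FRENCH = "https://mc.example.co.jp/lang-fr/"
japanese_view = {HOME: {"core:InformationResource"}}
french_view = {
    HOME: {"core:NonInformationResource"},
    FRENCH: {"core:InformationResource"},
}

# Pool both analysts' assertions into one graph-like mapping.
pooled = {}
for view in (japanese_view, french_view):
    for node, types in view.items():
        pooled.setdefault(node, set()).update(types)

print(disjointness_violations(pooled))  # []
```

The home page ends up typed as both InformationResource and NonInformationResource, and no violation is reported because only NeverInformationResource is held disjoint with InformationResource in this model.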
This provides an example of a web resource that is, by differential service, contingently an InformationResource and/or a NonInformationResource.
Competency Question 1.2
Are the views consistent when pooled into one graph without any notes on time of observation (i.e., does not raise SHACL validation issues)?
Result 1.2
Yes. The testing in PR 610 confirms no SHACL violations are raised. The visual display of the classes and how this example doesn't hit a class-disjointedness issue is as follows (using "⊂" for subclassing (rdfs:subClassOf), "⋂=∅" for class-disjointedness (owl:disjointWith), and "∈" for instantiation (rdf:type)).
Competency 2
This competency gives a scenario provided as a Risk in the first version of this proposal.
There is a user interface design option available for web services that choose to provide content for browser-based users and RDF-based users. They can choose to separate the RDF individuals from the web pages documenting those individuals; or, they can choose to provide the browser-friendly contents (i.e., HTML, maybe with graphics) describing an individual at that individual's IRI.
Suppose a personnel indexing service is deployed that uses home pages as person identifiers for an example organization. Their knowledge graph is available to a graph consumer who also uses UCO, and we assume the IR/NIR/Never-IR distinction of this proposal is adopted. This statement is in the graph provided by the service:
And, http://example.org/~bob, when visited in a browser, is served as HTML. A crawler used by the graph consumer logs this in its knowledge graph, after stumbling on Bob's home page through an intranet traversal:
Competency Question 2.1
What encodings are possible to describe the graph-individual <http://example.org/~bob>?
This question stems from UCO's demonstrations to date, and is presented to motivate the need for UCO to clarify its classes URL and WebPage in particular.
Result 2.1
<http://example.org/~bob> a observable:WebPage . - The graph-individual pulls down in a browser as HTML. From the crawler's perspective, this is a WebPage.
<http://example.org/~bob> a identity:Person . - The graph-individual has a type of foaf:Person in the personnel service's graph, so it feels natural to translate that statement over to UCO's identity:Person.
Unfortunately, if both of those interpretations were taken, an inconsistency would be reached: identity:Person is under core:NeverInformationResource, and observable:WebPage is under core:InformationResource, entailing membership in two disjoint sets.
<http://example.org/~bob> a observable:URL . - The graph-individual can be seen as describing itself. However, this is another instance of the confusion discussed in Issues "How does one represent a downloadable file in UCO?" #534 and "File and URL should be designated disjoint classes" #536, which addressed modeling a URL that yields a file-download on visit. In Issue 536, a disjointedness between URL and File was adopted, but several significant questions were left unaddressed.
This proposal takes a step towards addressing the question of what higher-level classes should be made disjoint, rather than piecemeal assignment of some ObservableObject subclasses.
Competency 2.2
How can the personnel indexing service's graph integrate into the UCO-based graph?
Result 2.2
There is some challenge in integrating the personnel indexing service's graph into an environment where information resources and non-information resources are held disjoint.
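The conflict behind this challenge can be sketched with a deliberately small lookup. The names here are assumptions for illustration: `KIND` flattens the proposed alignments (identity:Person as a never-information resource, observable:WebPage as an information resource) into labels, and `http://example.org/kb/person-bob` is a hypothetical IRI standing in for a separately minted person node.

```python
# Hypothetical flattening of the proposed alignments into coarse labels.
KIND = {
    "identity:Person": "never-information",
    "observable:WebPage": "information",
}


def conflicted(types_by_node):
    """Return nodes typed as both an information and a never-information resource."""
    clash = {"never-information", "information"}
    return sorted(
        node
        for node, types in types_by_node.items()
        if clash <= {KIND.get(t) for t in types}
    )


BOB_PAGE = "http://example.org/~bob"
BOB_PERSON = "http://example.org/kb/person-bob"  # hypothetical minted IRI

# Taking both interpretations of the same IRI at once.
naive = {BOB_PAGE: {"identity:Person", "observable:WebPage"}}
# Keeping the page and the person as independent nodes.
split = {BOB_PAGE: {"observable:WebPage"}, BOB_PERSON: {"identity:Person"}}

print(conflicted(naive))  # ['http://example.org/~bob']
print(conflicted(split))  # []
```

The second mapping only shows that separate nodes avoid the clash; how the two nodes are linked, and which vocabulary carries Bob's other properties, is not modeled here.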
Integration of such a data source would need to split the resource http://example.org/~bob into independent entities, likely with a new identity:Person node. Other assertions on Bob from the personnel graph, such as name information, would likely need to migrate into Facets defined in the UCO identity: namespace, rather than be carried over with the FOAF vocabulary. In this case, some FOAF vocabulary can still be used to preserve links.
The below graph would be derived from the personnel graph, and added to the crawler's knowledge base. The personnel graph would not be directly added.
Risk / Benefit analysis
Benefits
Adding the specialization class NeverInformationResource moves closer to realizing an assumed disjunction in RFC 9110's HTTP Semantics between "Information Resource" and "Non Information Resource". In practice, InformationResource and NonInformationResource can be conflated when graphs are built from multiple perspectives. This proposal prevents some conflations that should not be possible, especially ones where physical things could accidentally be implied to be downloadable.
Aligning WebPage with a higher-level concept should bring a better understanding of how to use it. This is needed since UCO's WebPage and URL can become mixed with other concepts due to the fundamental nature of RDF being about using IRIs and UCO describing URLs.
observable:WebPage has been lacking to date in UCO demonstrations, which has raised confusion in Ontology Committee calls. Chances to clarify this class should be taken.
Understanding what WebPage is and isn't may be especially important in resolving ReactionsListFacet from #374. A social media post is often viewable as a web page, so UCO usage could easily see something like this in some adopter's graph analyzing some (example) social network:
Risks
Competency 2 illustrates a significant quality-control consideration for how to integrate data from non-UCO graphs. Agreement on fundamentals is one of the significant challenges of cross-graph interoperability.
The heuristic of "Can this ever be downloaded?" might, or might not, be a sufficient guideline for determining what would be NeverInformationResources. This could be challenging for some things where records and events are closely tied together. For instance, a Bitcoin transaction has tightly-intertwined elements of (UCO) Actions and EventRecords. The action is someone transferring coins, which would (by this proposal's action:Action alignment) be a NeverInformationResource; however, the action doesn't fully happen without the record being an InformationResource retrievable from the blockchain. This seems like a situation where it's tempting to say one "downloads the action," which the proposers assume is not a kind of statement UCO should wish to support. This particular "downloading the action" statement can be avoided by adding a specific disjointedness between action:Action and observable:EventRecord; but, the higher-order disjointedness in this proposal satisfies the same separation, stemming from actions being never-information resources, and leaving it open whether event records can be information resources.
If the alignment core:UcoInherentCharacterizationThing rdfs:subClassOf core:NeverInformationResource . is accepted, the current statement in the ontology core:UcoInherentCharacterizationThing rdfs:subClassOf core:UcoThing . becomes entailed, and no longer needs to be explicitly stated from some perspectives, including with respect to SHACL, and with respect to entailment schemes (whether RDFS or OWL). However, this divide is one of the foundational statements of UCO, that there are "domain objects" (UcoObject and subclasses) and "non-domain objects" (things that only inhere and characterize other things, and cannot exist without those other things). Removal of the triple core:UcoInherentCharacterizationThing rdfs:subClassOf core:UcoThing . makes this divide less apparent, because core:UcoInherentCharacterizationThing is no longer among the direct subclasses of core:UcoThing; but, the divide is still present from the axiom core:UcoInherentCharacterizationThing owl:disjointWith core:UcoObject .
This appears to the proposers to be an appropriate adjustment of UCO's foundations, because UCO's foundations include design tenets of RDF.
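The entailment argument can be illustrated with a transitive walk over direct subclass statements. The mapping below is an assumed reading of the proposal (UcoInherentCharacterizationThing under NeverInformationResource, which is under NonInformationResource, which is under UcoThing) with the explicit triple to core:UcoThing already removed; `entailed_superclasses` is a hypothetical helper, not an RDFS or OWL reasoner.

```python
# Assumed direct rdfs:subClassOf statements after the explicit
# UcoInherentCharacterizationThing -> UcoThing triple is removed.
DIRECT_SUBCLASS_OF = {
    "core:UcoInherentCharacterizationThing": {"core:NeverInformationResource"},
    "core:NeverInformationResource": {"core:NonInformationResource"},
    "core:NonInformationResource": {"core:UcoThing"},
}


def entailed_superclasses(cls):
    """Follow direct subclass links transitively and collect every ancestor."""
    found = set()
    frontier = list(DIRECT_SUBCLASS_OF.get(cls, ()))
    while frontier:
        parent = frontier.pop()
        if parent not in found:
            found.add(parent)
            frontier.extend(DIRECT_SUBCLASS_OF.get(parent, ()))
    return found


target = "core:UcoInherentCharacterizationThing"
print("core:UcoThing" in DIRECT_SUBCLASS_OF[target])      # False
print("core:UcoThing" in entailed_superclasses(target))   # True
```

The first check reflects the point that the class is no longer a direct subclass of core:UcoThing, while the second shows the relationship is still reachable through the chain.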
The alignment of core:UcoInherentCharacterizationThing assumes so far, and decides furthermore, that it has no subclasses that will ever be downloadable. Were they downloadable, it seems they would be domain objects (further, ObservableObjects) under UcoObject. To date, it seems the only inherent characterization thing subclass that comes close to fuzzing the downloadable-or-not divide by bundling URLs is observable:URLHistoryEntry, but that class uses observable:url and observable:referrerURL to refer to separate observable:URLs.
Visual summary
This figure illustrates the added classes and alignments. Current disjointedness axioms are also illustrated.
Coordination
develop for the next release
develop state with backwards-compatible implementation merged into develop-2.0.0
develop-2.0.0
develop branch updated to track UCO's updated develop branch
develop-2.0.0 branch updated to track UCO's updated develop-2.0.0 branch