Clarify semantics aspects #71

msdemlei · 2021-10-25T09:28:22Z

Clarification of the meaning and use of semantics and content_qualifier.

This introduces a potentially document-breaking change, namely the requirement
that datalink/core concept URIs must be relative (i.e., not include the URI).
I think everyone has always done it like this, and making this guaranteed makes
it a bit simpler to correctly deal with the semantics column (actually,
both implementations I know that use values from the semantics columns
already make that assumption).

This depends on the ivoatex update that comes with PR #70 for citations
to resolve.

This, I claim, would solve Issue #67.

Bonnarel · 2021-10-25T09:58:48Z

On 25/10/2021 at 11:28, msdemlei wrote:

> Clarification of the meaning and use of semantics and content_qualifier. This introduces a potentially document-breaking change, namely the requirement that datalink/core concept URIs must be relative (i.e., not include the URI). I think everyone has always done it like this, and making this guaranteed makes it a bit simpler to correctly deal with the semantics column (actually, both implementations I know that use values from the semantics columns already make that assumption).

Well. During the last DAL running meeting we apparently reached a consensus that content_qualifier will be required to contain full URIs. But you were not attending, Markus. That's why the initial text of the first PR #51 with content_qualifier was rewritten as it appears in the recently merged master. Your new text goes in the other direction. See: https://wiki.ivoa.net/internal/IVOA/IvoaDAL_RunningMeetings/IVOA_DAL_RM12.txt

…

> This depends on the ivoatex update that comes with PR #70 for citations to resolve. This, I claim, would solve Issue #67.
>
> Commit Summary
> * Updating Example 4.5 ("custom access data service") <586d87e>
> * Updating ivoatex. <2e4f430>
> * Clarification of the meaning and use of semantics and content_qualifier. <5ce399f>
>
> File Changes
> * M .gitignore (2)
> * M DataLink.tex (285)
> * M ivoatex (2)

msdemlei · 2021-10-25T13:25:41Z

On Mon, Oct 25, 2021 at 02:59:00AM -0700, Bonnarel wrote:
> On 25/10/2021 at 11:28, msdemlei wrote:
> > it a bit simpler to correctly deal with the semantics column (actually, both implementations I know that use values from the semantics columns already make that assumption).
>
> Well. During the last DAL running meeting we apparently had a consensus that content_qualifier will be required to contain full URIs. But you were not attending, Markus. That's why the initial text of the first PR #51 with content_qualifier was rewritten as it appears in the recently merged master. Your new text goes in the other direction. See: https://wiki.ivoa.net/internal/IVOA/IvoaDAL_RunningMeetings/IVOA_DAL_RM12.txt

Hm. I wonder what Pat's concerns about "undesirable usage" were. I have no *strong* opinion either way, but if I had been at that telecon, I'd have said:

(a) Well, it *would* be nicer if content_qualifier worked the same way as semantics; it's certainly a bit odd that vocabularies are used in two different ways in the same standard.

(b) Having a standard vocabulary increases the chances that people will actually do the right thing and take terms from it rather than just dump in random URIs that no client at all will understand (in which case it's not machine readable, which kind of defeats the purpose).

(c) Nobody wants to have long URIs when short words would do most of the time, which, I think, for content_qualifier is a reasonable expectation (though I admit I'm not sure what use cases for the long URIs are there).

(d) Comparing URIs (whose schemes and perhaps authority parts are supposed to be case-insensitive, with path parts and fragment identifiers quite certainly case-sensitive) is a huge pain. Let's spare normal clients that pain.

If the other authors say they've weighed those points and found them outweighed by whatever concern brought up the full URL thing, I'd still like the changes to semantics in; and the content_qualifier text could then probably be something like

  Where applicable, concepts from the vocabulary http://www.ivoa.net/rdf/product-type should be chosen. In contrast to the semantics column, content_qualifier must always contain full concept URIs, regardless of whether URIs point into product-type or somewhere else. As in the semantics case, non-IVOA concept URIs may be used. Again, they should resolve to human-readable definitions of the meaning and intended usage of the concept. As an example, a light curve service might link to a spectrum of the object by using #counterpart in the semantics column and http://www.ivoa.net/rdf/product-type#spectrum in content_qualifier.

Is that preferable to the proponents of full URIs here?

Given it's a bit odd to have two different recipes, I think it would be great if someone could donate a rationale for the difference (I can't write that because I don't see a good reason).
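
To make point (d) concrete, here is a minimal Python sketch of what a client would have to do to compare two full concept URIs. It assumes that only the scheme and the authority (host) part are case-insensitive and that path, query, and fragment are compared verbatim; the function names are hypothetical and not taken from any existing DataLink client.

```python
from urllib.parse import urlsplit


def concept_key(uri: str) -> tuple:
    """Build a comparison key for a concept URI.

    Assumption: scheme and authority are case-insensitive, while
    path, query and fragment are case-sensitive.
    """
    parts = urlsplit(uri)
    return (
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    )


def same_concept(a: str, b: str) -> bool:
    """Return True if both strings denote the same concept URI."""
    return concept_key(a) == concept_key(b)
```

With relative values such as #spectrum, a plain string comparison would be enough, which is the simplification argued for above.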

pdowler · 2021-10-25T16:07:14Z

IIRC, the "undesirable usage" was that if you can use bare product-type terms like "spectrum" and we allow terms from other vocabs, people might use bare terms from them as well, in which case it's just a column where you can put any one word.

I think content_qualifier is a little different than semantics: my understanding of using fully qualified URIs in semantics was that it was for a custom term (extension) but still in the same vocabulary (best example is our #thumbnail child of #preview -- extension of datalink/core rather than different vocab entirely). I don't recall off-hand how an rdf doc says it is an extension of another, but if that's possible I would expect any custom FQ term in semantics to be in a vocab that extends datalink/core. That's not true of content_qualifier.

Anyway, looking at the current text now, it isn't as clear/explicit as I thought and the above from Markus looks fine to me, but I wonder if I am still reading something different into it. I think that content_qualifier could contain URIs from UAT or SimDM or whatever, not just standard product-type and custom extensions, because not all links are "to data products".

msdemlei · 2021-10-26T12:01:06Z

On Mon, Oct 25, 2021 at 09:08:04AM -0700, Patrick Dowler wrote:
> IIRC, the "undesirable usage" was that if you can use bare product-type terms like "spectrum" and we allow terms from other vocabs, people might use bare terms from them as well, in which case it's just a column where you can put any one word.

While I'm sure people will put all kind of junk into the field as long as clients don't do anything sensible with it, I think the hash "marker" has worked quite well as an indicator that you're not supposed to put any old junk into semantics.

> I think content_qualifier is a little different than semantics: my understanding of using fully qualified URIs in semantics was that it was for a custom term (extension) but still in the same vocabulary (best example is our #thumbnail child of #preview -- extension of datalink/core rather than different vocab entirely). I

No. RDF as such doesn't have much of a notion of a "vocabulary"; it just gives rules for interpreting triples of URIs, and is rather relaxed about how to group these URIs. By giving rules for how RDF resource URIs in the VO should look, we in the VO have our specific idea of what "a" vocabulary is; it's basically all the concepts in one of our RDF/desise files. If you write some URI not starting with "the" vocabulary URI, the corresponding concept is not "in" that vocabulary. But really, that distinction has practical relevance only insofar as clients can be expected to do smart things with terms in the vocabulary (because they can easily retrieve label, description, and relationships for them), while for now they can't do that for anything else, whether or not these concepts are supposed to be related to concepts in the "core" vocabulary. We *could* in Vocabularies 2.1 give rules for how people could host their own IVOA-compliant vocabularies and how clients should deal with them. But I didn't do that in 2.0 on purpose: It'll be hard enough to make clients pick up our Semantics tech without the vagaries of having to pull stuff from all over the web and having to deal with... loosely... curated semantic resources.

> don't recall off-hand how an rdf doc says it is an extension of another, but if that's possible I would expect any custom FQ term in semantics to be in a vocab that extends datalink/core. That's not true of content-qualifier

Again, no, there is no formal or informal requirement that some custom concept you put into semantics has any relationship to something in datalink/core, and indeed there is no defined way to declare such relationships.

> Anyway, looking at the current text now, it isn't as clear/explicit as I thought and **the above from Markus looks fine to me**, but I wonder if I am still reading something different into it. I think that content_qualifier could contain URIs from UAT or SimDM or whatever, not just standard product-type and custom extensions, because not all links are "to data products".

It certainly would help if we had a clear scenario for that, ideally of the form: "A datalink service operator wants to declare X on Y so that a client does Z. They therefore put the URI of concept X' from Vocabulary V into content_qualifier." Does such a thing exist somewhere? Does anyone perhaps even do that already? I'd gladly amend the PR with text in that direction (also for my own sake, because so far I find all of that so cloudy that I wonder if I can consider the implementation requirement as satisfied for content_qualifier in its current shape) -- and it might even provide enough of a rationale for handling content_qualifier differently from semantics in case we really want to go back to the no-default-vocabulary text.

Bonnarel · 2021-10-26T12:29:37Z

> Hm. I wonder what Pat's concerns about "undesirable usage" were. I have no strong opinion either way, but if I had been at that telecon, I'd have said:
>
> (a) Well, it would be nicer if content_qualifier worked the same way as semantics; it's certainly a bit odd that vocabularies are used in two different ways in the same standard.
>
> (b) Having a standard vocabulary increases the chances that people will actually do the right thing and take terms from it rather than just dump in random URIs that no client at all will understand (in which case it's not machine readable, which kind of defeats the purpose).
>
> (c) Nobody wants to have long URIs when short words would do most of the time, which, I think, for content_qualifier is a reasonable expectation (though I admit I'm not sure what use cases for the long URIs are there).
>
> (d) Comparing URIs (whose schemes and perhaps authority parts are supposed to be case-insensitive, with path parts and fragment identifiers quite certainly case-sensitive) is a huge pain. Let's spare normal clients that pain.
>
> If the other authors say they've weighed those points and found them outweighed by whatever concern brought up the full URL thing, I'd still like the changes to semantics in; and the content_qualifier text could then probably be something like
>
>   Where applicable, concepts from the vocabulary http://www.ivoa.net/rdf/product-type should be chosen. In contrast to the semantics column, content_qualifier must always contain full concept URIs, regardless of whether URIs point into product-type or somewhere else. As in the semantics case, non-IVOA concept URIs may be used. Again, they should resolve to human-readable definitions of the meaning and intended usage of the concept. As an example, a light curve service might link to a spectrum of the object by using #counterpart in the semantics column and http://www.ivoa.net/rdf/product-type#spectrum in content_qualifier.

+1. I definitely prefer this version to the one in PR #71 and to the initial one I wrote.

> Is that preferable to the proponents of full URIs here? Given it's a bit odd to have two different recipes, I think it would be great if someone could donate a rationale for the difference (I can't write that because I don't see a good reason).

We don't want to "close the future" by giving a special rule in favor of data-product vocabulary.
Imagine that in the case of "semantics=documentation" we want to specify whether it's a simple free description, a refereed paper, or a conference proceedings paper. content_qualifier would be the right place to specify that, I think.
We may imagine having a standard vocabulary for "documents and papers" in the future.

msdemlei · 2021-10-26T15:31:52Z

On Tue, Oct 26, 2021 at 05:29:48AM -0700, Bonnarel wrote:
> > Again, they should resolve to human-readable definitions of the meaning and intended usage of the concept. As an example, a light curve service might link to a spectrum of the object by using #counterpart in the semantics column and http://www.ivoa.net/rdf/product-type#spectrum in content_qualifier.
>
> +1. I definitely prefer this version to the one in PR #71 and to the initial one I wrote
>
> > Is that preferable to the proponents of full URIs here? Given it's a bit odd to have two different recipes, I think it would be great if someone could donate a rationale for the difference (I can't write that because I don't see a good reason).
>
> We don't want to "close the future" by giving a special rule in favor of data-product vocabulary.

Well -- we don't in either case, so that doesn't help the decision. In both cases, people can use arbitrary concept URIs. The question at hand is: "Do we want to have two different ways of dealing with vocabularies in one standard because there is an overriding reason?" And my request was to try and figure out what the overriding reason back in the DAL running meeting was, because I'd prefer to explain these reasons if we do have them.

> Imagine in the case of "semantics=documentation" we want to specify if it's simple free description, refereed paper, or conference proceedings paper. content_qualifier would be the right place to specify that I think. We may imagine having a standard vocabulary for "documents and papers" in the future.

Sure. But whether or not we define a standard vocabulary for the one clear use case now, people doing this later would be writing http://www.ivoa.net/rdf/documentation-type#refereed-paper (say). There's simply no difference to them. The difference is for people who have "data products" -- for them, it's writing #spectrum vs. http://www.ivoa.net/rdf/product-type#spectrum. And it's perhaps with implementors who try to make something with content_qualifier and who with just #spectrum have a slightly simpler time (e.g., no headache as to whether or not a part of the string needs to be compared case-insensitively). Which doesn't make a *big* difference, but I'd not want to make people write the noticeably more unwieldy full URIs and deal with the difference to semantics just because of some misunderstanding.
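
The two spellings can be related by a small helper, sketched here in Python. The default vocabulary for content_qualifier is exactly what is under discussion in this thread, so the mapping below is an assumption for illustration, and expand_term is a hypothetical name.

```python
# Assumed default vocabularies per column. The entry for
# content_qualifier is the proposal being debated, not settled text.
DEFAULT_VOCAB = {
    "semantics": "http://www.ivoa.net/rdf/datalink/core",
    "content_qualifier": "http://www.ivoa.net/rdf/product-type",
}


def expand_term(column: str, value: str) -> str:
    """Turn a relative '#term' into a full concept URI.

    Values that do not start with '#' are taken to be full
    concept URIs already and are returned unchanged.
    """
    if value.startswith("#"):
        return DEFAULT_VOCAB[column] + value
    return value
```

Under this scheme a documentation-type URI passes through untouched, while #spectrum and the long product-type form end up identical after expansion.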

pdowler · 2021-10-27T18:16:54Z

hmmm. Since RDF has no notion of a vocabulary and therefore an extension, if I use http://www.opencadc.org/rdf/foo#bag in content_qualifier there is no implied sense that this is a custom product-type or a custom astronomical object type or anything else. It's just a word with a definition... by putting it into content_qualifier I'm saying "the thing at the end of this link is a bag".

Substitute http://ivoa.net/rdf/vospace#container for bag and it would be a real use case; also content_type text/xml would not convey enough information. Also, we could drop the RFE for VOTable to allow content param in the mimetype and just put #datalink into content_qualifier for recursive datalink.

The other aspect where short #term and full http://ivoa.net/rdf/{vocab}#term comes into play for me is the VEP process. I had been (in semantics) using FQ uris for prototype terms, but VEP requires that the term be demonstrated in use. That's manageable for me because the terms are in s/w, not (eg) in the database directly. But I wonder: if using a new term is as simple as create VEP && start using term (and be prepared to change use, of course) then that removes one use of FQ uris. How bad would it be if we said that any term in any ivoa vocab could be used in short form? That seems like it would cover > 98% of use cases. And I could see making a service to resolve #term to http://ivoa.net/rdf/{vocab}#term (which in principle would have to allow for multiple returns in some cases).

If this doesn't sound crazy, why not allow it? s/w will still only do things automatically if it recognises the #term.

msdemlei · 2021-10-28T08:37:54Z

On Wed, Oct 27, 2021 at 11:17:05AM -0700, Patrick Dowler wrote:
> hmmm. Since RDF has no notion of a vocabulary and therefore an extension, if I use `http://www.opencadc.org/rdf/foo#bag` in `content_qualifier` there is no implied sense that this is a custom product-type or a custom astronomical object type or anything else.

Not by RDF itself, and not by current VocInVO. But that is, really, the reason why I suspect we're doing our client writers a favour if we say "get vocabulary X and try to interpret the terms that way, while being graceful when there's a full URI and hence the thing is not in X". Only with that vocabulary can clients do all the magic of inserting labels and exploiting hierarchy at least for the well-known terms. We *can*, if we really need it, expand this to "vocabulary X and Y" (for very few vocabularies, because in consequence these must be checked for identifier clashes). Or we can say "also get vocabulary Y, but be aware that concepts from that will always come as full URIs" (which I'd recommend). And of course there's some value in doing "custom contracts" between services and specialised clients using "singleton" concept URIs as in your vospace#container example -- but as long as we don't require clients to pull semantic resources from all over the net (and I'm sure we don't want that), once you put in arbitrary URIs, 90% of the magic is gone.
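
A rough Python sketch of the "get vocabulary X and be graceful otherwise" behaviour might look as follows. The PRODUCT_TYPE dictionary is a hand-written, simplified stand-in for a downloaded vocabulary; its layout, the timeseries entry, and the labels are assumptions for illustration, and describe is a hypothetical helper.

```python
# Simplified stand-in for a locally cached vocabulary (assumed layout).
PRODUCT_TYPE = {
    "uri": "http://www.ivoa.net/rdf/product-type",
    "terms": {
        "spectrum": {"label": "Spectrum"},
        "timeseries": {"label": "Time series"},
    },
}


def describe(value: str, vocab: dict = PRODUCT_TYPE) -> tuple:
    """Return (display_text, known) for a column value.

    Terms from the well-known vocabulary get their label; anything
    else is shown verbatim and flagged as unknown instead of failing.
    """
    prefix = vocab["uri"] + "#"
    if value.startswith("#"):
        term = value[1:]
    elif value.startswith(prefix):
        term = value[len(prefix):]
    else:
        # A full URI pointing somewhere else: no label, no hierarchy.
        return value, False
    entry = vocab["terms"].get(term)
    if entry is None:
        return value, False
    return entry["label"], True
```

A client following this pattern never needs network access beyond fetching the one vocabulary, which is the property argued for above.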

> Substitute `http://ivoa.net/rdf/vospace#container` for `bag` and it would be a real use case; also content_type `text/xml` would not convey enough information. Also, we could drop the RFE for VOTable to allow content param in the mimetype and just put `#datalink` into content_qualifier for recursive datalink.

Hm... Do we do clients a favour if we do that? Suppose I have an object, and there's a spectrum and a time series attached to it, both of which are described through datalink documents. Wouldn't a client still want to know whether to send the link to a spectral or a time series client? This would be different if we expected generic "datalink clients". But this is becoming so speculative that I'd suggest we ought to wait until someone actually wants to do anything like that. And why they want that.

> The other aspect where short `#term` and full `http://ivoa.net/rdf/{vocab}#term` comes into play for me is the VEP process. I had been (in semantics) using FQ uris for prototype terms, but VEP requires that the term be demonstrated in use. That's manageable for me because the terms are in s/w, not (eg) in the database directly. But I wonder: if using a new term is as simple as `create VEP && start using term` (and be prepared to change use, of course) then that removes one use of FQ uris. How

Right. That was the intent.

> bad would it be if we said that any term in any ivoa vocab could be used in short form? That seems like it would cover > 98% of use

No, that won't work. A client cannot be expected to pull all the vocabularies to figure out a term's label, description, and relationships, and I certainly don't want to require that different vocabularies cannot use the same identifier.

> cases. And I could see making a service to resolve `#term` to `http://ivoa.net/rdf/{v}#term` (which in principle would have to allow for multiple returns in some cases).

...in which case a client is totally in the rain. Would it show all the labels? Guess which relationships to use? Also, that service would again require clients to access network resources while doing semantics, which I'm sure we want to avoid if at all possible. Frankly: My impression is that this discussion is another instance of where we introduce a feature with the server side in mind, and as long as no client actually consumes the stuff, and there are hazy additional use cases in the air, it's really hard to pin down requirements and limitations. Which makes it really hard to know what will make the lives of future clients hard and what wouldn't. Given that situation, I'd again say "let's concentrate on the use case we understand to a certain degree and make that work well". That's the "find an appropriate SAMP client", and for that, it's reasonable to recommend to clients "Get product-type and work with it; but be aware that there can be other stuff in that field". It's kind of working for semantics, and I've not yet seen a reason in this discussion why it shouldn't work for content_qualifier.
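
The ambiguity problem with resolving bare terms across several vocabularies can be illustrated with a short Python sketch. Both vocabularies below are made-up term sets (the second URI is a placeholder), and resolve_bare is a hypothetical function standing in for the resolver service mentioned in the quote.

```python
# Made-up term sets for illustration; the second vocabulary is a placeholder.
VOCABS = {
    "http://www.ivoa.net/rdf/product-type": {"spectrum", "image"},
    "http://example.org/rdf/other": {"spectrum", "catalog"},
}


def resolve_bare(term: str) -> list:
    """Return every full concept URI that a bare '#term' could mean."""
    name = term.lstrip("#")
    return sorted(
        uri + "#" + name
        for uri, terms in VOCABS.items()
        if name in terms
    )
```

As soon as two vocabularies share an identifier, the result has more than one entry, and a client has no principled way to pick a label or a hierarchy.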

msdemlei mentioned this pull request Oct 25, 2021

product-type vocabulary URI #67

Closed

pdowler requested changes Nov 3, 2021

pdowler added 2 commits November 2, 2021 18:39

Merge branch 'master' into clarify-semantics-aspects

c5b311b

fix typo

7062756

pdowler approved these changes Nov 3, 2021

pdowler merged commit 78ba495 into ivoa-std:master Nov 3, 2021

mbtaylor added a commit to mbtaylor/DataLink that referenced this pull request Nov 5, 2021

add "clarified use of semantics" to change log

f9b739c

Corresponds to changes from issue ivoa-std#67 and PR ivoa-std#71.
