Revisit annotation indexing rules and consider covering sameAs, equivalentClass, equivalentProperty, others? #3

amoeba · 2021-09-24T23:30:06Z

@mbjones asked on Slack whether we index URIs for terms that are sameAs'd. I answered no but maybe we actually do and I just haven't seen it happen because we don't use sameAs much in our ontologies.

In our use of semantic search so far, we haven't made much use of alignment axioms (sameAs, equivalentClass, equivalentProperty) and have so far only used subClassOf (to materialize parent classes). I think the MOSAiC ontology is the first ontology we've enabled in search that uses things like NamedIndividuals and sameAs in a way we care about and I think our search isn't quite as good for MOSAiC as it is for others (like ECSO).

I think we could make a few changes to improve that and improve things for the future.

Change one: Include the class of any value URIs

For example, for the MOSAiC datasets we have in the Arctic Data Center, we inserted dataset-level annotations like the ones in this screenshot (image not preserved here):

Each of those annotations is to a NamedIndividual, rather than to a Class as we've been doing. How do we drive good searches for these? The only way right now is to search exactly for the term. But PS122/2, for example, is of class "_MOSAiC Specific Term" and "Campaign". I think it'd be nice if a person could search for PS122/2 directly but also by either of its classes. This same reasoning applies to the other two annotations in the screenshot above.
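As a rough illustration of Change one, here's a minimal Python sketch that indexes an annotated NamedIndividual together with the classes it is declared an instance of. The `ex:` URIs, the `TRIPLES` list, and the `terms_to_index` helper are all made up for the example; they are not the real MOSAiC IRIs or anything from the indexer.

```python
# Hypothetical in-memory triples; the ex: URIs are placeholders, not real MOSAiC IRIs.
RDF_TYPE = "rdf:type"

TRIPLES = [
    ("ex:PS122_2", RDF_TYPE, "ex:Campaign"),
    ("ex:PS122_2", RDF_TYPE, "ex:MOSAiCSpecificTerm"),
]


def terms_to_index(value_uri, triples):
    """Return the annotated value plus every class it is declared an instance of."""
    classes = {o for s, p, o in triples if s == value_uri and p == RDF_TYPE}
    return {value_uri} | classes


indexed = terms_to_index("ex:PS122_2", TRIPLES)
print(sorted(indexed))
```

With something like this in place, a search for either class would match a dataset annotated only with the individual.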

Change two: Include sameAs and equivalentClass/equivalentProperty relationships for terms

As another example, MOSAiC has a set of Research Location named individuals, like "Arctic Ocean". We didn't align this term, but if we did in the future (say to http://purl.obolibrary.org/obo/GAZ_00000323), it'd be best if searches for either MOSAiC's term or GAZ's term returned documents annotated either way. This would benefit any future alignment work we do and improve searches.

Summary

Both changes would be purely additive to our current indexing rules. That is, any search that works now would still work with these changes. The current logic is:

For each annotation
- Find all the property URIs
  - Index the property
  - Index any superproperties
- Find all value URIs
  - Index the value
  - Index any superclasses

With the changes above, we'd get:

For each annotation
- Find all the property URIs
  - Index the property
  - Index any superproperties
  - Index any equivalent properties (owl:equivalentProperty)
- Find all value URIs
  - Index the value
  - Index any superclasses
  - Index any terms linked by owl:sameAs
  - Index any equivalent classes (owl:equivalentClass)

I think this can be done by adjusting the SPARQL query we already use and I don't expect we'll have any performance issues or Solr index growth issues.
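To make the "additive" claim concrete, here's a small Python sketch of the current and proposed expansion rules over an in-memory list of triples. It is not the indexer's code: the `reachable`, `direct`, and `expand` helpers and the `ex:` terms are assumptions made for illustration, and alignment links are only followed one hop in the direction they are stated.

```python
from collections import deque


def reachable(start, predicate, triples):
    """All nodes reachable from start by following predicate zero or more times."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def direct(start, predicate, triples):
    """Objects linked to start by exactly one predicate edge."""
    return {o for s, p, o in triples if s == start and p == predicate}


def expand(uri, triples, hierarchy, alignments=()):
    """Index the term, its ancestors, and any directly aligned terms."""
    terms = reachable(uri, hierarchy, triples)
    for predicate in alignments:
        terms |= direct(uri, predicate, triples)
    return terms


# Hypothetical ontology fragment.
TRIPLES = [
    ("ex:Child", "rdfs:subClassOf", "ex:Parent"),
    ("ex:Parent", "rdfs:subClassOf", "ex:Grandparent"),
    ("ex:Child", "owl:equivalentClass", "ex:OtherChild"),
    ("ex:Child", "owl:sameAs", "ex:AlsoChild"),
]

current = expand("ex:Child", TRIPLES, "rdfs:subClassOf")
proposed = expand(
    "ex:Child", TRIPLES, "rdfs:subClassOf",
    alignments=("owl:sameAs", "owl:equivalentClass"),
)
print(sorted(proposed - current))
```

Because the proposed set is built by only adding to the current one, everything indexed today stays indexed. The same `expand` call with `rdfs:subPropertyOf` and `owl:equivalentProperty` would cover the property side.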


amoeba · 2021-10-07T00:27:34Z

This is going well and I'm nearly done. The query we run against every valueURI in every semantic annotation is now:

SELECT ?annotation_value_uri
WHERE
{
  {
    <$CONCEPT_URI> rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> rdf:type/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:sameAs/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:equivalentClass/rdfs:subClassOf* ?annotation_value_uri .
  }
}
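For anyone who wants to try the query outside the indexer, here's a hedged Python sketch that fills in the `$CONCEPT_URI` placeholder and prepends the standard `rdf`, `rdfs`, and `owl` prefix declarations. The use of `string.Template`, the `build_query` helper, and the simple character check are assumptions for the example; how the index processor actually substitutes the URI isn't shown in this thread.

```python
from string import Template

# Standard W3C namespace IRIs for the prefixes the query uses.
PREFIXES = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
"""

QUERY_TEMPLATE = Template(PREFIXES + """\
SELECT ?annotation_value_uri
WHERE
{
  { <$CONCEPT_URI> rdfs:subClassOf* ?annotation_value_uri . }
  UNION
  { <$CONCEPT_URI> rdf:type/rdfs:subClassOf* ?annotation_value_uri . }
  UNION
  { <$CONCEPT_URI> owl:sameAs/rdfs:subClassOf* ?annotation_value_uri . }
  UNION
  { <$CONCEPT_URI> owl:equivalentClass/rdfs:subClassOf* ?annotation_value_uri . }
}
""")


def build_query(concept_uri):
    """Return the expansion query for one value URI (hypothetical helper)."""
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in concept_uri for ch in '<>" \n'):
        raise ValueError("not a plain IRI: %r" % concept_uri)
    return QUERY_TEMPLATE.substitute(CONCEPT_URI=concept_uri)


query = build_query("http://purl.obolibrary.org/obo/GAZ_00000323")
print(query)
```

Each UNION branch starts from the same concept: the first walks up the class hierarchy directly, and the other three first hop across `rdf:type`, `owl:sameAs`, or `owl:equivalentClass` and then walk up from there.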

This means we get some cool stuff we didn't have before. For example, if a person searches for datasets annotated with the NamedIndividual from ARCRC for "Snow Depth" (a key variable, shown in Orange), they get all of this back and searches for any of these terms return the dataset with this annotation:

Before, we would've only returned the dataset if we searched directly for the NamedIndividual, because we weren't expanding NamedIndividuals at all.

The last thing I want to do is get n-degree sameAs working. For example, if A sameAs B and B sameAs C, we want a search for any of A, B, or C to match all three.
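One way to think about n-degree sameAs is as a connected-component lookup: treat every sameAs statement as an undirected edge and collect everything reachable from the starting term. Here's a minimal Python sketch of that idea; `same_as_closure` and the `ex:` terms are hypothetical and this isn't the final query.

```python
from collections import defaultdict, deque


def same_as_closure(start, same_as_pairs):
    """Return every term connected to start through any chain of sameAs links."""
    # Store each pair in both directions so the relation is symmetric.
    neighbours = defaultdict(set)
    for a, b in same_as_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search gives the transitive part.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node] - seen:
            seen.add(other)
            queue.append(other)
    return seen


# A sameAs B, B sameAs C (placeholder terms).
PAIRS = [("ex:A", "ex:B"), ("ex:B", "ex:C")]

for term in ("ex:A", "ex:B", "ex:C"):
    print(term, sorted(same_as_closure(term, PAIRS)))
```

In SPARQL 1.1, a property path such as `(owl:sameAs|^owl:sameAs)*` expresses a similar closure, though that's just one possible way to write it.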

mbjones · 2021-10-07T05:50:41Z

This is excellent, @amoeba. Thanks!

Did you and @mpsaloha discuss skos:exactMatch and skos:closeMatch as candidates for this type of alignment as well? We can always add them later, but if you're making changes and reindexing things, then it might make sense. Details from SKOS:

The property skos:closeMatch is used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications. In order to avoid the possibility of "compound errors" when combining mappings across more than two concept schemes, skos:closeMatch is not declared to be a transitive property.

The property skos:exactMatch is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property, and is a sub-property of skos:closeMatch.

amoeba · 2021-10-07T21:22:28Z

Thanks @mbjones, we hadn't. I'll run it by him, but now that you mention it, it seems like we oughta look at adding it in now. I don't think it'll have any practical impact at the moment because I don't think the ontologies we do query expansion on use those terms, but it's probably better to put the change in now to save ourselves some time.


amoeba · 2021-10-08T23:16:42Z

Added support for skos:exactMatch and skos:closeMatch in DataONEorg/d1_cn_index_processor@24abd6c.

We treat skos:exactMatch as both transitive and symmetric. We treat skos:closeMatch as symmetric but not transitive. This was based on a suggestion from @mpsaloha; the logic makes sense and matches the SKOS text @mbjones quoted above.
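To show the difference between the two treatments, here's a small Python sketch (not the code from the commit). `expand_exact` follows links in both directions and keeps going, while `expand_close` follows links in both directions but stops after one hop. The helper names and `ex:` terms are made up, and the sketch doesn't try to say how the two expansions are combined in the real query.

```python
def neighbours(term, pairs):
    """Terms linked to term in either direction (symmetric lookup)."""
    out = set()
    for a, b in pairs:
        if a == term:
            out.add(b)
        if b == term:
            out.add(a)
    return out


def expand_exact(term, pairs):
    """Symmetric and transitive: follow exactMatch links until nothing new appears."""
    seen = {term}
    frontier = [term]
    while frontier:
        node = frontier.pop()
        for other in neighbours(node, pairs) - seen:
            seen.add(other)
            frontier.append(other)
    return seen


def expand_close(term, pairs):
    """Symmetric but not transitive: only directly linked closeMatch terms."""
    return {term} | neighbours(term, pairs)


# Placeholder mappings: A exactMatch B exactMatch C; A closeMatch X closeMatch Y.
EXACT = [("ex:A", "ex:B"), ("ex:B", "ex:C")]
CLOSE = [("ex:A", "ex:X"), ("ex:X", "ex:Y")]

print(sorted(expand_exact("ex:A", EXACT)))
print(sorted(expand_close("ex:A", CLOSE)))
```

Starting from `ex:A`, the exactMatch expansion reaches `ex:C` through `ex:B`, but the closeMatch expansion stops at `ex:X` and never reaches `ex:Y`, which is the "compound errors" case the SKOS text warns about.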


mbjones · 2021-10-13T23:49:27Z

LGTM.

amoeba added the bug label Sep 24, 2021

amoeba self-assigned this Sep 24, 2021

amoeba changed the title ~~Check whether OntologyModelService returns sameAs relationships~~ Revisit annotation indexing rules and consider covering sameAs, equivalentClass, equivalentProperty, others? Oct 4, 2021

amoeba added the enhancement label and removed the bug label Oct 5, 2021

amoeba referenced this issue in DataONEorg/d1_cn_index_processor Oct 8, 2021

Expand semantic annotation SPARQL query

976a812

Ref #31 Also adds tests for previously added MOSAiC and ARCRC ontologies

amoeba referenced this issue in DataONEorg/d1_cn_index_processor Oct 8, 2021

Expand semantic annotation property SPARQL query

f7c1a9d

Ref #31

amoeba referenced this issue in DataONEorg/d1_cn_index_processor Oct 8, 2021

Expand semantic annotation SPARQL query

4567b38

Ref #31 Also adds tests for previously added MOSAiC and ARCRC ontologies

amoeba referenced this issue in DataONEorg/d1_cn_index_processor Oct 8, 2021

Expand semantic annotation property SPARQL query

c8b0609

Ref #31

amoeba mentioned this issue Oct 8, 2021

Add MOSAIC, ARCRC, SENSO, ADCAD, SALMON expand query expansion rules DataONEorg/d1_cn_index_processor#33

Merged

mbjones transferred this issue from DataONEorg/d1_cn_index_processor Jun 22, 2022

mbjones added this to the 2.4.0 milestone Aug 5, 2022

mbjones modified the milestones: 2.4.0, 3.0.0 Sep 6, 2022

taojing2002 removed this from the 3.0.0 milestone Sep 28, 2022
