Revisit annotation indexing rules and consider covering sameAs, equivalentClass, equivalentProperty, others? #3

Open
amoeba opened this issue Sep 24, 2021 · 5 comments
Labels: enhancement (New feature or request)

amoeba (Contributor) commented Sep 24, 2021

@mbjones asked on Slack whether we index URIs for terms that are sameAs'd. I answered no but maybe we actually do and I just haven't seen it happen because we don't use sameAs much in our ontologies.

In our use of semantic search so far, we haven't made much use of alignment axioms (sameAs, equivalentClass, equivalentProperty) and have only used subClassOf (to materialize parent classes). I think the MOSAiC ontology is the first ontology we've enabled in search that uses things like NamedIndividuals and sameAs in a way we care about. As a result, I think our search isn't quite as good for MOSAiC as it is for others (like ECSO).

I think we could make a few changes to improve that and improve things for the future.

Change one: Include the class of any value URIs

For example, for the MOSAiC datasets we have in the Arctic Data Center, we inserted dataset-level annotations like

[Screenshot: dataset-level annotations on a MOSAiC dataset]

Each of those annotations is to a NamedIndividual, rather than to a Class as we've been doing. How do we drive good searches for these? The only way right now is to search exactly for the term. But PS122/2, for example, is of class "_MOSAiC Specific Term" and "Campaign". I think it'd be nice if a person could search for PS122/2 directly but also by either of its classes. This same reasoning applies to the other two annotations in the screenshot above.

Change two: Include sameAs and equivalentClass/equivalentProperty relationships for terms

As another example, MOSAiC has a set of Research Location named individuals, like "Arctic Ocean". We didn't align this term, but if we did in the future (say to http://purl.obolibrary.org/obo/GAZ_00000323), it'd be best if searches for either MOSAiC's term or GAZ's term returned documents annotated either way. This would benefit any future alignment work we do and improve searches.

Summary

Both changes would be additive to our current indexing rules. That is, any search that works now would still work with these changes. The current logic is:

  • For each annotation
    • Find all the property URIs
      • Index the property
      • Index any superproperties
    • Find all value URIs
      • Index the value
      • Index any superclasses

With the changes above, we'd get:

  • For each annotation
    • Find all the property URIs
      • Index the property
      • Index any superproperties
      • Index any properties related by owl:equivalentProperty
    • Find all value URIs
      • Index the value
      • Index any superclasses
      • Index any terms that are owl:sameAs
      • Index any terms related by owl:equivalentClass

I think this can be done by adjusting the SPARQL query we already use and I don't expect we'll have any performance issues or Solr index growth issues.
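
As a rough illustration of the two rule sets above (not the actual index processor code), the expansion could be sketched in Python over a toy set of triples. Every URI, the `TRIPLES` set, and all function names below are made up for this example, and the alignment relations are followed one hop only, as in the bullet list.

```python
# Toy ontology as (subject, predicate, object) triples. All URIs are hypothetical.
TRIPLES = {
    ("ex:hasSnowDepth", "rdfs:subPropertyOf", "ex:hasMeasurement"),
    ("ex:hasSnowDepth", "owl:equivalentProperty", "other:snowDepth"),
    ("ex:SeaIce", "rdfs:subClassOf", "ex:Ice"),
    ("ex:Ice", "rdfs:subClassOf", "ex:Water"),
    ("ex:SeaIce", "owl:equivalentClass", "other:SeaIce"),
    ("ex:SeaIce", "owl:sameAs", "third:SeaIce"),
}


def objects(subject, predicate):
    """Direct objects of `subject` for `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def ancestors(term, predicate):
    """Every term reachable by following `predicate` one or more times."""
    seen, stack = set(), [term]
    while stack:
        for parent in objects(stack.pop(), predicate):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def current_value_terms(uri):
    """Current rules: the value plus its superclasses."""
    return {uri} | ancestors(uri, "rdfs:subClassOf")


def proposed_value_terms(uri):
    """Proposed rules: current terms plus sameAs and equivalentClass terms."""
    return (
        current_value_terms(uri)
        | objects(uri, "owl:sameAs")
        | objects(uri, "owl:equivalentClass")
    )


def proposed_property_terms(uri):
    """Proposed rules: the property, its superproperties, and equivalents."""
    return (
        {uri}
        | ancestors(uri, "rdfs:subPropertyOf")
        | objects(uri, "owl:equivalentProperty")
    )
```

Because the proposed functions only ever add terms to what the current rules produce, the "additive" claim holds by construction in this sketch: `current_value_terms(uri)` is always a subset of `proposed_value_terms(uri)`.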

@amoeba amoeba added the bug Something isn't working label Sep 24, 2021
@amoeba amoeba self-assigned this Sep 24, 2021
@amoeba amoeba changed the title Check whether OntologyModelService returns sameAs relationships Revisit annotation indexing rules and consider covering sameAs, equivalentClass, equivalentProperty, others? Oct 4, 2021
@amoeba amoeba added enhancement New feature or request and removed bug Something isn't working labels Oct 5, 2021
amoeba (Contributor, Author) commented Oct 7, 2021

This is going well and I'm nearly done. The query we run against every valueURI in every semantic annotation is now:

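PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?annotation_value_uri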
WHERE
{
  {
    <$CONCEPT_URI> rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> rdf:type/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:sameAs/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:equivalentClass/rdfs:subClassOf* ?annotation_value_uri .
  }
}

This means we get some cool stuff we didn't have before. For example, if a person searches for datasets annotated with the ARCRC NamedIndividual for "Snow Depth" (a key variable, shown in orange), they get all of this back, and a search for any of these terms returns the dataset with this annotation:

[Image: terms returned for the "Snow Depth" NamedIndividual]

Before, we would've returned the dataset only if we searched directly for the NamedIndividual, because we weren't expanding NamedIndividuals at all.
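
To make the four UNION branches of the query concrete, here is a minimal Python sketch that evaluates the same paths over an in-memory set of triples. It is an illustration only, not how the index processor runs the query; the `GRAPH` contents, the `ex:` and `other:` URIs, and the function names are all assumptions for this example.

```python
# Hypothetical triples; none of these URIs come from the real ontologies.
GRAPH = {
    ("ex:SnowDepth", "rdf:type", "ex:KeyVariable"),
    ("ex:KeyVariable", "rdfs:subClassOf", "ex:Variable"),
    ("ex:SnowDepth", "owl:sameAs", "other:SnowDepth"),
    ("other:SnowDepth", "rdfs:subClassOf", "other:Depth"),
}


def objects(graph, subject, predicate):
    """Direct objects of `subject` for `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subclass_star(graph, start):
    """rdfs:subClassOf*: `start` itself plus every transitive superclass."""
    seen, stack = {start}, [start]
    while stack:
        for parent in objects(graph, stack.pop(), "rdfs:subClassOf"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand_concept(graph, concept):
    """Union of the four branches in the query for one concept URI."""
    # Branch 1: <concept> rdfs:subClassOf* ?x (includes the concept itself).
    results = subclass_star(graph, concept)
    # Branches 2-4: one hop over the first predicate, then rdfs:subClassOf*.
    for first_hop in ("rdf:type", "owl:sameAs", "owl:equivalentClass"):
        for target in objects(graph, concept, first_hop):
            results |= subclass_star(graph, target)
    return results
```

With this toy graph, `expand_concept(GRAPH, "ex:SnowDepth")` picks up the individual itself, its class and that class's parent, and the sameAs'd term with its parent. Note that the first hop is followed exactly once, which is why chains of sameAs are not covered yet.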

The last thing I want to do is get n-degree sameAs working. For example, if A sameAs B and B sameAs C, we want a search for any of A, B, or C to return datasets annotated with any of them.
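
One way to think about n-degree sameAs is as reachability in an undirected graph: treat every sameAs statement as an edge in both directions and collect everything reachable from the starting term. The sketch below shows that idea with a breadth-first search; the function name and the toy pairs are assumptions for illustration, not the implementation used here.

```python
from collections import defaultdict, deque


def same_as_closure(pairs, start):
    """All terms linked to `start` by any chain of sameAs, in either direction."""
    # Build an undirected adjacency map so A sameAs B also gives B -> A.
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Breadth-first search from the starting term.
    seen, queue = {start}, deque([start])
    while queue:
        for term in neighbors[queue.popleft()]:
            if term not in seen:
                seen.add(term)
                queue.append(term)
    return seen


# A sameAs B, B sameAs C; D sameAs E is an unrelated group.
PAIRS = [("A", "B"), ("B", "C"), ("D", "E")]
```

Here `same_as_closure(PAIRS, "A")`, `same_as_closure(PAIRS, "B")`, and `same_as_closure(PAIRS, "C")` all yield the same three terms. In SPARQL 1.1 a similar effect can be expressed with a property path such as `(owl:sameAs|^owl:sameAs)*`, though how that performs against these ontologies is untested here.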

mbjones (Member) commented Oct 7, 2021

This is excellent, @amoeba. Thanks!

Did you and @mpsaloha discuss skos:exactMatch and skos:closeMatch as candidates for this type of alignment as well? We can always add them later, but if you're making changes and reindexing things, then it might make sense. Details from SKOS:

The property skos:closeMatch is used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications. In order to avoid the possibility of "compound errors" when combining mappings across more than two concept schemes, skos:closeMatch is not declared to be a transitive property.

The property skos:exactMatch is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property, and is a sub-property of skos:closeMatch.

amoeba (Contributor, Author) commented Oct 7, 2021

Thanks @mbjones, we hadn't. I'll run it by him, but now that you mention it, it seems like we oughta look at adding it in now. I don't think it'll have any practical impact at the moment because I don't think the ontologies we do query expansion on use those terms, but it's probably better to put the change in now to save ourselves some time.

amoeba referenced this issue in DataONEorg/d1_cn_index_processor Oct 8, 2021
Ref #31

Also adds tests for previously added MOSAiC and ARCRC ontologies
amoeba referenced this issue in DataONEorg/d1_cn_index_processor Oct 8, 2021
amoeba (Contributor, Author) commented Oct 8, 2021

Added support for skos:exactMatch and skos:closeMatch in DataONEorg/d1_cn_index_processor@24abd6c.

For skos:exactMatch, we treat it as transitive and symmetric. For skos:closeMatch, we don't treat it as transitive but do treat it as symmetric. This was based on a suggestion from @mpsaloha; the logic makes sense and matches the information @mbjones quoted above.
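
The difference between the two treatments can be sketched as follows: skos:exactMatch is followed through any number of hops in either direction, while skos:closeMatch is followed for a single hop in either direction. This is a simplified model, not the code in the commit. The function names and toy triples are hypothetical, and it assumes close matches are taken only from the starting term, not from its exact matches.

```python
def neighbors(triples, term, predicate):
    """Terms linked to `term` by `predicate` in either direction (symmetric)."""
    linked = set()
    for s, p, o in triples:
        if p != predicate:
            continue
        if s == term:
            linked.add(o)
        elif o == term:
            linked.add(s)
    return linked


def match_expansion(triples, term):
    """Expand `term` via exactMatch (transitive) and closeMatch (one hop)."""
    # skos:exactMatch: symmetric and transitive, so follow chains of any length.
    exact, stack = {term}, [term]
    while stack:
        for other in neighbors(triples, stack.pop(), "skos:exactMatch"):
            if other not in exact:
                exact.add(other)
                stack.append(other)
    # skos:closeMatch: symmetric but not transitive, so stop after one hop.
    close = neighbors(triples, term, "skos:closeMatch")
    return exact | close


# Hypothetical alignment statements between made-up terms.
MATCHES = [
    ("A", "skos:exactMatch", "B"),
    ("B", "skos:exactMatch", "C"),
    ("C", "skos:closeMatch", "D"),
    ("E", "skos:closeMatch", "A"),
    ("E", "skos:closeMatch", "F"),
]
```

In this toy data, expanding `A` reaches `B` and `C` through the exactMatch chain and `E` through a single symmetric closeMatch, but not `F`, since that would need two closeMatch hops, and not `D`, given the assumption above.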

mbjones (Member) commented Oct 13, 2021

LGTM.

@mbjones mbjones transferred this issue from DataONEorg/d1_cn_index_processor Jun 22, 2022
@mbjones mbjones added this to the 2.4.0 milestone Aug 5, 2022
@mbjones mbjones modified the milestones: 2.4.0, 3.0.0 Sep 6, 2022
@taojing2002 taojing2002 removed this from the 3.0.0 milestone Sep 28, 2022