Revisit annotation indexing rules and consider covering sameAs, equivalentClass, equivalentProperty, others? #3
Comments
This is going well and I'm nearly done. The query we run against every
SELECT ?annotation_value_uri
WHERE
{
  {
    <$CONCEPT_URI> rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> rdf:type/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:sameAs/rdfs:subClassOf* ?annotation_value_uri .
  }
  UNION
  {
    <$CONCEPT_URI> owl:equivalentClass/rdfs:subClassOf* ?annotation_value_uri .
  }
}
This means we get some cool stuff we didn't have before. For example, if a person searches for datasets annotated with the NamedIndividual from ARCRC for "Snow Depth" (a key variable, shown in Orange), they get all of this back, and a search for any of these terms returns the dataset with this annotation: Before, we would've returned the dataset only if we searched directly for the NamedIndividual, because we weren't expanding NamedIndividuals at all. The last thing I want to do is get n-degree
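To make the expansion concrete, here is a minimal Python sketch of what the four UNION branches of the query above compute, run against a toy in-memory graph instead of a triple store. The `ex:` URIs, the `TRIPLES` set, and the `expand` helper are all made up for illustration; this is not the indexer's actual code.

```python
from collections import deque

# Toy triples (subject, predicate, object). Every ex: URI is hypothetical.
TRIPLES = {
    ("ex:SnowDepthIndividual", "rdf:type", "ex:KeyVariable"),
    ("ex:KeyVariable", "rdfs:subClassOf", "ex:Variable"),
    ("ex:SnowDepthIndividual", "owl:sameAs", "ex:SnowDepth"),
    ("ex:SnowDepth", "rdfs:subClassOf", "ex:SnowProperty"),
    ("ex:SnowProperty", "rdfs:subClassOf", "ex:Measurement"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def superclasses(start):
    """Mimic rdfs:subClassOf*: the start node itself plus every ancestor."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in objects(node, "rdfs:subClassOf"):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def expand(concept):
    """Union of the four branches of the query for one concept URI."""
    # Branch 1: <concept> rdfs:subClassOf* ?x
    result = superclasses(concept)
    # Branches 2-4: one hop over the predicate, then rdfs:subClassOf*
    for predicate in ("rdf:type", "owl:sameAs", "owl:equivalentClass"):
        for target in objects(concept, predicate):
            result |= superclasses(target)
    return result


if __name__ == "__main__":
    for uri in sorted(expand("ex:SnowDepthIndividual")):
        print(uri)
```

In this toy graph the NamedIndividual expands to itself, its class and that class's parents, and the sameAs'd term with its parents, so a search on any of those URIs would match a dataset annotated with the individual.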
This is excellent, @amoeba. Thanks! Did you and @mpsaloha discuss
Thanks @mbjones, we hadn't. I'll run it by him, but now that you mention it, it seems like we oughta look at adding it in now. I don't think it'll have any practical impact at the moment, because I don't think the ontologies we do query expansion on use those terms, but it's probably better to put the change in now to save ourselves some time later.
Ref #31 Also adds tests for previously added MOSAiC and ARCRC ontologies
Added support for For
LGTM.
@mbjones asked on Slack whether we index URIs for terms that are sameAs'd. I answered no but maybe we actually do and I just haven't seen it happen because we don't use sameAs much in our ontologies.
In our use of semantic search so far, we haven't made much use of alignment axioms (sameAs, equivalentClass, equivalentProperty) and have so far only used subClassOf (to materialize parent classes). I think the MOSAiC ontology is the first ontology we've enabled in search that uses things like NamedIndividuals and sameAs in a way we care about and I think our search isn't quite as good for MOSAiC as it is for others (like ECSO).
I think we could make a few changes to improve that and improve things for the future.
Change one: Include the class of any value URIs
For example, for the MOSAiC datasets we have in the Arctic Data Center, we inserted dataset-level annotations like
Each of those annotations is to a NamedIndividual, rather than to a Class as we've been doing. How do we drive good searches for these? The only way right now is to search exactly for the term. But PS122/2, for example, is of class "_MOSAiC Specific Term" and "Campaign". I think it'd be nice if a person could search for PS122/2 directly but also by either of its classes. This same reasoning applies to the other two annotations in the screenshot above.
Change two: Include sameAs and equivalentClass/equivalentProperty relationships for terms
As another example, MOSAiC has a set of Research Location named individuals, like "Arctic Ocean". We didn't align this term, but if we did in the future (say to http://purl.obolibrary.org/obo/GAZ_00000323), it'd be best if a search for either MOSAiC's term or GAZ's term returned documents annotated either way. This would benefit any future alignment work we do and improve searches.
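A rough Python sketch of the "annotated either way" behavior described above, assuming the alignment existed. The `mosaic:ArcticOcean` identifier, the `SAME_AS` pair, and the document IDs are hypothetical; only the GAZ URI comes from the example. Note the sketch follows sameAs in both directions, which is an assumption: a forward-only path would only expand from the term that states the alignment. In SPARQL, one way to express the two-way hop is the property path `(owl:sameAs|^owl:sameAs)`.

```python
GAZ_ARCTIC_OCEAN = "http://purl.obolibrary.org/obo/GAZ_00000323"
MOSAIC_ARCTIC_OCEAN = "mosaic:ArcticOcean"  # hypothetical identifier

# Hypothetical alignment: the MOSAiC term is declared sameAs the GAZ term.
SAME_AS = {(MOSAIC_ARCTIC_OCEAN, GAZ_ARCTIC_OCEAN)}


def aligned_terms(term):
    """Return term plus everything reachable via sameAs in either direction."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for left, right in SAME_AS:
            if left == current:
                neighbor = right
            elif right == current:
                neighbor = left
            else:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def build_index(annotations):
    """Map each indexed term to the documents whose annotation expands to it."""
    index = {}
    for doc_id, term in annotations.items():
        for indexed_term in aligned_terms(term):
            index.setdefault(indexed_term, set()).add(doc_id)
    return index


# One document annotated with each vocabulary's term (made-up document IDs).
index = build_index({"doc-a": MOSAIC_ARCTIC_OCEAN, "doc-b": GAZ_ARCTIC_OCEAN})
```

With both directions followed, looking up either term in `index` yields both documents, which is the search behavior the paragraph above asks for.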
Summary
Both changes, in addition to our current indexing rules, would be additive. That is, any search that works now also works with these changes. The current logic is:
With the changes above, we'd get:
equivalentProperty
owl:sameAs
equivalentClass
I think this can be done by adjusting the SPARQL query we already use, and I don't expect we'll have any performance issues or Solr index growth issues.