Create lexmatch using oaklib #44

Open: wants to merge 8 commits into master; the diff below shows changes from 3 commits.
5 changes: 4 additions & 1 deletion .gitignore
@@ -64,4 +64,7 @@ src/scripts/pattern-matches/matches_old/
src/scripts/pattern-matches/generic_matches/
src/scripts/pattern-matches/upheno_matches/
src/scripts/pattern-matches/upheno_patterns/
src/scripts/pattern-matches/ontologies/
src/scripts/pattern-matches/ontologies/
src/ontology/.template.db
src/ontology/tmp/upheno_all_with_relations.*
src/ontology/tmp/upheno.db.lexical.yaml
17,270 changes: 17,270 additions & 0 deletions mappings/upheno-lexical.sssom.tsv

Large diffs are not rendered by default.

75 changes: 75 additions & 0 deletions src/ontology/config/upheno-match-rules.yaml
@@ -0,0 +1,75 @@
rules:
  - description: default
    postconditions:
      predicate_id: skos:closeMatch
      weight: 0.0

  - description: exact to exact
    preconditions:
      subject_match_field_one_of:
        - oio:hasExactSynonym
        - rdfs:label
        - skos:prefLabel
      object_match_field_one_of:
        - oio:hasExactSynonym
        - rdfs:label
        - skos:prefLabel
    postconditions:
      predicate_id: skos:exactMatch
      weight: 2.0

  - description: >-
      label to label; note this is additive with the exact to exact rule,
      so the score just represents an additional small boost
    preconditions:
      subject_match_field_one_of:
        - rdfs:label
      object_match_field_one_of:
        - rdfs:label
    postconditions:
      predicate_id: skos:exactMatch
      weight: 0.5

  - description: xref match
    preconditions:
      subject_match_field_one_of:
        - oio:hasDbXref
        - skos:exactMatch
      object_match_field_one_of:
        - oio:hasDbXref
        - skos:exactMatch
    postconditions:
      predicate_id: skos:exactMatch
      weight: 4.0

  - preconditions:
      subject_match_field_one_of:
        - oio:hasExactSynonym
        - rdfs:label
      object_match_field_one_of:
        - oio:hasBroadSynonym
    postconditions:
      predicate_id: skos:broadMatch
      weight: 2.0

  - preconditions:
      subject_match_field_one_of:
        - oio:hasExactSynonym
        - rdfs:label
      object_match_field_one_of:
        - oio:hasNarrowSynonym
    postconditions:
      predicate_id: skos:narrowMatch
      weight: 2.0

  - synonymizer:
      the_rule: Remove parentheses bound info from the label.
      match: r'\([^)]*\)'
      match_scope: "*"
      replacement: ""

  - synonymizer:
      the_rule: Remove box brackets bound info from the label.
      match: r'\[[^\]]*\]'
      match_scope: "*"
      replacement: ""
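The two synonymizer rules rewrite labels before matching, and the rule weights are described as additive. Below is a minimal sketch of both ideas in plain Python. It assumes, as the label-to-label description suggests, that the weights of every rule whose preconditions hold are summed. `synonymize` and `toy_score` are hypothetical helpers, the whitespace clean-up is an assumption, and only the default, exact-to-exact and label-to-label rules are modelled; this is not oaklib's implementation. It also shows why the bracket pattern's character class matters: with `[^)]` a greedy match can run from the first `[` to the last `]`, swallowing the text between two bracket groups.

```python
import re

# Python equivalents of the two synonymizer patterns (the YAML stores them as
# r'...' strings). The bracket class is [^\]], so each [...] group is removed on its own.
PAREN = re.compile(r"\([^)]*\)")
BRACKET = re.compile(r"\[[^\]]*\]")

# Match fields accepted on both sides by the "exact to exact" rule.
EXACT_FIELDS = {"oio:hasExactSynonym", "rdfs:label", "skos:prefLabel"}


def synonymize(label: str) -> str:
    """Strip (...) and [...] qualifiers; collapsing whitespace is an assumption."""
    for pattern in (PAREN, BRACKET):
        label = pattern.sub("", label)
    return " ".join(label.split())


def toy_score(subject_field: str, object_field: str) -> float:
    """Sum the weights of every modelled rule whose preconditions hold (toy model)."""
    score = 0.0  # "default" rule contributes weight 0.0
    if subject_field in EXACT_FIELDS and object_field in EXACT_FIELDS:
        score += 2.0  # "exact to exact"
    if subject_field == "rdfs:label" and object_field == "rdfs:label":
        score += 0.5  # "label to label" boost
    return score


if __name__ == "__main__":
    print(synonymize("abnormal heart morphology (MP)"))  # abnormal heart morphology
    print(synonymize("a [b] c [d]"))  # a c
    # With [^)] the greedy match spans "[b] c [d]", leaving only "a ".
    print(repr(re.sub(r"\[[^)]*\]", "", "a [b] c [d]")))  # 'a '
    print(toy_score("rdfs:label", "rdfs:label"))  # 2.5
    print(toy_score("oio:hasExactSynonym", "rdfs:label"))  # 2.0
```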
40 changes: 40 additions & 0 deletions src/ontology/metadata/upheno.sssom.config.yml
@@ -0,0 +1,40 @@
curie_map:
  MP: http://purl.obolibrary.org/obo/MP_
  HP: http://purl.obolibrary.org/obo/HP_
  WBPhenotype: http://purl.obolibrary.org/obo/WBPhenotype_
  XPO: http://purl.obolibrary.org/obo/XPO_
  PLANP: http://purl.obolibrary.org/obo/PLANP_
  ZP: http://purl.obolibrary.org/obo/ZP_
  DPO: http://purl.obolibrary.org/obo/FBcv_
  FYPO: http://purl.obolibrary.org/obo/FYPO_
  DDPHENO: http://purl.obolibrary.org/obo/DDPHENO_
  phipo: http://purl.obolibrary.org/obo/PHIPO_
  MGPO: http://purl.obolibrary.org/obo/MGPO_
  APO: http://purl.obolibrary.org/obo/APO_
subject_prefixes:
  - MP
  - HP
  - WBPhenotype
  - XPO
  - PLANP
  - ZP
  - DPO
  - FYPO
  - DDPHENO
  - phipo
  - MGPO
  - APO

relations:
  - oboInOwl:hasDbXref
  - skos:exactMatch
  - skos:broadMatch
  - skos:narrowMatch
  - skos:closeMatch
  - skos:relatedMatch

global_metadata:
  subject_source: http://purl.obolibrary.org/obo/upheno_all_with_relations.owl
  license: "https://creativecommons.org/publicdomain/zero/1.0/"
  mapping_date: "2017-09-20"
  mapping_set_id: "GLOBAL" # Apparently this is the one that matters.
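Every prefix listed under `subject_prefixes` needs a `curie_map` entry so that matched CURIEs can be expanded to IRIs; note that `DPO` expands into the `FBcv_` namespace. A small sketch of that lookup with plain dicts, using a few entries copied from `curie_map`. `expand_curie` and `missing_prefixes` are hypothetical helpers, not the sssom or curies API, and the `DPO` local ID is illustrative only.

```python
from typing import Dict, List

# A few entries copied from curie_map in upheno.sssom.config.yml.
CURIE_MAP: Dict[str, str] = {
    "MP": "http://purl.obolibrary.org/obo/MP_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "DPO": "http://purl.obolibrary.org/obo/FBcv_",
}


def expand_curie(curie: str, curie_map: Dict[str, str]) -> str:
    """Expand PREFIX:LOCAL_ID to a full IRI; raises KeyError for unknown prefixes."""
    prefix, local_id = curie.split(":", 1)
    return curie_map[prefix] + local_id


def missing_prefixes(subject_prefixes: List[str], curie_map: Dict[str, str]) -> List[str]:
    """Return the subject prefixes that have no curie_map entry."""
    return [p for p in subject_prefixes if p not in curie_map]


if __name__ == "__main__":
    print(expand_curie("HP:0000118", CURIE_MAP))
    # http://purl.obolibrary.org/obo/HP_0000118
    print(expand_curie("DPO:0000001", CURIE_MAP))
    # http://purl.obolibrary.org/obo/FBcv_0000001
    print(missing_prefixes(["MP", "HP", "ZP"], CURIE_MAP))  # ['ZP']
```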
15 changes: 15 additions & 0 deletions src/ontology/upheno.Makefile
@@ -224,3 +224,18 @@ reports/phenotype_trait.sssom.tsv: $(UPHENO_RELEASE_FILE_ANALYSIS)
	sed -i 's/[?]//g' $@
	sed -i 's/<http:[/][/]purl[.]obolibrary[.]org[/]obo[/]/obo:/g' $@
	sed -i 's/>//g' $@

#############################
#### Lexical matching #######
#############################
SCRIPTSDIR=../scripts
tmp/upheno_all_with_relations.db: tmp/upheno_all_with_relations.owl
	semsql make $@

../../mappings/upheno-lexical.sssom.tsv: $(SCRIPTSDIR)/upheno-lexmatch.py tmp/upheno_all_with_relations.db metadata/upheno.sssom.config.yml config/upheno-match-rules.yaml
	python $< run tmp/upheno_all_with_relations.db -c metadata/upheno.sssom.config.yml -r config/upheno-match-rules.yaml --no-recreate -o $@

## ../../mappings/upheno-lexical.sssom.tsv: tmp/upheno_all_with_relations.db
## runoak -i sqlite:$< lexmatch -R config/upheno-match-rules.yaml --no-recreate -o $@

lexical_matches: ../../mappings/upheno-lexical.sssom.tsv
104 changes: 104 additions & 0 deletions src/scripts/upheno-lexmatch.py
@@ -0,0 +1,104 @@
import logging
from pathlib import Path

import click
import yaml
from oaklib.implementations.sqldb.sql_implementation import SqlImplementation
from oaklib.resource import OntologyResource
from oaklib.utilities.lexical.lexical_indexer import (
    create_lexical_index,
    lexical_index_to_sssom,
    load_lexical_index,
    load_mapping_rules,
    save_lexical_index,
)
from sssom.constants import OBJECT_ID, SUBJECT_ID
from sssom.io import get_metadata_and_prefix_map
from sssom.util import filter_prefixes
from sssom.writers import write_table

SRC = Path(__file__).resolve().parents[1]
ONTOLOGY_DIR = SRC / "ontology"
TEMP_DIR = ONTOLOGY_DIR / "tmp"
# Cached lexical index, reused when --no-recreate is passed.
OUT_INDEX_DB = TEMP_DIR / "upheno.db.lexical.yaml"

input_argument = click.argument("input", required=True, type=click.Path())
# click.File treats "-" as stdout, so omitting -o writes to stdout.
output_option = click.option(
    "-o",
    "--output",
    help="Path for output file (defaults to stdout).",
    type=click.File("w", encoding="utf8"),
    default="-",
)


@click.group()
@click.option("-v", "--verbose", count=True)
@click.option("-q", "--quiet", is_flag=True)
def main(verbose: int, quiet: bool):
    """Run uPheno lexical matching with oaklib."""
    logger = logging.getLogger()
    if verbose >= 2:
        logger.setLevel(level=logging.DEBUG)
    elif verbose == 1:
        logger.setLevel(level=logging.INFO)
    else:
        logger.setLevel(level=logging.WARNING)
    if quiet:
        logger.setLevel(level=logging.ERROR)


@main.command()
@input_argument
@click.option(
    "-c",
    "--config",
    required=True,
    help="YAML file containing SSSOM metadata and subject_prefixes.",
)
@click.option(
    "-r",
    "--rules",
    help="Ruleset for mapping.",
)
@click.option(
    "--recreate/--no-recreate",
    default=True,
    show_default=True,
    help="If true, always recreate the lexical index; otherwise load it from the cached index file when it exists.",
)
@output_option
def run(input: str, config: str, rules: str, recreate: bool, output):
    """Build a lexical index for INPUT (a semsql SQLite file) and write SSSOM mappings."""
    # Relies on the `meta` param implemented in `lexical_index_to_sssom`.
    meta = get_metadata_and_prefix_map(config)
    with open(config, "r") as f:
        yml = yaml.safe_load(f)
    prefix_of_interest = yml["subject_prefixes"]

    resource = OntologyResource(slug=f"sqlite:///{Path(input).absolute()}")
    oi = SqlImplementation(resource=resource)

    # Load the ruleset once; its synonymizer rules are needed to build the index.
    ruleset = load_mapping_rules(rules) if rules else None
    syn_rules = (
        [x.synonymizer for x in ruleset.rules if x.synonymizer] if ruleset is not None else []
    )

    if not recreate and OUT_INDEX_DB.exists():
        lexical_index = load_lexical_index(OUT_INDEX_DB)
    else:
        lexical_index = create_lexical_index(oi=oi, synonym_rules=syn_rules)
        save_lexical_index(lexical_index, OUT_INDEX_DB)

    if ruleset is not None:
        msdf = lexical_index_to_sssom(oi, lexical_index, ruleset=ruleset, meta=meta)
    else:
        msdf = lexical_index_to_sssom(oi, lexical_index, meta=meta)

    # Restrict the mapping set to the subject_prefixes listed in the config.
    msdf.df = filter_prefixes(
        df=msdf.df, filter_prefixes=prefix_of_interest, features=[SUBJECT_ID, OBJECT_ID]
    )
    write_table(msdf, output)


if __name__ == "__main__":
    main()
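The last step of `run` narrows the mapping set to the configured `subject_prefixes` via sssom's `filter_prefixes`. To make that effect concrete, here is a hypothetical pure-Python stand-in over plain dict rows (`keep_prefixes` and `prefix_of` are illustrative names, not sssom functions). Whether sssom keeps a row when only one of `subject_id`/`object_id` matches, or requires both, may depend on the sssom version and arguments, so the sketch exposes both behaviours through a `require_all` flag.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def prefix_of(curie: str) -> str:
    """Return the prefix of a CURIE, e.g. 'HP:0000118' -> 'HP'."""
    return curie.split(":", 1)[0]


def keep_prefixes(
    rows: Iterable[Row], prefixes: Iterable[str], require_all: bool = False
) -> List[Row]:
    """Keep rows whose subject_id/object_id prefixes are in `prefixes`.

    require_all=False: at least one column must match; True: both must match.
    """
    allowed = set(prefixes)
    kept = []
    for row in rows:
        hits = [prefix_of(row[col]) in allowed for col in ("subject_id", "object_id")]
        if (all(hits) if require_all else any(hits)):
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Hypothetical rows; the IDs are illustrative only.
    rows = [
        {"subject_id": "HP:0000001", "object_id": "MP:0000001"},
        {"subject_id": "HP:0000002", "object_id": "UBERON:0000001"},
        {"subject_id": "GO:0000001", "object_id": "CHEBI:0000001"},
    ]
    print(len(keep_prefixes(rows, ["HP", "MP"])))  # 2
    print(len(keep_prefixes(rows, ["HP", "MP"], require_all=True)))  # 1
```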