
Add NER method to suggest ENVO triad from description #3

Open
cmungall opened this issue Jul 12, 2021 · 4 comments
@cmungall
Collaborator

cmungall commented Jul 12, 2021

https://github.com/cmungall/sample-annotator/tree/main/sample_annotator/text_mining

To start with, parse sample['description'] to populate sample['env_{broad_scale,local_scale,medium}'] if they are not already populated.

I think this should be done by calling runner, but that will need a PyPI release (monarch-initiative/ontorunner#9).

Or is it easier to just wrap OGER directly for now?

Also, for now we could just check in the nodes.tsv directly. See how we include mixs.json within the package.

For now, be conservative and only use labels or exact synonyms.
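
To make the intended behaviour concrete, here is a minimal sketch of the conservative first pass described above. It does not call runner or OGER; a plain whole-phrase lookup stands in for the NER step. The lexicon, its placeholder `EX:` identifiers, and the function names `find_terms` and `suggest_triad` are all hypothetical, and it assumes a sample is a plain dict.

```python
import re

# Hypothetical lexicon: term ID -> label followed by exact synonyms (lowercased).
# The EX: identifiers are placeholders, not real ENVO IDs.
LEXICON = {
    "EX:0001": ["soil"],
    "EX:0002": ["forest biome", "forest"],
    "EX:0003": ["agricultural field", "farm field"],
}

ENV_FIELDS = ("env_broad_scale", "env_local_scale", "env_medium")


def find_terms(text, lexicon=LEXICON):
    """Return IDs of terms whose label or exact synonym occurs as a whole phrase."""
    lowered = text.lower()
    hits = []
    for term_id, names in lexicon.items():
        for name in names:
            # Word boundaries keep "soil" from matching inside "soiled".
            if re.search(r"\b" + re.escape(name) + r"\b", lowered):
                hits.append(term_id)
                break
    return hits


def suggest_triad(sample, lexicon=LEXICON):
    """Suggest candidate terms for each env field that is not already populated."""
    hits = find_terms(sample.get("description") or "", lexicon)
    suggestions = {}
    for field in ENV_FIELDS:
        # Populated fields are left alone; nothing is overwritten.
        if not sample.get(field) and hits:
            suggestions[field] = list(hits)
    return suggestions
```

Every unpopulated field receives the same candidate list here, which matches the "same ontology for all 3 fields" first pass; choosing a single term per field is left out of the sketch.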

@cmungall
Collaborator Author

As a first pass, just hardcode ENVO for all 3 fields regardless of package.

Then, for the next pass, we will have a curated configuration file like this:

- field: env_broad_scale
  packages:
    - soil
  termsets:
    - ontology: envo
      branches:
        - ENVO:01000254  # environment system
      exclude_descendants_of:
        - ENVO:01001788  # marine ecosystem
- field: env_local_scale
  package: host-associated
  termsets:
    - ontology: UBERON
...

That will customize which ontologies are used for which field and package.
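
One way such a configuration could be applied, sketched under several assumptions: the config is already loaded into Python dicts, the ontology hierarchy is available as a child-to-parents map, and a term listed under `exclude_descendants_of` is itself excluded. The `EX:` identifiers, the toy hierarchy, and the function names are hypothetical; the `ontology` key is carried along but not used by this sketch.

```python
# Toy is-a hierarchy: child -> parents. Placeholder IDs, not real ontology terms.
PARENTS = {
    "EX:biome": ["EX:system"],
    "EX:marine": ["EX:system"],
    "EX:forest": ["EX:biome"],
    "EX:reef": ["EX:marine"],
}

# Same shape as the curated configuration, already parsed into Python objects.
CONFIG = [
    {
        "field": "env_broad_scale",
        "packages": ["soil"],
        "termsets": [
            {
                "ontology": "ex",
                "branches": ["EX:system"],
                "exclude_descendants_of": ["EX:marine"],
            }
        ],
    }
]


def ancestors(term, parents=PARENTS):
    """Return the term itself plus all of its transitive is-a ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def allowed_terms(candidates, field, package, config=CONFIG, parents=PARENTS):
    """Keep candidates permitted by the termsets configured for field and package."""
    rules = [
        termset
        for entry in config
        if entry["field"] == field and package in entry.get("packages", [])
        for termset in entry["termsets"]
    ]
    if not rules:
        # Assumption: with no matching rule, candidates pass through unchanged.
        return list(candidates)
    kept = []
    for term in candidates:
        lineage = ancestors(term, parents)
        for rule in rules:
            in_branch = any(b in lineage for b in rule.get("branches", []))
            excluded = any(x in lineage for x in rule.get("exclude_descendants_of", []))
            if in_branch and not excluded:
                kept.append(term)
                break
    return kept
```

For example, `allowed_terms(["EX:forest", "EX:reef"], "env_broad_scale", "soil")` keeps only the forest term, because the reef term sits under the excluded marine branch.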

@hrshdhgd
Collaborator

Just an FYI, OGER does not have a PyPI release either.

hrshdhgd added a commit that referenced this issue Jul 15, 2021
@hrshdhgd
Collaborator

hrshdhgd commented Jul 15, 2021

@cmungall, how do you envision the input file for NER: a TSV file within the project (locally, i.e. ./text_mining/data/input) or one located remotely (URL)?

I'm guessing the input TSV (or DB) will be generated by @turbomam through his parsing work on the large XML?

hrshdhgd added a commit that referenced this issue Jul 15, 2021
@cmungall
Collaborator Author

I answered @hrshdhgd's questions in our 1-on-1. It's clear now that he doesn't have to worry about formats: the goal is to implement functionality within the Python framework, where all you care about is the data model.

@hrshdhgd hrshdhgd mentioned this issue Jul 28, 2021