From ff0a61e7d416fa5cd2cf8b3ea2e96cae3fb6f856 Mon Sep 17 00:00:00 2001
From: Steven Cannon
Date: Mon, 23 Sep 2024 14:21:08 -0500
Subject: [PATCH] Update README.md

---
 README.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 06e3ab3..6327803 100644
--- a/README.md
+++ b/README.md
@@ -9,14 +9,17 @@ Those files are the following - with "gensp" being the abbreviation for the pres
 A traits file corresponds with a publication, named with the pattern `Author_Author_YEAR.yml`, is produced by a curator, and represents minimal essential information about a gene and its function as described by literature cited in the file.
-Periodically, the collection of yaml files in a Genus/species/studies directory will be combined and processed to produce a **gensp.traits.yml** file that will go into the datastore, for example into `Glycine/max/gene_functions/`. The processing for addition to the datastore is, however, separate from the basic curation process.
+Periodically, the collection of yaml files in a Genus/species/studies directory will be combined and processed to produce a **gensp.traits.yml** file that will go into the datastore, for example into `Glycine/max/gene_functions/`. The processing for addition of gene function information to the datastore is, however, separate from the basic curation process.
+
+More about generation of files for the datastore (advanced) ...
- More about generation of files for the datastore ...
 The **gensp.citations.txt** file is generated by the script **get_citations.pl** (in the [scripts directory](https://github.com/legumeinfo/gene-function-registry/tree/main/scripts) of the gene-function-registry repository), which takes gensp.traits.yml as input. This file has five fields: DOI, PubMedID, PubMedCentralID, Author-Author-Year, and full citation. (\*Note that the **get_citations.pl** script can help fill in reference elements in gensp.traits.yml -- specifically, adding doi given the pmid, or the pmid given the doi.)
 The **gensp.references.txt** file is generated by the script **get_references.pl**, which takes the gensp.citations.txt as input. This file has the [MEDLINE-format](https://www.nlm.nih.gov/bsd/mms/medlineelements.html) publication information (authors, title, abstract, etc.) for the citations in gensp.citations.txt.
 The traits.yml file contains one or more yaml "documents", indicated by three leading dashes (`---`) at the top of each document. Each holds information about one gene with experimentally-established function or trait association. A document might also be thought of as a "function card", with information about one gene for which a phenotypic effect has been established.
+
## Curation and review process