Codebase for "A data-driven framework for mapping domains of human neurobiology"

changcaoyan/neuro-knowledge-engine

A data-driven framework for mapping domains of human neurobiology

Code repository for the article in Nature Neuroscience by Elizabeth Beam, Christopher Potts, Russell Poldrack, & Amit Etkin

Abstract

Functional neuroimaging has been a mainstay of human neuroscience for the past 25 years. Interpretation of fMRI data has often occurred within knowledge frameworks crafted by experts, which have the potential to amplify biases that limit the replicability of findings. Here, we employ a computational approach to derive a data-driven framework for neurobiological domains that synthesizes the texts and data of nearly 20,000 human neuroimaging articles. Across multiple levels of domain specificity, the structure-function links within domains better replicate in held-out articles than those mapped from dominant frameworks in neuroscience and psychiatry. We further show that the data-driven framework partitions the literature into modular subfields, for which domains serve as generalizable prototypes of structure-function patterns in single articles. The approach to computational ontology we present here is the most comprehensive characterization of human brain circuits quantifiable with fMRI and may be extended to synthesize other scientific literatures.

Pipelines

Data-driven framework

[Figure: data-driven pipeline]

Approach to computational ontology. A data-driven framework was generated in an integrative manner in a training set of 12,708 human neuroimaging articles with brain coordinate data. First, 118 brain structures were clustered by k-means according to their co-occurrences with 1,683 terms for mental functions. The co-occurrence matrix was weighted by pointwise mutual information (PMI). Second, the top 25 terms for mental functions were assigned to each circuit based on the point-biserial correlation (rpb) of their binarized occurrences with the centroid of occurrences across structures. Third, the number of terms was selected to maximize average ROC-AUC of logistic regression classifiers predicting structure occurrences from term occurrences (forward inference) and term occurrences from structure occurrences (reverse inference) over a range of term list lengths from 5 to 25. Fourth, the number of domains was selected based on the average ROC-AUC of forward and reverse inference classifiers. Occurrences were summed across terms in each list and structures in each circuit, then thresholded by their mean across articles. In the fifth and final step, each domain was named by the mental function term with highest degree centrality of co-occurrences with other terms in the domain.
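
The first two steps above can be sketched on toy data. This is a minimal illustration, not the repository's implementation: the function names (`pmi_matrix`, `point_biserial`, `top_terms`), the toy matrices, and the convention of setting undefined PMI values to zero are all assumptions made for the example. The point-biserial correlation is computed as a Pearson correlation between a binary and a continuous variable, which is mathematically equivalent.

```python
import numpy as np

def pmi_matrix(cooc):
    """Weight a structures-by-terms co-occurrence count matrix by PMI."""
    cooc = np.asarray(cooc, dtype=float)
    p_joint = cooc / cooc.sum()
    p_row = p_joint.sum(axis=1, keepdims=True)   # marginal over structures
    p_col = p_joint.sum(axis=0, keepdims=True)   # marginal over terms
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / (p_row * p_col))
    pmi[~np.isfinite(pmi)] = 0.0  # assumption: undefined entries set to 0
    return pmi

def point_biserial(binary, continuous):
    """Pearson r between a binary vector and a continuous vector."""
    b = np.asarray(binary, dtype=float)
    c = np.asarray(continuous, dtype=float)
    b = b - b.mean()
    c = c - c.mean()
    denom = np.sqrt((b ** 2).sum() * (c ** 2).sum())
    return float((b * c).sum() / denom) if denom > 0 else 0.0

def top_terms(term_occ, struct_occ, circuit, n_terms=25):
    """Rank terms by correlation with a circuit's centroid of occurrences.

    term_occ:   articles x terms binary matrix
    struct_occ: articles x structures binary matrix
    circuit:    indices of the structures assigned to one cluster
    """
    centroid = struct_occ[:, circuit].mean(axis=1)  # one value per article
    r = np.array([point_biserial(term_occ[:, j], centroid)
                  for j in range(term_occ.shape[1])])
    order = np.argsort(-r)[:n_terms]
    return order, r[order]

# Toy data (made up for illustration): 6 articles, 3 structures, 3 terms.
struct_occ = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 0],
                       [0, 0, 1], [0, 0, 1], [0, 0, 0]])
term_occ = np.array([[1, 0, 1], [1, 0, 0], [1, 0, 0],
                     [0, 1, 1], [0, 1, 0], [0, 0, 1]])

cooc = struct_occ.T @ term_occ        # structures x terms co-occurrence counts
weights = pmi_matrix(cooc)            # rows would be the input to k-means
order, r = top_terms(term_occ, struct_occ, circuit=[0, 1], n_terms=3)
```

In a fuller pipeline, the rows of `weights` could be passed to a k-means routine (for example, scikit-learn's `KMeans`) to obtain the circuits, and `top_terms` would then be called once per circuit.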

Expert-determined frameworks

[Figure: expert-determined pipeline]

Approach to mapping expert-determined frameworks for brain function (RDoC) and mental illness (DSM). Seed terms from the RDoC and DSM frameworks were translated into the language of the human neuroimaging literature through a computational linguistics approach. Term embeddings of length 100 were trained using GloVe. For RDoC, embeddings were trained on a general human neuroimaging corpus of 29,828 articles (Supplementary Fig. 1b). For the DSM, embeddings were trained on a psychiatric human neuroimaging corpus of 26,070 articles (Supplementary Fig. 1c). Candidate synonyms included terms for mental functions in the case of RDoC and for both mental functions and psychopathology in the case of the DSM, as detailed in Supplementary Table 2. In the first step, the closest synonyms of seed terms were identified based on the cosine similarity of synonym term embeddings with the centroid of embeddings across seed terms in each domain. Second, the number of terms for each domain was selected to maximize cosine similarity with the centroid of seed terms. Third, the mental function term lists for each domain were mapped onto brain circuits based on positive pointwise mutual information (PPMI) of term and structure co-occurrences across the corpus of 18,155 articles with activation coordinate data (Supplementary Fig. 1a). Structures were included in the circuit if the FDR of the observed PPMI was less than 0.01, determined by comparison to a null distribution generated by shuffling term list features over 10,000 iterations.
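
The synonym selection and circuit mapping steps above can be sketched as follows. This is a hedged illustration under stated assumptions, not the code used in the article: the function names are hypothetical, the list-length criterion is read as the cosine similarity between the centroid of the top-n candidates and the seed centroid, PPMI is computed from article-level binary occurrences, and the FDR is obtained with a Benjamini-Hochberg adjustment of permutation p-values. The source confirms only that a shuffled null distribution and an FDR threshold of 0.01 were used.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, returning 0 for zero-length vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0

def rank_synonyms(seed_vecs, cand_vecs):
    """Order candidate embeddings by similarity to the seed centroid."""
    centroid = np.mean(seed_vecs, axis=0)
    sims = np.array([cosine(v, centroid) for v in np.asarray(cand_vecs)])
    order = np.argsort(-sims)
    return order, sims[order]

def best_list_length(seed_vecs, cand_vecs, lengths):
    """Pick the list length whose centroid is closest to the seed centroid."""
    cand_vecs = np.asarray(cand_vecs, dtype=float)
    seed_centroid = np.mean(seed_vecs, axis=0)
    order, _ = rank_synonyms(seed_vecs, cand_vecs)
    scores = {n: cosine(cand_vecs[order[:n]].mean(axis=0), seed_centroid)
              for n in lengths}
    return max(scores, key=scores.get), scores

def ppmi(term_feature, struct_occ):
    """PPMI between one binary term-list feature and each structure."""
    t = np.asarray(term_feature, dtype=float)
    s = np.asarray(struct_occ, dtype=float)
    p_ts = (t[:, None] * s).mean(axis=0)          # joint occurrence rate
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ts / (t.mean() * s.mean(axis=0)))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)

def permutation_fdr(term_feature, struct_occ, n_iter=1000, seed=0):
    """Permutation p-values for PPMI, adjusted by Benjamini-Hochberg."""
    rng = np.random.default_rng(seed)
    observed = ppmi(term_feature, struct_occ)
    exceed = np.zeros_like(observed)
    for _ in range(n_iter):
        # Shuffle the term feature across articles to break the association.
        exceed += ppmi(rng.permutation(term_feature), struct_occ) >= observed
    pvals = (exceed + 1) / (n_iter + 1)
    m = len(pvals)
    order = np.argsort(pvals)
    adj = pvals[order] * m / np.arange(1, m + 1)
    adj = np.minimum.accumulate(adj[::-1])[::-1]  # enforce monotonicity
    fdr = np.empty(m)
    fdr[order] = np.minimum(adj, 1.0)
    return observed, fdr

# Toy embeddings (2-dimensional here; the article used length 100).
seeds = np.array([[1.0, 0.0], [1.0, 0.2]])
candidates = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
syn_order, syn_sims = rank_synonyms(seeds, candidates)
n_best, n_scores = best_list_length(seeds, candidates, lengths=[1, 2, 3])

# Toy occurrences: 40 articles, 2 structures; structure 0 tracks the term.
term = np.array([1] * 20 + [0] * 20)
structs = np.column_stack([term, np.tile([1, 0], 20)])
obs, fdr = permutation_fdr(term, structs, n_iter=200)
in_circuit = fdr < 0.01   # threshold taken from the description above
```

The toy run uses 200 shuffles to keep it fast; the description above specifies 10,000 iterations, which would be passed as `n_iter=10000`.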

Index of Figures

Main Text

Figure Files
1b ontology/ontol_data-driven_lr.ipynb, ontology/ontology.py
1c partition/part_splits.ipynb, partition/partition.py
1d modularity/mod_kvals_lr.ipynb
1e prototype/proto_kvals_lr.ipynb
2a ontology/ontol_data-driven_lr.ipynb
2b prediction/comp_frameworks_lr_k*.ipynb, modularity/comp_frameworks_lr_k*.ipynb, prototype/comp_frameworks_lr_k*.ipynb
2c hierarchy/hier_data-driven_lr_k6-8-22.ipynb
3b ontology/ontol_rdoc.ipynb, ontology/ontology.py
4a ontology/ontol_rdoc.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
4b ontology/ontol_data-driven_lr.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
4c ontology/ontol_ontol_dsm.ipynb, ontol_sim_lr.ipynb, ontology/ontology.py
5b, e prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5c, f prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5d, g prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
5h prediction/comp_frameworks_lr.ipynb
6a-f mds/mds.ipynb, mds/mds.py
6g modularity/mod_data-driven_lr.ipynb, modularity/modularity.py
6h modularity/mod_rdoc.ipynb, modularity/modularity.py
6i modularity/mod_dsm.ipynb, modularity/modularity.py
6j modularity/comp_frameworks_lr.ipynb, modularity/modularity.py
6k prototype/proto_data-driven_lr.ipynb, prototype/prototype.py
6l prototype/proto_rdoc.ipynb, prototype/prototype.py
6m prototype/proto_dsm.ipynb, prototype/prototype.py
6n prototype/comp_frameworks_lr.ipynb, prototype/prototype.py

Extended Data

Figure Files
1 corpus/cohorts.ipynb
2-3 ontology/ontol_kvals_lr.ipynb, ontology/ontology.py
4a-b ontology/ontol_data-driven_nn.ipynb, ontology/ontology.py
4c mds/mds.ipynb, mds/mds.py
4d modularity/mod_data-driven_nn.ipynb, modularity/modularity.py
4e prototype/proto_data-driven_nn.ipynb, prototype/prototype.py
5a ontology/ontol_data-driven_terms.ipynb, ontology/ontol_sim_terms.ipynb, ontology/ontology.py
5b-e ontology/ontol_sim_terms.ipynb
6a, d prediction/comp_frameworks_lr_k09.ipynb
6b-c, e-f prediction/pred_data-driven_lr_k09.ipynb
6g-h partition/part_data-driven_lr_k09.ipynb, mds/mds.ipynb
6i Left modularity/comp_frameworks_lr_k09.ipynb
6i Right modularity/mod_data-driven_lr_k09.ipynb
6j Left prototype/comp_frameworks_lr_k09.ipynb
6j Right prototype/proto_data-driven_lr_k09.ipynb
7b, e prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7c, f prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7d, g prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
7h-j prediction/comp_frameworks_lr.ipynb
8b, e; 9b, e prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8c, f; 9c, f prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8d, g; 9d, g prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8h; 9h-j prediction/comp_frameworks_nn.ipynb
10a partition/part_data-driven_lr.ipynb, partition/partition.py
10b partition/part_rdoc.ipynb, partition/partition.py
10c partition/part_dsm.ipynb, partition/partition.py
10d-f tsne/tsne.ipynb

Supplementary Material

Figure Files
1 validation/val_brainmap_top.ipynb
2 validation/val_brainmap_sims.ipynb
3-4 ontology/ontol_kvals_nn.ipynb, ontology/ontology.py
5 stability/stab_data-driven_lr_top.ipynb
6a, d; 7a, d prediction/pred_data-driven_lr.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6b, e; 7b, e prediction/pred_rdoc.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6c, f; 7c, f prediction/pred_dsm.ipynb, prediction/logistic_regression/prediction.py, prediction/evaluation.py
6g; 7g-i prediction/comp_frameworks_lr.ipynb
8a, d; 9a, d prediction/pred_data-driven_nn.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8b, e; 9b, e prediction/pred_rdoc.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8c, f; 9c, f prediction/pred_dsm.ipynb, prediction/neural_network/sherlock/neural_network.py, prediction/evaluation.py
8g; 9g-i prediction/comp_frameworks_nn.ipynb

Table Files
1 data/data_table_coord.ipynb
2 lexicon/preproc_cogneuro.py, lexicon/preproc_psychiatry.py, lexicon/preproc_rdoc.py, lexicon/preproc_dsm.py
3 data/text/pubmed/gen_190428/query.txt, data/text/pubmed/psy_190428/query.txt
4-5 prediction/table_lr-nn.ipynb
