This is the official implementation of the paper BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs (ICLR 2024).
The trained BioBridge model checkpoints.
- data/PrimeKG: The raw PrimeKG data in .zip format.
- data/Processed: The node features obtained from different databases, e.g., protein sequences.
- data/embeddings/: The KG node embeddings extracted from unimodal FMs, such as PubMedBERT, UniMol, and ESM-2.
- data/BindData/: The preprocessed BioBridge-related data and its configurations.
- data/mouse_protein/: The preprocessed mouse protein data and configurations for the mouse protein prediction task in the paper.
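As a rough illustration of working with the raw archive, the sketch below reads a CSV member from a .zip file and counts rows per relation type. The member name `kg.csv` and the `relation` column are assumptions made for illustration and are not confirmed by this repository; adjust them to match the actual contents of data/PrimeKG.

```python
import csv
import io
import zipfile
from collections import Counter


def count_relations(zip_path, member="kg.csv", column="relation"):
    """Count rows per value of `column` in a CSV stored inside a zip archive.

    `member` and `column` are hypothetical defaults; inspect the archive first.
    `zip_path` may be a filesystem path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # ZipFile.open yields bytes, so wrap it in a text layer for csv.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                counts[row[column]] += 1
    return counts
```

To see which members an archive actually contains before choosing `member`, `zipfile.ZipFile(path).namelist()` returns their names.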
The guidelines for data preprocessing.
The source code of BioBridge.
The source code of unimodal FMs, including PubMedBERT and ESM-2, for encoding node features.
The example script for training BioBridge.
The example script for using BioBridge for cross-modality prediction.
The example script for training BioBridge on the mouse protein prediction task.
The example script for testing BioBridge on prompting LLMs for molecule generation and Q&A tasks.
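One way to picture cross-modality prediction is as retrieval in an embedding space: a query embedding from one modality is mapped into the target modality's space, and candidates are ranked by similarity. The sketch below shows only that ranking step in plain Python. The function names, the toy vectors, and the choice of cosine similarity are assumptions for illustration and are not taken from the BioBridge source code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query, candidates, top_k=3):
    """Return the top_k (name, score) pairs, most similar first.

    `query` stands in for an embedding already projected into the
    target modality's space; `candidates` maps names to embeddings.
    """
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data only; real embeddings would come from the unimodal FMs.
proteins = {
    "protein_a": [1.0, 0.0, 0.0],
    "protein_b": [0.7, 0.7, 0.0],
    "protein_c": [0.0, 0.0, 1.0],
}
ranking = rank_candidates([0.9, 0.1, 0.0], proteins, top_k=2)
```

With these toy vectors, `protein_a` ranks first because it points in nearly the same direction as the query.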
If you find this repository useful in your research, please cite the following paper:
@inproceedings{wang2023biobridge,
  title={BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs},
  author={Wang, Zifeng and Wang, Zichen and Srinivasan, Balasubramaniam and Ioannidis, Vassilis N and Rangwala, Huzefa and Anubhai, Rishita},
  booktitle={International Conference on Learning Representations},
  year={2024}
}