ChemBFN: Bayesian Flow Network for Chemistry


This repository contains the PyTorch implementation of the ChemBFN model.

Features

ChemBFN provides state-of-the-art functionality for

  • SMILES or SELFIES-based de novo molecule generation
  • Protein sequence de novo generation
  • Classifier-free guidance conditional generation (single or multi-objective optimisation)
  • Context-guided conditional generation (inpainting)
  • Outstanding out-of-distribution chemical space sampling
  • Molecular property and activity prediction finetuning
  • Reaction yield prediction finetuning

in an all-in-one-model style.

News

  • [17/12/2024] The second paper, on out-of-distribution generation, is available on arXiv.
  • [31/07/2024] The paper is available on arXiv.
  • [21/07/2024] The paper was submitted to arXiv.

Usage

You can find example scripts in the 📁example folder.

Pre-trained Model

You can find pretrained models on the releases page.
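
If the downloaded asset is a plain PyTorch checkpoint, it can be inspected directly. The sketch below assumes this and uses a hypothetical file name; adapt it to the file you actually downloaded (the 📁example scripts show the full loading workflow).

>>> import torch

>>> # hypothetical file name; replace it with the checkpoint downloaded from the releases page
>>> state_dict = torch.load("chembfn_pretrained.pt", map_location="cpu")
>>> list(state_dict)[:5]  # peek at the first few entries of the checkpoint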

Dataset Handling

We provide a Python class CSVData to handle data stored in CSV (or a similar format) whose header row identifies the entries. The following is a quickstart.

  1. Download your dataset file (e.g., ESOL from MoleculeNet) and split the file:
>>> from bayesianflow_for_chem.tool import split_data

>>> split_data("delaney-processed.csv", method="scaffold")  # writes split files, e.g., delaney-processed_train.csv
  2. Load the split data:
>>> from bayesianflow_for_chem.data import smiles2token, collate, CSVData

>>> dataset = CSVData("delaney-processed_train.csv")
>>> dataset[0]
{'Compound ID': ['Thiophene'], 
'ESOL predicted log solubility in mols per litre': ['-2.2319999999999998'], 
'Minimum Degree': ['2'], 
'Molecular Weight': ['84.14299999999999'], 
'Number of H-Bond Donors': ['0'], 
'Number of Rings': ['1'], 
'Number of Rotatable Bonds': ['0'], 
'Polar Surface Area': ['0.0'], 
'measured log solubility in mols per litre': ['-1.33'], 
'smiles': ['c1ccsc1']}
  3. Create a mapping function to tokenise the dataset and select values:
>>> import torch

>>> def encode(x):
...   smiles = x["smiles"][0]
...   value = [float(i) for i in x["measured log solubility in mols per litre"]]
...   return {"token": smiles2token(smiles), "value": torch.tensor(value)}

>>> dataset.map(encode)
>>> dataset[0]
{'token': tensor([  1, 151,  23, 151, 151, 154, 151,  23,   2]), 
'value': tensor([-1.3300])}
  4. Wrap the dataset in torch.utils.data.DataLoader:
>>> dataloader = torch.utils.data.DataLoader(dataset, 32, collate_fn=collate)
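
Each collated batch can then be fed to a training or finetuning loop. The sketch below only inspects one batch; the key names follow the dictionary returned by encode above, while the padded shapes are an assumption.

>>> batch = next(iter(dataloader))
>>> batch["token"].shape  # assumed: (32, length of the longest sequence in the batch)
>>> batch["value"].shape  # assumed: (32, 1)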

Cite This Work

@misc{2024chembfn,
      title={A Bayesian Flow Network Framework for Chemistry Tasks}, 
      author={Nianze Tao and Minori Abe},
      year={2024},
      eprint={2407.20294},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2407.20294}, 
}

Out-of-distribution generation:

@misc{2024chembfn_ood,
      title={Bayesian Flow Is All You Need to Sample Out-of-Distribution Chemical Spaces}, 
      author={Nianze Tao},
      year={2024},
      eprint={2412.11439},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.11439}, 
}