
Commit ea6f2cd
Update README.md
Included generation step with pre-trained model that corrects the issue wengong-jin#47
bsaldivaremc2 authored Jul 26, 2023
1 parent d5f76d4 commit ea6f2cd
Showing 1 changed file (README.md) with 7 additions and 0 deletions.
@@ -18,6 +18,13 @@ And then run `pip install .`. Additional dependency for property-guided finetuning
* For graph generation, each line of a training file is a SMILES string of a molecule
* For graph translation, each line of a training file is a pair of molecules (molA, molB) that are similar to each other but molB has better chemical properties. Please see `data/qed/train_pairs.txt`. The test file is a list of molecules to be optimized. Please see `data/qed/test.txt`.
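
The pair-file format above can be read with a few lines of Python. This is a minimal sketch, not code from the repository; it assumes the two SMILES on each line of `data/qed/train_pairs.txt` are whitespace-separated, which is an assumption about the exact delimiter:

```python
# Hedged sketch: parse a graph-translation training file where each line
# holds a (molA, molB) SMILES pair. The delimiter (whitespace) is assumed.

def load_pairs(lines):
    """Return a list of (molA, molB) tuples, skipping malformed lines."""
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs

# Illustrative SMILES pairs, not taken from the actual dataset:
sample = ["CCO CCN", "c1ccccc1O c1ccccc1N"]
print(load_pairs(sample))  # → [('CCO', 'CCN'), ('c1ccccc1O', 'c1ccccc1N')]
```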

## Molecule generation with a pre-trained model
The original vocab list does not work with the provided pre-trained model; it triggers the error reported in https://github.com/wengong-jin/hgraph2graph/issues/47.
I added the motifs that were causing the issue to the provided vocab list, replacing 27 rarely used motif pairs (seen after 800 million samples).
```
python generate.py --vocab data/chembl/recovered_vocab_2000.txt --model ckpt/chembl-pretrained/model.ckpt --nsamples 1000
```
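
The vocab mismatch described above can be checked before sampling by comparing the motifs a checkpoint expects against the vocab file. This is a minimal sketch, not code from the repository; the motif SMILES and the assumption that each vocab line starts with the motif token are illustrative:

```python
# Hedged sketch: report motifs a model expects that a vocab file lacks.
# Assumes the motif is the first whitespace-separated token on each vocab line.

def missing_motifs(expected, vocab_lines):
    """Return expected motifs absent from the vocab, sorted for stable output."""
    vocab = {line.split()[0] for line in vocab_lines if line.strip()}
    return sorted(m for m in expected if m not in vocab)

# Illustrative motif SMILES, not taken from the real ChEMBL vocab:
expected = {"C1=CC=CC=C1", "C1CCNCC1", "C1CCOC1"}
vocab_lines = ["C1=CC=CC=C1 C1=CC=CC=C1", "C1CCNCC1 C1CCNCC1"]
print(missing_motifs(expected, vocab_lines))  # → ['C1CCOC1']
```

Any motifs this reports would need to be added to the vocab file (as was done for `recovered_vocab_2000.txt`) before `generate.py` can load the checkpoint cleanly.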

## Molecule generation pretraining procedure
We can train a molecular language model on a large corpus of unlabeled molecules. We have uploaded a model checkpoint pre-trained on the ChEMBL dataset in `ckpt/chembl-pretrained/model.ckpt`. If you wish to train your own language model, please follow the steps below:

