SolTranNet

The official implementation of SolTranNet, whose publication is available here.

The code for creating the figures for the publication is available here.

SolTranNet is an optimized fork of the Molecule Attention Transformer, whose original paper can be found here.

Requirements

  • Python 3.6+
  • PyTorch 1.7+
  • RDKit 2017.09.1+
  • pathlib 1.0+

Soft Requirements

  • CUDA 10.1, 10.2, or 11.1

We strongly recommend installing CUDA and a CUDA-enabled build of PyTorch for faster predictions.

You can see installation instructions for CUDA here.

You can see installation instructions for PyTorch here.

Installation

Tested on Ubuntu 18.04.5, Ubuntu 20.04.2, Debian 10, Fedora 33, CentOS 8.3.2011, Windows 10, and Ubuntu 20.04.2 on the Windows Subsystem for Linux (Windows 10).

First, install RDKit. Installation instructions are available here.

After RDKit has finished installing, you can install SolTranNet via pip:

python3 -m pip install soltrannet

NOTE: If pip has to install PyTorch as a dependency, this method often installs a PyTorch build that does not match your CUDA setup, leaving GPU support disabled.
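One way to check which PyTorch build pip actually installed is the sketch below. It is not part of SolTranNet, and the helper name `describe_torch_build` is hypothetical; it only reads the standard `torch.version.cuda` attribute and calls `torch.cuda.is_available()`.

```python
def describe_torch_build() -> dict:
    """Report whether the installed PyTorch can use a GPU (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all.
        return {"installed": False, "cuda_build": None, "gpu_available": False}
    return {
        "installed": True,
        # CUDA version this PyTorch was built against; None means a CPU-only build.
        "cuda_build": torch.version.cuda,
        # True only if the build supports CUDA and a usable GPU and driver are found.
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `cuda_build` is `None`, pip most likely installed a CPU-only wheel; reinstalling PyTorch by following its own installation instructions for your CUDA version should fix this.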

If you wish to do a more careful installation:

python3 -m pip install --install-option test soltrannet

This runs our unit tests to check that GPU-enabled PyTorch was set up correctly and that SolTranNet works both as a command line tool and within a Python environment.

Usage

Command line tool

A successful pip installation also installs the soltrannet command line tool.

To generate the predictions for SMILES provided in my_smiles.txt and store them into my_output.txt:

soltrannet my_smiles.txt my_output.txt
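The input file is plain text with one SMILES per line. A minimal sketch for producing such a file from Python, assuming only that format (the helper `write_smiles_file` is hypothetical and not part of SolTranNet):

```python
from pathlib import Path
from typing import Iterable


def write_smiles_file(smiles: Iterable[str], path: str) -> int:
    """Write one SMILES string per line and return how many were written (hypothetical helper)."""
    lines = []
    for s in smiles:
        s = s.strip()
        if not s:
            continue  # skip empty entries instead of writing blank lines
        if "\n" in s or "\r" in s:
            # An embedded line break would split one entry into two input lines.
            raise ValueError(f"SMILES must not contain line breaks: {s!r}")
        lines.append(s)
    Path(path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, `write_smiles_file(["c1ccccc1", "Cn1cnc2n(C)c(=O)n(C)c(=O)c12"], "my_smiles.txt")` produces a file that can then be passed to `soltrannet my_smiles.txt my_output.txt`.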

To see all of the options available for the command line tool, run soltrannet --help:

usage: soltrannet [-h] [--batchsize BATCHSIZE] [--cpus CPUS] [--cpu_predict] [input] [output]

Run SolTranNet aqueous solubility predictor

positional arguments:
  input                 PATH to the file containing the SMILES you wish to
                        use. Assumes the content is 1 SMILE per line.

  output                Name of the output file. Defaults to stdout.

optional arguments:
  -h, --help            show this help message and exit
  --batchsize BATCHSIZE
                        Batch size for the data loader. Defaults to 32.

  --cpus CPUS           Number of CPU cores to use for the data loader.
                        Defaults to use all available cores. Pass 0 to only
                        run on 1 CPU.

  --cpu_predict         Flag to force the predictions to be made on only the
                        CPU. Default behavior is to use GPU if available.
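When calling the tool from a script, the options above can be assembled into an argument list for `subprocess.run`. A sketch that uses only the flags shown in the help text (the helper names `soltrannet_command` and `run_soltrannet` are hypothetical; running it requires `soltrannet` to be on your `PATH`):

```python
import subprocess
from typing import List, Optional


def soltrannet_command(input_path: str, output_path: Optional[str] = None,
                       batchsize: Optional[int] = None, cpus: Optional[int] = None,
                       cpu_predict: bool = False) -> List[str]:
    """Build a soltrannet argument list from the documented options (hypothetical helper)."""
    cmd = ["soltrannet"]
    if batchsize is not None:
        cmd += ["--batchsize", str(batchsize)]  # tool default: 32
    if cpus is not None:
        cmd += ["--cpus", str(cpus)]  # tool default: all cores; 0 runs on 1 CPU
    if cpu_predict:
        cmd.append("--cpu_predict")  # force CPU even when a GPU is available
    cmd.append(input_path)
    if output_path is not None:
        cmd.append(output_path)  # when omitted, predictions go to stdout
    return cmd


def run_soltrannet(*args, **kwargs) -> subprocess.CompletedProcess:
    """Run the CLI, capturing its output; raises CalledProcessError on failure."""
    return subprocess.run(soltrannet_command(*args, **kwargs), check=True,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
```

`universal_newlines=True` is used instead of `text=True` so the sketch also runs on Python 3.6, the oldest version listed in the requirements.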

In a Python environment

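SolTranNet can also be used from within a Python 3 environment:

import soltrannet as stn
# SMILES strings to predict aqueous solubility for
my_smiles=["c1ccccc1","c1ccccc1 .ignore","Cn1cnc2n(C)c(=O)n(C)c(=O)c12","[Zn+2]","[Na+].[Cl-]"]
# stn.predict() returns an iterable of results; list() collects them all
predictions=list(stn.predict(my_smiles))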
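To feed the Python API the same one-SMILES-per-line files used by the command line tool, a small wrapper like the sketch below could read the file and pass the strings to `stn.predict`. The helpers `read_smiles_file` and `predict_file` are hypothetical; the wrapper treats whatever `stn.predict` yields as opaque and simply collects it, and it accepts a `predict` callable so it can be exercised without SolTranNet installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def read_smiles_file(path: str) -> List[str]:
    """Return the non-blank lines of a one-SMILES-per-line file (hypothetical helper)."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def predict_file(path: str,
                 predict: Optional[Callable[[List[str]], Iterable]] = None) -> list:
    """Collect predictions for every SMILES in `path` (hypothetical helper)."""
    if predict is None:
        # Imported lazily so the file-reading logic can be tested without SolTranNet.
        import soltrannet as stn
        predict = stn.predict
    return list(predict(read_smiles_file(path)))
```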

Help

Please subscribe to our slack team.

If you use SolTranNet in your work, please cite our original publication:

@article{doi:10.1021/acs.jcim.1c00331,
  author = {Francoeur, Paul G. and Koes, David R.},
  title = {SolTranNet–A Machine Learning Tool for Fast Aqueous Solubility Prediction},
  journal = {Journal of Chemical Information and Modeling},
  volume = {0},
  number = {0},
  pages = {null},
  year = {0},
  doi = {10.1021/acs.jcim.1c00331},
  note = {PMID: 34038123},
  URL = {https://doi.org/10.1021/acs.jcim.1c00331},
  eprint = {https://doi.org/10.1021/acs.jcim.1c00331}
}