diff --git a/main/404.html b/main/404.html index c9a93c5..26514dd 100644 --- a/main/404.html +++ b/main/404.html @@ -14,7 +14,7 @@ - + @@ -22,7 +22,7 @@ - + @@ -158,7 +158,7 @@
- +
datamol-io/splito @@ -267,7 +267,7 @@
- +
datamol-io/splito @@ -427,6 +427,27 @@ +
  • + + + + + Lo Splitter + + + + +
  • + + + + + + + + + +
  • @@ -558,6 +579,27 @@ +
  • + + + + + splito.lohi + + + + +
  • + + + + + + + + + +
  • @@ -658,7 +700,7 @@

    404 - Not found

    - + diff --git a/main/api/lohi.html b/main/api/lohi.html new file mode 100644 index 0000000..b79fec1 --- /dev/null +++ b/main/api/lohi.html @@ -0,0 +1,1159 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + splito.lohi - Splito + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + Skip to content + + +
    +
    + +
    + + + + + + +
    + + +
    + +
    + + + + + + + + + +
    +
    + + + +
    +
    +
    + + + + + + + +
    +
    +
    + + + +
    +
    +
    + + + +
    +
    +
    + + + +
    +
    + + + + + + + +

    splito.lohi

    + + +
    + + + +

    + splito.lohi.LoSplitter + + +

    + + +
    + + + + + +
    + + + + + + + + + + +
    + + + +

    + __init__ + + +

    +
    __init__(
    +    threshold: float = 0.4,
    +    min_cluster_size: int = 5,
    +    max_clusters: int = 50,
    +    std_threshold: float = 0.6,
    +)
    +
    + +
    + +

    A splitter that prepares data for training ML models for Lead Optimization or for guiding molecular generative models. Such models must be sensitive to minor modifications of molecules, and this splitter constructs a test set that allows the evaluation of a model's ability to distinguish those modifications.

    + + + +

    Parameters:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Name | Type | Description | Default
    threshold + float + +
    +

    Tanimoto similarity threshold computed on 1024-bit ECFP4 fingerprints. Molecules more similar than this threshold are considered too similar and can be grouped together in one cluster.

    +
    +
    + 0.4 +
    min_cluster_size + int + +
    +

    The minimum number of molecules per cluster.

    +
    +
    + 5 +
    max_clusters + int + +
    +

    The maximum number of selected clusters. The remaining molecules go to the training set. Lowering this value limits the size of the test set and leaves more molecules in the training set.

    +
    +
    + 50 +
    std_threshold + float + +
    +

    The lower bound of the acceptable standard deviation for a cluster's values. It should be greater than the measurement noise. For ChEMBL-like data, set it to 0.60 for logKi and 0.70 for logIC50. Set it lower if you have a high-quality dataset.

    +
    +
    + 0.6 +
    +

    For more information, see the tutorial in the docs and Steshin 2023, Lo-Hi: Practical ML Drug Discovery Benchmark.
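    To make the `threshold`, `min_cluster_size`, and `std_threshold` parameters more concrete, here is a minimal pure-Python sketch. It is not splito's implementation: the fingerprints are toy sets of on-bit indices rather than real ECFP4 bit vectors, and the helper `cluster_is_acceptable` is a hypothetical name that assumes a population standard deviation and an inclusive lower bound.

```python
from statistics import pstdev


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union


def cluster_is_acceptable(
    values: list[float],
    min_cluster_size: int = 5,
    std_threshold: float = 0.6,
) -> bool:
    """Hypothetical filter: enough molecules and enough spread in activity.

    Assumption: population standard deviation with an inclusive bound;
    the library may compute this differently.
    """
    if len(values) < min_cluster_size:
        return False
    return pstdev(values) >= std_threshold


# Toy fingerprints (illustrative on-bit indices, not real ECFP4 output).
fp_a = {1, 5, 9, 12}
fp_b = {1, 5, 9, 40}
similarity = tanimoto(fp_a, fp_b)  # 3 shared bits out of 5 distinct bits

# With the default threshold of 0.4, these two would count as similar.
similar_enough = similarity > 0.4

wide_cluster = [5.0, 6.0, 7.0, 8.0, 9.0]    # large spread in activity
narrow_cluster = [6.0, 6.1, 6.0, 6.1, 6.0]  # spread below typical noise
tiny_cluster = [5.0, 9.0]                   # fewer than min_cluster_size
```

    Under these assumptions, only `wide_cluster` would pass the filter: `narrow_cluster` varies less than the measurement noise the docstring warns about, and `tiny_cluster` has too few molecules.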

    + +
    + +
    + + +
    + + + +

    + split + + +

    +
    split(
    +    smiles: list[str], values: list[float], n_jobs: int = -1, verbose: int = 1
    +) -> tuple[list[int], list[list[int]]]
    +
    + +
    + +

    Split the dataset into a training set and test clusters.

    + + + +

    Parameters:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Name | Type | Description | Default
    smiles + list[str] + +
    +

    List of SMILES strings.

    +
    +
    + required +
    values + list[float] + +
    +

    List of the corresponding continuous activity values.

    +
    +
    + required +
    verbose + int + +
    +

    Set to 0 to turn off the progress bar.

    +
    +
    + 1 +
    + + + +

    Returns:

    + + + + + + + + + + + + + + + + + +
    Name | Type | Description
    train_idx + list[int] + +
    +

    List of training-set indices.

    +
    +
    clusters_idx + list[list[int]] + +
    +

    List of index lists, one per test cluster.

    +
    +
    + +
    + +
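    The return value of `split` is a pair of index structures, so a short sketch of how they might be consumed may help. The helpers `gather_split` and `indices_are_disjoint` below are hypothetical and not part of splito, and the indices are written by hand for illustration. In practice they would come from a call such as `LoSplitter().split(smiles, values)`, per the signature documented above. The disjointness check is a sanity check a caller might apply, not a documented guarantee.

```python
def gather_split(
    smiles: list[str],
    values: list[float],
    train_idx: list[int],
    clusters_idx: list[list[int]],
):
    """Map the returned indices back to (smiles, value) pairs."""
    train = [(smiles[i], values[i]) for i in train_idx]
    test_clusters = [
        [(smiles[i], values[i]) for i in cluster] for cluster in clusters_idx
    ]
    return train, test_clusters


def indices_are_disjoint(train_idx: list[int], clusters_idx: list[list[int]]) -> bool:
    """Return True if no index appears twice across train and test clusters."""
    seen = set(train_idx)
    if len(seen) != len(train_idx):
        return False
    for cluster in clusters_idx:
        for i in cluster:
            if i in seen:
                return False
            seen.add(i)
    return True


# Toy data; the indices below are hand-written, not produced by splito.
smiles = ["CCO", "CCN", "CCC", "c1ccccc1", "c1ccccc1O", "c1ccccc1N"]
values = [5.1, 6.3, 4.8, 7.2, 8.0, 6.1]
train_idx = [0, 1, 2]
clusters_idx = [[3, 4, 5]]

train, test_clusters = gather_split(smiles, values, train_idx, clusters_idx)
ok = indices_are_disjoint(train_idx, clusters_idx)
```

    Keeping each test cluster as its own list makes it possible to score a model per cluster rather than pooling all test molecules together.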
    + + + +
    + +
    + + +
    + + + + + + + + + + + + + +
    +
    + + + +
    + + + +
    + + + +
    +
    +
    +
    + + + + + + + + + + + + + + \ No newline at end of file diff --git a/main/api/plot.html b/main/api/plot.html index 2fd8ad6..7480683 100644 --- a/main/api/plot.html +++ b/main/api/plot.html @@ -13,12 +13,12 @@ - + - + @@ -26,7 +26,7 @@ - + @@ -167,7 +167,7 @@
    - +
  • + + + + + Lo Splitter + + + + +
  • + + + + + + + + + +
  • @@ -572,6 +593,27 @@ + + +
  • + + + + + splito.lohi + + + + +
  • + + + + + + + + @@ -825,7 +867,7 @@