This is our baseline Caco-2 permeability/LogD/PPB model. It is a multitask CheMeleon model trained to predict the following endpoints:

  • Caco-2 Permeability Papp A->B
  • Caco-2 Permeability Papp B->A
  • LogD
  • MPPB
  • HPPB

Check out comparative performance on the ExpansionRx dataset here: https://openadmet.ghost.io/zero-shot-expansiorx-admet-predictions/

Prerequisites

We highly recommend installing the Anvil framework from openadmet-models in an environment (called openadmet-models) for ease of use and full utilization of OpenADMET's models. For full documentation, visit our website here. If you'd like to see more examples of how to use Anvil, see our demos here.

Installation of openadmet-models

With conda

You can install openadmet-models from our GitHub repository. If you want the latest development version, clone the repository and install it in editable mode:

git clone git@github.com:OpenADMET/openadmet-models.git

Set up an environment using the provided files in devtools/conda-envs.

cd openadmet-models/
conda env create -f devtools/conda-envs/openadmet-models.yaml
conda activate openadmet-models
pip install -e .

If you want to use GPU acceleration, ensure you have the appropriate CUDA toolkit installed and use the openadmet-models-gpu.yaml file instead:

conda env create -f devtools/conda-envs/openadmet-models-gpu.yaml
conda activate openadmet-models
pip install -e .

With Docker

Alternatively, you can use Docker to spin up a containerized, pre-installed environment to run openadmet-models. Just be sure you mount the correct folder (./permeability-logd-ppb-chemeleon-baseline) where you've downloaded the model.

If you're using a GPU, run:

docker run -it --user=root --rm \
    -v ./permeability-logd-ppb-chemeleon-baseline:/home/mambauser/model:rw \
    --runtime=nvidia \
    --gpus all \
    ghcr.io/openadmet/openadmet-models:main

Otherwise, for CPU only:

docker run -it --user=root --rm \
    -v ./permeability-logd-ppb-chemeleon-baseline:/home/mambauser/model:rw \
    ghcr.io/openadmet/openadmet-models:main

IMPORTANT NOTE: You will also need git lfs installed.

Downloading the model

  1. After installing Anvil, clone the model repo:
git clone https://huggingface.co/openadmet/permeability-logd-ppb-chemeleon-baseline/
  2. Change to the repo directory. Ensure git lfs is installed for the repo, then fetch the large model files:
git lfs install
git lfs pull
  3. You are now ready to use the model!

Using the model

IMPORTANT NOTE: This model predicts $\log_{10}(P_{app})$ values (in $\log_{10}(\text{cm/s})$). To get $P_{app}$ values in $10^{-6}\,\text{cm/s}$, simply backtransform: $P_{app} = 10^{\hat{y}} \times 10^{6}$

where $\hat{y}$ is our model prediction.
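As a quick sanity check, the backtransform can be written in a few lines of Python (the function name here is illustrative, not part of openadmet-models):

```python
def log_papp_to_papp(y_hat: float) -> float:
    """Convert a model prediction, log10(Papp in cm/s), to Papp in 1e-6 cm/s."""
    return (10 ** y_hat) * 1e6

# A prediction of -6.0 corresponds to Papp = 1.0 (in units of 1e-6 cm/s)
print(round(log_papp_to_papp(-6.0), 12))
```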

We will use this model for inference, to predict endpoint values for a set of molecular compounds unseen by the model. For demonstration purposes, we will use a small-molecule set from our recent OpenADMET-ExpansionRx challenge, provided in the file expansion_data_inference.csv. You can do this either inside the Docker container as per the instructions above, or, if you have installed openadmet-models on your own computer, in the appropriate environment.

The generic command to run our inference pipeline is:

openadmet predict \
    --input-path <the path to the data to predict on> \
    --input-col <the column of the data to predict on, often SMILES> \
    --model-dir <the anvil_training directory of the model to predict with> \
    --output-csv <the path to an output CSV to save the predictions to> \
    --accelerator <whether to use gpu or cpu, defaults to gpu>

You can run this directly in your command line, or use the bash script we've provided, run_model_inference.sh.

For our working example, this command becomes:

openadmet predict \
    --input-path expansion_data_inference.csv \
    --input-col SMILES \
    --model-dir anvil_training/ \
    --output-csv predictions.csv \
    --accelerator cpu

You can easily substitute your own set of compounds: simply modify the --input-path and --input-col arguments for your specific dataset.

In our example, this outputs a file called predictions.csv which includes endpoint-specific prediction columns (as OADMET_PRED_chemprop_{}) for:

  • caco2_atob_LogPapp
  • caco2_btoa_LogPapp
  • logD
  • mppb_LogUnbound
  • hppb_LogUnbound

In this case, the OADMET_STD_chemprop_{} columns are empty because uncertainty can only be estimated when running inference with an ensemble of models. See how to set this option here.
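If you want the Caco-2 predictions back in $10^{-6}\,\text{cm/s}$, a minimal pandas sketch follows, assuming the output columns follow the OADMET_PRED_chemprop_{endpoint} pattern above; the toy values stand in for a real predictions.csv:

```python
import pandas as pd

# Toy stand-in for predictions.csv; SMILES and values are illustrative only.
preds = pd.DataFrame({
    "SMILES": ["CCO"],
    "OADMET_PRED_chemprop_caco2_atob_LogPapp": [-5.3],
    "OADMET_PRED_chemprop_caco2_btoa_LogPapp": [-5.0],
})

# Backtransform log10(Papp in cm/s) into Papp in 1e-6 cm/s for each Caco-2 column.
for col in ["OADMET_PRED_chemprop_caco2_atob_LogPapp",
            "OADMET_PRED_chemprop_caco2_btoa_LogPapp"]:
    preds[col.replace("LogPapp", "Papp_1e-6_cm_s")] = (10 ** preds[col]) * 1e6

print(preds.round(2))
```

With a real run you would replace the toy DataFrame with pd.read_csv("predictions.csv").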

IMPORTANT NOTE: If you'd like other examples of how to use our Anvil framework, check out our demos here.
