We have updated our microsomal clearance model with the latest available public data from two of our blind challenges: the ASAP-Polaris Blind Challenge and the ExpansionRx Blind Challenge. This model is a multitask CheMeleon model trained on liver microsome data curated from ChEMBL, ASAP, and ExpansionRx.

We curated intrinsic clearance data (CLint) from three species: human (human liver microsome, or HLM), rat (rat liver microsome, or RLM), and mouse (mouse liver microsome, or MLM). During training of this model, all CLint values were scaled to in vivo clearance. To ensure accurate training and predictions, check whether your own CLint values represent in vivo or in vitro clearance. See this blog post for further details on the intricacies of microsomal clearance data.

Improvements over the baseline

As with any new model release, it is important to benchmark any changes. We've done that here with the test set from the ExpansionRx challenge. Below are regression plots of:

  1. a CheMeleon multitask model trained on curated ChEMBL data only predicting on the ExpansionRx test set
  2. a CheMeleon multitask model (with updated hyperparameters) trained on curated ChEMBL data, data from the ASAP-Polaris challenge, and train data from the ExpansionRx challenge, predicting on the ExpansionRx test set

IMPORTANT NOTE: The updated model we have released here has been trained on all the ExpansionRx data (both train and test sets) in addition to the curated ChEMBL and ASAP-Polaris datasets. For the sake of benchmarking, we have trained an analogous model that excludes only the ExpansionRx test set.

First, we compare predicting log₁₀(CLᵢₙₜ) for human liver microsomes (HLM):

ChEMBL-trained CheMeleon multitask model predicting on ExpRx test set.
ChEMBL+ASAP+ExpRx train set-trained CheMeleon multitask model predicting on ExpRx test set.

That's a considerable improvement!

We see a similar trend with mouse liver microsome (MLM) clearance:

ChEMBL-trained CheMeleon multitask model predicting on ExpRx test set.
ChEMBL+ASAP+ExpRx train set-trained CheMeleon multitask model predicting on ExpRx test set.

See instructions below to test out this updated model!

Getting Started

We highly recommend you have the Anvil framework from openadmet-models installed in an environment (called openadmet-models) for ease of use and full utilization of OpenADMET's models. For full documentation, visit our website here. If you'd like to see some more examples on how to use Anvil, see our demos here.

Option A: Downloading & running the model with GitHub

  1. You can install openadmet-models via our GitHub package. If you want the latest development version, clone the repository and install in editable mode:
git clone git@github.com:OpenADMET/openadmet-models.git
  1. Set up an environment using the provided files in devtools/conda-envs.
cd openadmet-models/
conda env create -f devtools/conda-envs/openadmet-models.yaml
conda activate openadmet-models
pip install -e .
  1. If you want to use GPU acceleration, ensure you have the appropriate CUDA toolkit installed and use the openadmet-models-gpu.yaml file instead:
conda env create -f devtools/conda-envs/openadmet-models-gpu.yaml
conda activate openadmet-models
pip install -e .
  1. IMPORTANT NOTE You will also need git lfs installed.

  2. After installing Anvil, clone the model repo:

git clone https://huggingface.co/openadmet/microsomal-clearance-chemeleon-baseline/
  1. Change to the repo directory. Ensure you have git lfs installed for the repo and get the large model files:
git lfs install
git lfs pull
  1. You are now ready to use the model!

Option B: Downloading & running the model with Docker

Alternatively, you can use Docker to spin up a containerized, pre-installed environment for running openadmet-models. Just be sure to mount the correct folder (./microsomal-clearance-chemeleon-v1), which is where you downloaded the model.

If you're using a GPU, run:

docker run -it --user=root --rm \
    -v ./microsomal-clearance-chemeleon-v1:/home/mambauser/model:rw \
    --runtime=nvidia \
    --gpus all \
    ghcr.io/openadmet/openadmet-models:main

Otherwise, for CPU only:

docker run -it --user=root --rm \
    -v ./microsomal-clearance-chemeleon-v1:/home/mambauser/model:rw \
    ghcr.io/openadmet/openadmet-models:main

Using the model

NOTE This model predicts log₁₀(CLᵢₙₜ) values. To get real CLᵢₙₜ values, simply backtransform:

$$CL_{int} = 10^{\hat{y}}$$

Where ŷ is our model prediction.
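As a minimal sketch of the back-transformation above, assuming a prediction is available as a plain Python float. The function name `backtransform_log_clint` is our own for illustration and is not part of openadmet-models, and the inputs in the loop are illustrative numbers, not model outputs.

```python
def backtransform_log_clint(log_clint: float) -> float:
    """Convert a predicted log10(CLint) value back to CLint (10 ** y_hat)."""
    return 10.0 ** log_clint


# Illustrative inputs only; these are not model predictions.
for y_hat in (0.0, 1.0, 2.0):
    print(f"log10(CLint) = {y_hat} -> CLint = {backtransform_log_clint(y_hat)}")
```

Because the transform is exponential, an error of 1 unit in the predicted log value corresponds to a tenfold error in CLint.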

We will use this model for inference, that is, to predict the log₁₀(CLᵢₙₜ) values of a set of compounds the model has not seen. For demonstration purposes, we have provided a small subset of compounds from a ZINC deck in the file compounds_for_inference.csv.

The generic command to run our inference pipeline is:

openadmet predict \
    --input-path <the path to the data to predict on> \
    --input-col <the column of the data to predict on, often SMILES> \
    --model-dir <the anvil_training directory of the model to predict with> \
    --output-csv <the path to an output CSV to save the predictions to> \
    --accelerator <whether to use gpu or cpu, defaults to gpu>

You can run this directly in your command line, OR you can use the bash script we've provided, run_model_inference.sh.

For our working example, this command becomes:

openadmet predict \
    --input-path compounds_for_inference.csv \
    --input-col OPENADMET_CANONICAL_SMILES \
    --model-dir anvil_training/ \
    --output-csv predictions.csv \
    --accelerator cpu

You can easily substitute your own set of compounds: simply modify the --input-path and --input-col arguments for your specific dataset.
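If you prefer to launch inference from Python, the same command can be assembled programmatically. This is only a sketch: it uses the flags shown above and nothing else, and the helper names `build_predict_command` and `run_predict` are hypothetical, not part of openadmet-models.

```python
import shlex
import subprocess


def build_predict_command(input_path, input_col, model_dir, output_csv, accelerator):
    """Assemble the `openadmet predict` call as an argument list."""
    return [
        "openadmet", "predict",
        "--input-path", str(input_path),
        "--input-col", str(input_col),
        "--model-dir", str(model_dir),
        "--output-csv", str(output_csv),
        "--accelerator", str(accelerator),
    ]


def run_predict(cmd):
    """Run the command; requires the openadmet-models environment to be active."""
    subprocess.run(cmd, check=True)


# Same arguments as the working example above.
cmd = build_predict_command(
    "compounds_for_inference.csv",
    "OPENADMET_CANONICAL_SMILES",
    "anvil_training/",
    "predictions.csv",
    "cpu",
)
print(shlex.join(cmd))  # inspect the command before calling run_predict(cmd)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.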

In our example, this writes a file called predictions.csv containing predicted log₁₀(CLᵢₙₜ) values (the OADMET_PRED columns) for every species the model was trained on (human, rat, and mouse):

OADMET_PRED_chemprop_LOG_CLint_{species},
OADMET_STD_chemprop_LOG_CLint_{species}

NOTE In this example, the standard deviation (OADMET_STD) columns are empty because uncertainty cannot be estimated unless running inference on an ensemble of models. For further details, visit our demo specifically about ensembling.
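To work with the predictions on the original scale, one option is to post-process the output CSV. The sketch below uses only the Python standard library and selects columns by the OADMET_PRED_ prefix described above. The `_LINEAR` suffix, the function name, and the demo column ending in `_example` are all made up for illustration; the demo row is not real model output, and the exact species tokens in your file may differ.

```python
import csv
import io

PRED_PREFIX = "OADMET_PRED_"
LINEAR_SUFFIX = "_LINEAR"  # hypothetical suffix for the back-transformed columns


def add_backtransformed_columns(rows):
    """For every non-empty OADMET_PRED_* value, add a column holding 10 ** value."""
    out = []
    for row in rows:
        new_row = dict(row)
        for name, value in row.items():
            if name.startswith(PRED_PREFIX) and value not in ("", None):
                new_row[name + LINEAR_SUFFIX] = 10.0 ** float(value)
        out.append(new_row)
    return out


# Made-up, in-memory stand-in for predictions.csv (the STD column is empty,
# as it is for a single, non-ensembled model).
demo_csv = io.StringIO(
    "OADMET_PRED_chemprop_LOG_CLint_example,OADMET_STD_chemprop_LOG_CLint_example\n"
    "1.0,\n"
)
rows = add_backtransformed_columns(csv.DictReader(demo_csv))
print(rows[0])
```

For a real run, replace the in-memory demo with `open("predictions.csv", newline="")` and pass the resulting `csv.DictReader` to the same function.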
