
This is a baseline model trained on publicly available data. While we've done our best to curate the data, the model performance is quite poor. Proceed with caution.

This repo contains our baseline model: a single-task CheMeleon model trained on pEC50 data for PXR curated from ChEMBL.

It is a no-split model: it was trained on the full dataset (a training fraction of 1.0), with no data held out for validation or test sets.

Getting Started

Pre-requisites

IMPORTANT NOTE You will need git lfs installed.

Downloading the model

  1. Clone the model repo:
git clone https://huggingface.co/openadmet/pxr-chemeleon-baseline/
  2. Change to the repo directory, set up git lfs for the repo, and pull the large model files:
cd pxr-chemeleon-baseline/
git lfs install
git lfs pull
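
If `git lfs pull` was skipped or failed, the large model files stay behind as small text stubs (LFS pointers) and the model will not load later. The sketch below is one way to check for that, relying on the fact that an LFS pointer file begins with a `version https://git-lfs.github.com/spec/v1` line. The helper names `is_lfs_pointer` and `check_lfs_files` are hypothetical and not part of this repo.

```shell
# Succeeds if the file is still a Git LFS pointer stub instead of real content.
# Pointer stubs are small text files whose first line names the LFS spec.
is_lfs_pointer() {
    head -c 200 "$1" 2>/dev/null | head -n 1 |
        grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: print every argument that is still a pointer stub.
# Returns 1 if any stub was found, 0 otherwise.
check_lfs_files() {
    found=0
    for f in "$@"; do
        if [ -f "$f" ] && is_lfs_pointer "$f"; then
            echo "still a pointer: $f"
            found=1
        fi
    done
    return "$found"
}
```

You could call it as `check_lfs_files anvil_training/*`, adjusting the glob to wherever the weights actually live. `git lfs ls-files` gives similar information: it marks downloaded files with `*` and unfetched pointers with `-`.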

Option A: Running the model locally

We highly recommend you have the Anvil framework from openadmet-models installed in an environment (called openadmet-models) for ease of use and full utilization of OpenADMET's models. The installation instructions can be found here and also below:

  1. You can install openadmet-models via our GitHub package. If you want the latest development version, clone the repository and install in editable mode:
git clone git@github.com:OpenADMET/openadmet-models.git
  2. Set up an environment using the provided files in devtools/conda-envs:
cd openadmet-models/
conda env create -f devtools/conda-envs/openadmet-models.yaml
conda activate openadmet-models
pip install -e .
  3. If you want to use GPU acceleration, ensure you have the appropriate CUDA toolkit installed and use the openadmet-models-gpu.yaml file instead:
conda env create -f devtools/conda-envs/openadmet-models-gpu.yaml
conda activate openadmet-models
pip install -e .

Option B: Running the model with Docker

  1. Alternatively, you can use Docker to spin up a container with openadmet-models pre-installed. Make sure you mount the folder where you downloaded the model (./pxr-chemeleon-baseline).
  2. For CPU only, run:
docker run -it --user=root --rm \
    -v ./pxr-chemeleon-baseline:/home/mambauser/model:rw \
    ghcr.io/openadmet/openadmet-models:main
  3. For GPU, run:
docker run -it --user=root --rm \
    -v ./pxr-chemeleon-baseline:/home/mambauser/model:rw \
    --runtime=nvidia \
    --gpus all \
    ghcr.io/openadmet/openadmet-models:main

Using the model

We will use this model for inference, that is, to predict the pEC50 values of compounds the model has not seen. For demonstration purposes, we have provided a small subset of compounds from a ZINC deck in the file compounds_for_inference.csv.

The generic command to run our inference pipeline is:

openadmet predict \
    --input-path <the path to the data to predict on> \
    --input-col <the column of the data to predict on, often SMILES> \
    --model-dir <the anvil_training directory of the model to predict with> \
    --output-csv <the path to an output CSV to save the predictions to> \
    --accelerator <whether to use gpu or cpu, defaults to gpu>

You can run this directly on the command line, or you can use the bash script we've provided, run_model_inference.sh.
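
Because `--accelerator` defaults to gpu, a machine without a GPU needs the flag set explicitly. The sketch below picks a value automatically. It assumes that a working `nvidia-smi -L` call is a good enough signal that an NVIDIA GPU is usable, which may not hold in every setup. `pick_accelerator` is a hypothetical helper, not part of openadmet-models.

```shell
# Print "gpu" if an NVIDIA GPU appears usable, otherwise "cpu".
# Assumption: `nvidia-smi -L` succeeding means a GPU is available.
pick_accelerator() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
        echo gpu
    else
        echo cpu
    fi
}

# Pass the result to the predict command in place of a hard-coded value:
#     --accelerator "$(pick_accelerator)"
```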

For our working example, this command becomes:

openadmet predict \
    --input-path compounds_for_inference.csv \
    --input-col OPENADMET_CANONICAL_SMILES \
    --model-dir anvil_training/ \
    --output-csv predictions.csv \
    --accelerator cpu

You can easily substitute your own set of compounds: simply modify the --input-path and --input-col arguments for your dataset. If you want to use a GPU (recommended), substitute --accelerator gpu in the above.
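
When pointing the command at your own file, it can help to confirm first that the name you pass to `--input-col` really appears in the CSV header. The sketch below does that under the assumption of a plain comma-separated header with no quoted fields. `csv_has_column` is a hypothetical helper, not part of openadmet-models.

```shell
# Succeeds if the first line of a CSV file lists the given column name.
# Assumes a simple comma-separated header without quoted or escaped commas.
# Usage: csv_has_column <csv-file> <column-name>
csv_has_column() {
    head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | grep -qxF -- "$2"
}

# Example guard before running inference:
# csv_has_column compounds_for_inference.csv OPENADMET_CANONICAL_SMILES ||
#     echo "input column not found in header" >&2
```

The match is exact (`-x`) and literal (`-F`), so a header named `OPENADMET_CANONICAL_SMILES` will not satisfy a check for `SMILES`.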

In our example, this writes a file called predictions.csv containing the predicted pEC50 values for the PXR target in the OADMET_PRED column, alongside an OADMET_STD column:

OADMET_PRED_chemprop-chembl_pchembl_value_mean,
OADMET_STD_chemprop-chembl_pchembl_value_mean

NOTE In this example, the standard deviation (OADMET_STD) column is empty because uncertainty cannot be estimated unless an ensemble of models is trained. For further details, visit our docs.
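
Since pEC50 is the negative base-10 logarithm of the EC50 in molar units, a predicted value can be converted back to a concentration with EC50 [nM] = 10^(9 - pEC50). The sketch below appends such a column to a predictions file. The helper name `add_ec50_nm` and the output column name `EC50_nM` are hypothetical, and the code assumes a plain comma-separated file with no quoted commas and no empty prediction cells.

```shell
# Append an EC50 (nM) column computed from a named pEC50 column.
# Usage: add_ec50_nm <csv-file> <pEC50-column-name>
add_ec50_nm() {
    awk -F',' -v col="$2" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            print $0 ",EC50_nM"
            next
        }
        # pEC50 = -log10(EC50 in M), so EC50 in nM = 10^(9 - pEC50).
        { printf "%s,%.1f\n", $0, 10 ^ (9 - $idx) }
    ' "$1"
}
```

For the example above this would be called as `add_ec50_nm predictions.csv OADMET_PRED_chemprop-chembl_pchembl_value_mean > predictions_with_ec50.csv`. A predicted pEC50 of 6 corresponds to 1000 nM (1 µM).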
