This repo contains our baseline model: a single-task CheMeleon model trained on pEC50 data for PXR curated from ChEMBL.
It is a no-split model: it was trained on all of the data (a training fraction of 1.0), with nothing held out for validation or test sets.
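To make the no-split setup concrete, here is a minimal sketch of how fractional splitting typically works. The helper split_indices is hypothetical and not part of openadmet-models; it only illustrates that with a training fraction of 1.0, every row ends up in the training set and the validation and test sets are empty.

```python
import random


def split_indices(n_rows, train=1.0, val=0.0, test=0.0, seed=0):
    """Hypothetical helper: shuffle row indices and cut them by fraction."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("train, val and test fractions must sum to 1.0")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = round(n_rows * train)
    n_val = round(n_rows * val)
    # Whatever remains after train and val becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


if __name__ == "__main__":
    train_idx, val_idx, test_idx = split_indices(100)  # no-split: 1.0 / 0.0 / 0.0
    print(len(train_idx), len(val_idx), len(test_idx))  # 100 0 0
```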
Getting Started
Pre-requisites
We highly recommend installing the Anvil framework from openadmet-models in an environment (called openadmet-models) for ease of use and to take full advantage of OpenADMET's models. The installation instructions can be found here.
Alternatively, you can use Docker to spin up a containerized environment with openadmet-models pre-installed. Just be sure to mount the folder where you downloaded the model (./pxr-chemeleon-baseline).
If you're using a GPU, run:
docker run -it --user=root --rm \
-v ./pxr-chemeleon-baseline:/home/mambauser/model:rw \
--runtime=nvidia \
--gpus all \
ghcr.io/openadmet/openadmet-models:main
Otherwise, for CPU only:
docker run -it --user=root --rm \
-v ./pxr-chemeleon-baseline:/home/mambauser/model:rw \
ghcr.io/openadmet/openadmet-models:main
You will also need git lfs installed.
Downloading the model
- After installing Anvil, clone the model repo:
git clone https://huggingface.co/openadmet/pxr-chemeleon-baseline/
- Change into the repo directory (cd pxr-chemeleon-baseline), make sure git lfs is installed for the repo, and pull the large model files:
git lfs install
git lfs pull
- You are now ready to use the model!
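Before moving on, it can be worth confirming that git lfs pull actually replaced the placeholder files. Git LFS stores un-pulled files as small text pointers whose first line is version https://git-lfs.github.com/spec/v1, so a quick check can look for that header. The script below is our own sketch, not something shipped in the repo, and the helper name find_lfs_pointers is hypothetical; it assumes you run it from the folder that contains pxr-chemeleon-baseline.

```python
from pathlib import Path

# Un-pulled Git LFS files are small text pointers that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return files under repo_dir that still look like LFS pointer files."""
    pointers = []
    for path in sorted(Path(repo_dir).rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as fh:
            head = fh.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    repo = Path("pxr-chemeleon-baseline")
    if not repo.is_dir():
        print(f"{repo} not found; clone the model repo first.")
    else:
        leftover = find_lfs_pointers(repo)
        if leftover:
            print("These files are still LFS pointers; run `git lfs pull`:")
            for p in leftover:
                print(f"  {p}")
        else:
            print("No LFS pointer files found.")
```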
Using the model
We will now use this model for inference, that is, to predict pEC50 values for compounds the model has not seen. For demonstration purposes, we provide a small subset of compounds from a ZINC deck in the file compounds_for_inference.csv.
The generic command to run our inference pipeline is:
openadmet predict \
--input-path <the path to the data to predict on> \
--input-col <the column of the data to predict on, often SMILES> \
--model-dir <the anvil_training directory of the model to predict with> \
--output-csv <the path to an output CSV to save the predictions to> \
--accelerator <whether to use gpu or cpu, defaults to gpu>
You can run this directly from the command line, or use the bash script we've provided, run_model_inference.sh.
For our working example, this command becomes:
openadmet predict \
--input-path compounds_for_inference.csv \
--input-col OPENADMET_CANONICAL_SMILES \
--model-dir anvil_training/ \
--output-csv predictions.csv \
--accelerator cpu
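If you would rather drive inference from Python, for example inside a larger workflow, a thin subprocess wrapper around the same CLI is enough. This is a sketch rather than part of openadmet-models: build_predict_command is a hypothetical helper of our own, and it only uses the flags documented above.

```python
import shlex
import shutil
import subprocess


def build_predict_command(input_path, input_col, model_dir, output_csv, accelerator="gpu"):
    """Hypothetical helper: assemble the `openadmet predict` argument list."""
    return [
        "openadmet", "predict",
        "--input-path", str(input_path),
        "--input-col", input_col,
        "--model-dir", str(model_dir),
        "--output-csv", str(output_csv),
        "--accelerator", accelerator,
    ]


if __name__ == "__main__":
    # Same arguments as the working example above.
    cmd = build_predict_command(
        "compounds_for_inference.csv",
        "OPENADMET_CANONICAL_SMILES",
        "anvil_training/",
        "predictions.csv",
        accelerator="cpu",
    )
    print("Command:", shlex.join(cmd))
    if shutil.which("openadmet") is None:
        print("openadmet CLI not found; activate the openadmet-models environment first.")
    else:
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with paths that contain spaces.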
You can easily substitute your own set of compounds: simply change the --input-path and --input-col arguments to match your dataset.
In our example, this writes a file called predictions.csv containing the predicted pEC50 values for PXR (the OADMET_PRED column) and a standard deviation column (OADMET_STD):
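If your compounds live in a Python list rather than a CSV, a minimal way to produce an input file is sketched below. The helper write_inference_csv, the column name SMILES, and the file name my_compounds.csv are hypothetical; the helper only drops blank entries and does not check that the SMILES are chemically valid.

```python
import csv


def write_inference_csv(smiles_list, path, smiles_col="SMILES"):
    """Hypothetical helper: write one SMILES per row under a single header."""
    # Drop empty or whitespace-only entries; no chemical validation is done.
    cleaned = [s.strip() for s in smiles_list if s and s.strip()]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([smiles_col])
        for smi in cleaned:
            writer.writerow([smi])
    return len(cleaned)


if __name__ == "__main__":
    n = write_inference_csv(["CCO", "c1ccccc1O", "  "], "my_compounds.csv")
    print(f"wrote {n} compounds")  # wrote 2 compounds
```

You would then pass --input-path my_compounds.csv --input-col SMILES to openadmet predict.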
OADMET_PRED_chemprop-chembl_pchembl_value_mean,
OADMET_STD_chemprop-chembl_pchembl_value_mean
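To inspect the results, a short script can rank compounds by predicted pEC50 and convert the values to EC50. This sketch assumes predictions.csv contains the prediction column named above; since pEC50 = -log10(EC50 in mol/L), an EC50 in nM is 10^(9 - pEC50). The helper names top_predictions and pec50_to_ec50_nm are our own.

```python
import csv
from pathlib import Path

# Prediction column name as listed above for this model.
PRED_COL = "OADMET_PRED_chemprop-chembl_pchembl_value_mean"


def pec50_to_ec50_nm(pec50):
    """pEC50 = -log10(EC50 in mol/L), so EC50 in nM is 10**(9 - pEC50)."""
    return 10 ** (9 - pec50)


def top_predictions(path, k=5):
    """Return the k rows with the highest predicted pEC50 (most potent first)."""
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    # Defensively skip any rows with an empty prediction value.
    scored = [r for r in rows if (r.get(PRED_COL) or "").strip()]
    scored.sort(key=lambda r: float(r[PRED_COL]), reverse=True)
    return scored[:k]


if __name__ == "__main__":
    if Path("predictions.csv").exists():
        for row in top_predictions("predictions.csv"):
            p = float(row[PRED_COL])
            print(f"pEC50 {p:.2f} -> EC50 ~ {pec50_to_ec50_nm(p):.0f} nM")
    else:
        print("predictions.csv not found; run openadmet predict first.")
```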
NOTE: In this example, the standard deviation (OADMET_STD) column is empty, because uncertainty can only be estimated when an ensemble of models is trained. For further details, visit our docs.
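For context, ensemble uncertainty is usually estimated by training several models and taking the spread of their predictions for each compound. The sketch below illustrates that general idea with the standard library; it is not how openadmet-models computes the OADMET_STD column internally, the helper ensemble_summary is hypothetical, and the per-model numbers are invented for illustration.

```python
import statistics


def ensemble_summary(per_model_preds):
    """Combine aligned predictions from several models.

    per_model_preds: one list per model, each holding predictions for the
    same compounds in the same order. Returns (means, stds) per compound.
    """
    if len(per_model_preds) < 2:
        raise ValueError("need at least two models to estimate a spread")
    n = len(per_model_preds[0])
    if any(len(p) != n for p in per_model_preds):
        raise ValueError("every model must predict the same compounds")
    means, stds = [], []
    # zip(*...) regroups the values so each tuple holds one compound's predictions.
    for values in zip(*per_model_preds):
        means.append(statistics.mean(values))
        stds.append(statistics.stdev(values))  # sample standard deviation
    return means, stds


if __name__ == "__main__":
    # Three illustrative models, two compounds.
    means, stds = ensemble_summary([[5.0, 6.0], [5.2, 6.0], [4.8, 6.0]])
    print([round(m, 3) for m in means], [round(s, 3) for s in stds])  # [5.0, 6.0] [0.2, 0.0]
```

With a single model there is no spread to measure, which is why the helper refuses fewer than two models.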