---
license: cc-by-sa-4.0
language:
- en
pipeline_tag: text-generation
---
# **PLAPT: Protein-Ligand Binding Affinity Prediction Using Pretrained Transformers**
This is the official code repository for PLAPT, a state-of-the-art protein-ligand binding affinity predictor. [Preprint](https://doi.org/10.1101/2024.02.08.575577)
### Abstract
Understanding protein-ligand binding affinity is crucial for drug discovery, enabling the identification of promising drug candidates efficiently. We introduce PLAPT, a novel model leveraging transfer learning from pre-trained transformers like ProtBERT and ChemBERTa to predict binding affinities with high accuracy. Our method processes one-dimensional protein and ligand sequences, leveraging a branching neural network architecture for feature integration and affinity estimation. We demonstrate PLAPT's superior performance through validation on multiple datasets, achieving state-of-the-art results while requiring significantly less computational resources for training compared to existing models. Our findings indicate that PLAPT offers a highly effective and accessible approach for accelerating drug discovery efforts.

---
# Usage
---
## Plapt CLI
Plapt CLI is a command-line interface for the Plapt Python package, designed for predicting affinities using sequences and SMILES strings. This tool is user-friendly and offers flexibility in output formats and file handling.
### Prerequisites
Before using Plapt CLI, you need to have the following installed:
- Python (Download and install from [python.org](https://www.python.org/))
- Git (Download and install from [git-scm.com](https://git-scm.com/)) - Alternatively, you can download the repository as a ZIP file.
### Installation
To install Plapt CLI, you can clone the repository from GitHub:
```bash
git clone https://github.com/trrt-good/WELP-PLAPT.git
cd WELP-PLAPT
```
If you prefer not to use Git, download the ZIP file of the repository and extract it to a desired location.
Once you have the repository on your local machine, install the required dependencies:
```bash
pip install -r requirements.txt
```
(Optional) If you use a virtual environment, activate it before running the `pip install` command above:
```bash
source /path/to/your/venv/bin/activate
```
### Running the Script
```bash
python plapt_cli.py -s SEQ1 SEQ2 ... -m SMILES1 SMILES2 ... -o OUTPUT_FILE -f FORMAT
```
- `-s`: Followed by one or more sequences.
- `-m`: Followed by one or more SMILES strings.
- `-o`: (Optional) Path to the output file. If omitted, results are printed to the console.
- `-f`: (Optional) Format of the output file (`json` or `csv`). Required if `-o` is used without specifying a file extension.
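
The flag behavior described above can be illustrated with a small `argparse` sketch. This is not the code in `plapt_cli.py`; the long option names, the `resolve_format` helper, and the rule that an explicit `-f` overrides the file extension are assumptions made for illustration only.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags; long names are assumed.
    parser = argparse.ArgumentParser(description="Sketch of the Plapt CLI flags")
    parser.add_argument("-s", "--sequences", nargs="+", required=True)
    parser.add_argument("-m", "--smiles", nargs="+", required=True)
    parser.add_argument("-o", "--output", default=None)
    parser.add_argument("-f", "--format", choices=["json", "csv"], default=None)
    return parser


def resolve_format(output, fmt):
    """Pick the output format (hypothetical helper, assumed precedence)."""
    if output is None:
        return None  # no output file: results go to the console
    if fmt is not None:
        return fmt  # assumption: an explicit -f wins over the extension
    suffix = Path(output).suffix.lower().lstrip(".")
    if suffix in ("json", "csv"):
        return suffix
    raise ValueError("use -f json or -f csv when the output file has no extension")


args = build_parser().parse_args(
    ["-s", "SEQ1", "SEQ2", "-m", "SMILES1", "SMILES2", "-o", "results.json"]
)
chosen = resolve_format(args.output, args.format)
print(chosen)
```

The sketch only shows why `-f` becomes necessary when the `-o` path has no recognizable extension; the real script may handle these cases differently.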
#### Examples
- To print results to the console:
```bash
python plapt_cli.py -s SEQ1 SEQ2 -m SMILES1 SMILES2
```
- To save results to a JSON file:
```bash
python plapt_cli.py -s SEQ1 SEQ2 -m SMILES1 SMILES2 -o results.json
```
- To save results to a CSV file:
```bash
python plapt_cli.py -s SEQ1 SEQ2 -m SMILES1 SMILES2 -o results.csv
```
- To specify the format explicitly:
```bash
python plapt_cli.py -s SEQ1 SEQ2 -m SMILES1 SMILES2 -o results -f json
```
---
## Using Plapt Directly in Python
Apart from the command-line interface, Plapt can also be used directly in Python scripts. This allows for more flexibility and integration into larger Python projects or workflows.
### Installation
Ensure you have followed the installation steps mentioned in the earlier section to set up the Plapt environment and dependencies.
### Basic Usage
To use Plapt in a Python script, you need to import the `Plapt` class and then create an instance of it. You can then call its methods to predict affinities.
#### Importing and Initializing Plapt
``` python
# Import the Plapt class. Run this from the directory that contains plapt.py:
from plapt import Plapt

# Create an instance. For basic usage, no initialization parameters are needed:
plapt = Plapt()
```
#### Running Predictions
After initializing the `Plapt` object, you can use it to predict affinities. Here's an example of how to do it:
```python
sequences = ["APTAPSIDMYGSNNL", "PIFLNVLEAIEPGVVC"]
smiles = ["NC(=O)[C@H](CCC(=O)O)", "NC(=[NH2+])c1ccccc1"]
results = plapt.predict_affinity(sequences, smiles)
print(results)
```
Output:
```
[{'neg_log10_affinity_M': 4.38891527161495, 'affinity_uM': 40.839905489541835}, {'neg_log10_affinity_M': 4.196127195169673, 'affinity_uM': 63.66090450080189}]
```
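
Judging from the sample output, the two fields seem to carry the same quantity in different units: `neg_log10_affinity_M` looks like the negative base-10 logarithm of the affinity in molar, and `affinity_uM` the same value in micromolar. The relationship is inferred from the numbers above rather than taken from the PLAPT source, and `neg_log10_to_micromolar` is a hypothetical helper name. A minimal sketch of the conversion:

```python
import math


def neg_log10_to_micromolar(neg_log10_affinity_m: float) -> float:
    """Convert -log10(affinity in M) to an affinity in micromolar."""
    molar = 10.0 ** (-neg_log10_affinity_m)
    return molar * 1e6  # 1 M = 1e6 uM


# Sample values copied from the output shown above.
results = [
    {"neg_log10_affinity_M": 4.38891527161495, "affinity_uM": 40.839905489541835},
    {"neg_log10_affinity_M": 4.196127195169673, "affinity_uM": 63.66090450080189},
]

converted = [neg_log10_to_micromolar(r["neg_log10_affinity_M"]) for r in results]
matches = [
    math.isclose(c, r["affinity_uM"], rel_tol=1e-6)
    for c, r in zip(converted, results)
]
print(converted, matches)
```

A higher `neg_log10_affinity_M` therefore corresponds to a lower concentration in `affinity_uM`.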
The returned list of dictionaries is JSON-serializable and can subsequently be used for other tasks.
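
As one possible follow-up step, the results can be joined with their inputs and serialized using only the standard library. The sketch below reuses the sample values from above and assumes that results come back in the same order as the input pairs; the `sequence` and `smiles` column names are illustrative choices, not part of the Plapt API.

```python
import csv
import io
import json

sequences = ["APTAPSIDMYGSNNL", "PIFLNVLEAIEPGVVC"]
smiles = ["NC(=O)[C@H](CCC(=O)O)", "NC(=[NH2+])c1ccccc1"]
# Sample values copied from the output shown above.
results = [
    {"neg_log10_affinity_M": 4.38891527161495, "affinity_uM": 40.839905489541835},
    {"neg_log10_affinity_M": 4.196127195169673, "affinity_uM": 63.66090450080189},
]

# Assumption: the i-th result belongs to the i-th (sequence, SMILES) pair.
rows = [
    {"sequence": seq, "smiles": smi, **res}
    for seq, smi, res in zip(sequences, smiles, results)
]

# JSON text, ready to be written to a file.
json_text = json.dumps(rows, indent=2)

# CSV text built in memory; swap io.StringIO for open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```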
### Advanced Usage
Plapt can be initialized with specialized parameters, such as the prediction module used, caching, or the inference device. Example below:
``` python
from plapt import Plapt
# Create an instance of the Plapt class with custom parameters:
plapt = Plapt(
    # Path to the prediction module. Defaults to "models/predictionModule.onnx".
    prediction_module_path="models/predictionModule.onnx",
    # Enable or disable caching. Enabled by default.
    caching=True,
    # Computation device ("cuda" for GPU or "cpu" for CPU). If CUDA isn't
    # available on your system, it falls back to "cpu" automatically.
    device="cuda",
)
```
Each option can be specified separately (e.g., `plapt = Plapt(caching=False)` if you would like to disable caching).
---
#### Data Preparation and Encoding
We sourced protein-ligand pairs and their corresponding affinity values from an open-source binding affinity dataset on Hugging Face, [binding_affinity](https://huggingface.co/datasets/jglaser/binding_affinity). We then used ProtBERT and ChemBERTa to encode proteins and ligands respectively, giving us high-quality vector-space representations. The encoding process is detailed in the `encoding.ipynb` notebook. The already-encoded dataset is available on our [Google Drive](https://drive.google.com/drive/folders/1e-ujgHx5bW0JKxSZY5u34As77o4-IIFs?usp=sharing) for ease of access and use.
#### Importing Encoders and Running the Notebook
To let users import the encoders and run the Wolfram notebook (`WL Notebooks/FinalEssay.nb`), we provide the `encoders_to_onnx.ipynb` notebook. This ensures that users can replicate our encoding process and utilize the full capabilities of PLAPT.