---
language:
- en
license: mit
tags:
- drug-discovery
- binding-affinity
- protein-ligand
- graph-neural-network
- esm2
- drug-repurposing
- multimodal
- transfer-learning
datasets:
- pdbbind-v2020
metrics:
- rmse
- pearsonr
pipeline_tag: other
---
# DeepPharm: Multi-Modal Transfer Learning for Drug-Target Affinity Prediction
## Model Description
**DeepPharm** is a multi-modal deep learning framework for predicting protein–ligand binding affinity ($pK$). It combines:
- **GATv2** molecular graph encoder (3 layers, 4 heads)
- **ECFP4** fingerprint MLP encoder (2048→128)
- **Gated Fusion** mechanism for adaptive ligand representation
- **ESM-2** protein language model (150M params, fine-tuned)
- **Stacked Cross-Attention** (2 layers, 4 heads) for drug-protein interaction
- **Residual Prediction Head** with SiLU activation
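
The gated fusion step listed above can be pictured as a learned, per-feature blend of the two ligand views. The following is a minimal sketch under stated assumptions, not the repository's implementation: the class name `GatedFusion`, the single linear gate layer, and the 128-dimensional graph embedding are all hypothetical (only the 2048→128 fingerprint projection is stated in this card).

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Hypothetical gate that blends a graph embedding with a fingerprint embedding."""

    def __init__(self, dim: int = 128):
        super().__init__()
        # The gate sees both views and emits one weight in (0, 1) per feature.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_graph: torch.Tensor, h_fp: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([h_graph, h_fp], dim=-1))
        # Per-feature convex combination: g near 1 favors the graph view,
        # g near 0 favors the fingerprint view.
        return g * h_graph + (1.0 - g) * h_fp


torch.manual_seed(0)
fusion = GatedFusion(dim=128)
h_graph = torch.randn(4, 128)  # stand-in for a pooled GATv2 output (dimension assumed)
h_fp = torch.randn(4, 128)     # stand-in for the ECFP4 MLP output (2048 -> 128)
fused = fusion(h_graph, h_fp)
print(fused.shape)  # torch.Size([4, 128])
```

Because the gate output lies strictly between 0 and 1, every fused feature falls between the corresponding graph and fingerprint features, which is what makes the blend "adaptive" rather than a fixed average.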
### Two Modes of Operation
| Mode | Task | Input | Output |
|------|------|-------|--------|
| **Mode A** | Supervised affinity prediction | Drug SMILES + Protein sequence | pK value |
| **Mode B** | Weakly supervised drug repurposing | Drug + Disease signature | Ranked candidates |
## Performance
### Systematic Ablation (PDBbind v2020, $N_{test}=3{,}775$)
| Config | RMSE ↓ | Pearson ↑ | Spearman ↑ |
|--------|--------|-----------|------------|
| V1 Baseline (ESM-35M) | 1.266 | 0.743 | 0.743 |
| V2 Architecture | 1.258 | 0.748 | 0.746 |
| V2 + CosineWR | 1.244 | 0.753 | 0.750 |
| **V2 + ESM-150M (Best)** | **1.229** | **0.762** | **0.760** |
| V2 + EMA | 1.247 | 0.753 | 0.753 |
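
For readers who want to score their own predictions against the three columns above, here is a small standard-library sketch of RMSE, Pearson correlation, and Spearman correlation (Pearson on tie-averaged ranks). The function names and the toy pK values are illustrative assumptions; this is not the evaluation script used for the table.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted pK values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (not taken from PDBbind).
y_true = [5.0, 6.2, 7.1, 8.4, 4.3]
y_pred = [5.4, 6.0, 7.9, 8.0, 4.9]
print(f"RMSE={rmse(y_true, y_pred):.3f}")
print(f"Pearson={pearson(y_true, y_pred):.3f}")
print(f"Spearman={spearman(y_true, y_pred):.3f}")
```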
### Five-Seed Ensemble (Best Configuration)
| Metric | Mean ± Std |
|--------|-----------|
| RMSE | 1.246 ± 0.005 |
| Pearson r | 0.751 ± 0.002 |
| Spearman ρ | 0.750 ± 0.002 |
The coefficient of variation (std / mean) is about 0.4% or lower for every metric, indicating low seed-to-seed variance.
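
The coefficient of variation can be recomputed directly from the table. The short check below uses only the mean and standard deviation values listed above; the dictionary layout is just an illustrative choice.

```python
# (mean, std) pairs copied from the five-seed table above.
results = {
    "RMSE": (1.246, 0.005),
    "Pearson r": (0.751, 0.002),
    "Spearman rho": (0.750, 0.002),
}

# Coefficient of variation in percent: 100 * std / mean.
cv_percent = {name: 100.0 * std / mean for name, (mean, std) in results.items()}

for name, cv in cv_percent.items():
    print(f"{name}: CV = {cv:.2f}%")
```

RMSE has the largest relative spread, at roughly 0.40%; both correlation metrics come out lower.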
### Baselines (all re-implemented on the same split)
| Model | RMSE ↓ | Pearson ↑ |
|-------|--------|-----------|
| DeepDTA (CNN) | 1.48 | 0.61 |
| GraphDTA (GCN) | 1.39 | 0.67 |
| MolCLR* | 1.30 | 0.74 |
| DrugBAN | 1.28 | 0.76 |
| **DeepPharm V2** | **1.23** | **0.76** |
## Intended Use
- High-throughput virtual screening of drug candidates
- Binding affinity prediction for drug-target pairs
- Hypothesis generation for drug repurposing in orphan diseases
- Research and academic purposes
## Limitations
- 2D topological encoder; cannot distinguish stereoisomers
- Trained on PDBbind v2020, which overrepresents kinases
- Mode B uses drug priors (guilt-by-association), not zero-shot inference
- Predictions require experimental validation
## Training Details
- **Dataset:** PDBbind v2020 General Set (15,100 train / 3,775 test, seed=42)
- **Hardware:** 1× NVIDIA H100 80 GB
- **Optimizer:** AdamW (backbone LR: 5e-6, head LR: 8e-4)
- **Scheduler:** CosineAnnealing with Warm Restarts ($T_0$=10, $T_{mult}$=2)
- **Loss:** MSE + 0.3·RankingLoss + 0.2·HuberLoss
- **Training time:** ~11 min/epoch (ESM-2 150M), best checkpoint at epoch 18
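
The optimizer, scheduler, and loss settings above could be wired together roughly as follows. This is a sketch under several assumptions, not the project's training code: `backbone` and `head` are tiny stand-in modules, "head LR" is assumed to cover all non-backbone parameters, the card does not define `RankingLoss` (a pairwise hinge loss is substituted here), and the Huber loss uses PyTorch's default `delta=1.0`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pairwise_ranking_loss(pred, target, margin=0.0):
    """ASSUMPTION: pairwise hinge loss; the card does not specify RankingLoss."""
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)      # [i, j] = pred[j] - pred[i]
    diff_true = target.unsqueeze(0) - target.unsqueeze(1)  # same layout for the labels
    sign = torch.sign(diff_true)
    mask = sign != 0                                       # skip pairs with equal labels
    if not mask.any():
        return pred.new_zeros(())
    # Penalize pairs whose predicted order disagrees with the label order.
    return F.relu(margin - sign * diff_pred)[mask].mean()


def combined_loss(pred, target):
    """MSE + 0.3 * ranking + 0.2 * Huber, using the weights listed above."""
    mse = F.mse_loss(pred, target)
    huber = F.huber_loss(pred, target)  # default delta=1.0 (assumed)
    return mse + 0.3 * pairwise_ranking_loss(pred, target) + 0.2 * huber


torch.manual_seed(0)
backbone = nn.Linear(16, 16)  # hypothetical stand-in for the ESM-2 backbone
head = nn.Linear(16, 1)       # hypothetical stand-in for the remaining layers

# Two parameter groups with the learning rates from the list above.
optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 5e-6},
    {"params": head.parameters(), "lr": 8e-4},
])
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2
)

x, y = torch.randn(8, 16), torch.randn(8)  # random toy batch
head_lrs = []
for epoch in range(10):
    optimizer.zero_grad()
    loss = combined_loss(head(backbone(x)).squeeze(-1), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # stepped once per epoch in this sketch
    head_lrs.append(optimizer.param_groups[1]["lr"])

print(head_lrs)
```

With $T_0$=10 and $T_{mult}$=2, the cosine cycles last 10, 20, and 40 epochs, so the learning rate resets to its initial value after epochs 10, 30, and 70 when the scheduler is stepped once per epoch.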
## Available Checkpoints
| File | Description | RMSE |
|------|-------------|------|
| `best_v2_esm150m.pt` | Best V2 model (ESM-2 150M) | 1.229 |
| `best_v1_esm35m.pt` | V1 Baseline (ESM-2 35M) | 1.266 |
## How to Use
```python
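import torch
from huggingface_hub import hf_hub_download

# Download the best checkpoint from the Hub (cached locally after the first call)
path = hf_hub_download(repo_id="chamoso/DeepPharm", filename="best_v2_esm150m.pt")

# Load on CPU; this returns the saved checkpoint object, not a ready-to-call model
checkpoint = torch.load(path, map_location="cpu")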
```
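
This card does not document the internal layout of the checkpoint file, so it may help to inspect the loaded object before using it. The helper below is a hypothetical sketch: the function name `summarize` and the keys in `fake_checkpoint` are illustrative assumptions, not the actual contents of `best_v2_esm150m.pt`.

```python
import torch


def summarize(obj, prefix="", max_items=10):
    """Return one description line per entry of a (possibly nested) checkpoint dict."""
    lines = []
    if not isinstance(obj, dict):
        return [f"{prefix}{type(obj).__name__}"]
    for i, (key, value) in enumerate(obj.items()):
        if i >= max_items:
            lines.append(f"{prefix}... ({len(obj) - max_items} more)")
            break
        name = f"{prefix}{key}"
        if torch.is_tensor(value):
            lines.append(f"{name}: tensor{tuple(value.shape)} {value.dtype}")
        elif isinstance(value, dict):
            lines.append(f"{name}: dict with {len(value)} entries")
            lines.extend(summarize(value, prefix=name + ".", max_items=max_items))
        else:
            lines.append(f"{name}: {type(value).__name__}")
    return lines


# Hypothetical stand-in for a loaded checkpoint; the real keys may differ.
fake_checkpoint = {
    "epoch": 18,
    "model_state_dict": {"head.weight": torch.zeros(1, 128)},
}
summary = summarize(fake_checkpoint)
print("\n".join(summary))
```

Calling `summarize(checkpoint)` on the object returned by `torch.load` shows whether the file holds a bare state dict or a wrapper with extra metadata.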
For full inference with data preprocessing:
```bash
git clone https://github.com/chamoso/DeepPharm.git
cd DeepPharm
python scripts/predict.py \
--checkpoint weights/best_v2_esm150m.pt \
--smiles "CC(=O)Oc1ccccc1C(=O)O" \
--sequence "MKTAYIAKQRQISFVKSHFSRQLE..."
```
## Links
- **GitHub:** [chamoso/DeepPharm](https://github.com/chamoso/DeepPharm)
- **Live Demo:** [HuggingFace Spaces](https://huggingface.co/spaces/chamoso/DeepPharm)
## Citation
*Preprint coming soon.*
## License
MIT License