# MillerBind v9 & v12 – TDC Validation
**Independent third-party validation of MillerBind scoring functions using the [Therapeutics Data Commons (TDC)](https://tdcommons.ai/) evaluation framework.**
Developed by **William Miller – [BindStream Technologies](https://bindstreamai.com)**
---
## Results Summary
### CASF-2016 Scoring Power Benchmark (n = 285, held out)
All metrics computed using `tdc.Evaluator` from PyTDC v1.1.15.
| Model | PCC | PCC 95% CI | Spearman ρ | MAE (pKd) | MAE 95% CI | RMSE | R² |
|-------|-----|------------|------------|-----------|------------|------|----|
| **MillerBind v9** | **0.890** | [0.862, 0.912] | 0.877 | **0.780** | [0.708, 0.857] | 1.030 | 0.775 |
| **MillerBind v12** | **0.938** | [0.921, 0.950] | 0.960 | **0.637** | [0.571, 0.707] | 0.869 | 0.840 |
95% confidence intervals from 1,000 bootstrap resamples.
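The percentile bootstrap used for these intervals is straightforward to reproduce. A minimal sketch with synthetic data standing in for the actual CASF-2016 predictions (the real script and column layout live in this repo's files):

```python
import numpy as np
from scipy.stats import pearsonr

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, seed=0):
    """95% percentile-bootstrap CI: resample (true, pred) pairs with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample indices, keeping pairs aligned
        stats.append(metric(y_true[idx], y_pred[idx]))
    return np.percentile(stats, [2.5, 97.5])

# Synthetic stand-in for the 285 CASF-2016 complexes
rng = np.random.default_rng(42)
y_true = rng.normal(6.0, 1.5, 285)          # pKd-like experimental values
y_pred = y_true + rng.normal(0, 0.7, 285)   # correlated model predictions

pcc = lambda a, b: pearsonr(a, b)[0]
lo, hi = bootstrap_ci(y_true, y_pred, pcc)
```

The same `bootstrap_ci` helper works for MAE or RMSE by swapping the `metric` callable.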
### CASF-2016 Ranking Power (53 target clusters)
Ranking power measures whether the model correctly ranks ligands by affinity within each target protein cluster.
| Model | Avg Spearman ρ | Avg Kendall τ | Concordance | Top-1 Success |
|-------|---------------|---------------|-------------|---------------|
| X-Score | 0.247 | – | – | – |
| AutoDock Vina | 0.281 | – | – | – |
| RF-Score v3 | 0.464 | – | – | – |
| ΔVinaRF20 | 0.476 | – | – | – |
| OnionNet-2 | 0.488 | – | – | – |
| **MillerBind v9** | **0.740** | **0.662** | **82.7%** | **60.4%** |
| **MillerBind v12** | **0.979** | **0.962** | **97.9%** | **92.5%** |
v12 achieves near-perfect ranking across 53 protein targets, correctly identifying the strongest binder in 49/53 targets.
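CASF-2016 ranking power averages a per-target rank correlation over the 53 clusters (5 ligands each), and top-1 success counts how often the model's top-scored ligand is truly the strongest binder. A sketch of that aggregation on synthetic data (the full CASF protocol also defines concordance; only the averaging is shown here):

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

rng = np.random.default_rng(0)
n_targets, n_ligands = 53, 5

spear, kend, top1_hits = [], [], 0
for _ in range(n_targets):
    true_aff = rng.normal(6.0, 1.5, n_ligands)           # experimental pKd within one cluster
    pred_aff = true_aff + rng.normal(0, 0.5, n_ligands)  # model scores (synthetic)
    spear.append(spearmanr(true_aff, pred_aff)[0])
    kend.append(kendalltau(true_aff, pred_aff)[0])
    top1_hits += int(np.argmax(pred_aff) == np.argmax(true_aff))

avg_spearman = float(np.mean(spear))   # "Avg Spearman ρ" column
avg_kendall = float(np.mean(kend))     # "Avg Kendall τ" column
top1_success = top1_hits / n_targets   # "Top-1 Success" column
```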
### Comparison with Published Methods (Scoring Power)
| Method | PCC | MAE (pKd) | Type | Year |
|--------|-----|-----------|------|------|
| AutoDock Vina | 0.604 | 2.05 | Physics-based | 2010 |
| RF-Score v3 | 0.800 | 1.40 | Random Forest | 2015 |
| OnionNet-2 | 0.816 | 1.28 | Deep Learning | 2021 |
| PIGNet | 0.830 | 1.21 | GNN | 2022 |
| IGN | 0.850 | 1.15 | GNN | 2021 |
| HAC-Net | 0.860 | 1.10 | DL Ensemble | 2023 |
| **MillerBind v9** | **0.890** | **0.780** | **Proprietary ML** | **2025** |
| **MillerBind v12** | **0.938** | **0.637** | **Proprietary ML** | **2025** |
### TDC BindingDB Cross-Reference
| Metric | Value |
|--------|-------|
| TDC BindingDB_Kd targets with PDBbind structures | 509 / 1,090 (46.7%) |
| PDBbind complexes matching TDC targets | 8,384 |
| TDC dataset structural coverage | 49.5% (25,869 / 52,274) |
| v9 PCC on TDC-overlapping CASF-2016 subset (n=170) | 0.880 |
---
## Full Validation Report
The complete peer-review validation report with scatter plots, bootstrap confidence intervals, residual distributions, per-affinity-range analysis, and statistical significance tests is included in this repository:
**[View the Full Report (HTML)](report/MillerBind_TDC_Validation_Report.html)** – download and open in any browser, or print to PDF.
---
## Verify Results
### Option 1: Run TDC Evaluator on predictions (quick)
```bash
pip install PyTDC numpy pandas scipy
python verify_with_tdc.py
```
This loads the pre-computed predictions CSV and evaluates them using TDC's official `Evaluator`.
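If you would rather not install PyTDC, the same scoring-power metrics can be cross-checked with scipy alone. A sketch using a synthetic stand-in for the predictions CSV (the column names `pdb_id`, `experimental_pkd`, `predicted_pkd` are illustrative, not necessarily those used in this repo's files):

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# Synthetic stand-in for predictions/casf2016_v9_predictions.csv
rng = np.random.default_rng(1)
exp = rng.normal(6.0, 1.5, 285)
df = pd.DataFrame({
    "pdb_id": [f"id{i:04d}" for i in range(285)],  # placeholder IDs
    "experimental_pkd": exp,
    "predicted_pkd": exp + rng.normal(0, 0.8, 285),
})

y_true = df["experimental_pkd"].to_numpy()
y_pred = df["predicted_pkd"].to_numpy()

metrics = {
    "PCC": pearsonr(y_true, y_pred)[0],
    "Spearman": spearmanr(y_true, y_pred)[0],
    "MAE": float(np.mean(np.abs(y_true - y_pred))),
    "RMSE": float(np.sqrt(np.mean((y_true - y_pred) ** 2))),
}
```

Swapping the synthetic DataFrame for `pd.read_csv(...)` on the real predictions file should reproduce the table above to within rounding.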
### Option 2: Docker β€” full independent validation (comprehensive)
```bash
docker run --rm bindstream/millerbind-v9-validation
```
The Docker image contains:
- AES-256 encrypted model weights (not readable)
- AES-256 encrypted CASF-2016 features (not readable)
- Compiled Python bytecode (no source code)
- Runs predictions and reports metrics, fully offline; no network access needed
---
## Repository Contents
```
├── README.md                                   ← This file
├── predictions/
│   ├── casf2016_v9_predictions.csv             ← 285 predictions (PDB ID, experimental, predicted pKd)
│   └── casf2016_v12_predictions.csv            ← 285 predictions for v12
├── verify_with_tdc.py                          ← TDC Evaluator verification script
├── report/
│   └── MillerBind_TDC_Validation_Report.html   ← Full peer-review report with figures
├── Dockerfile                                  ← Docker build reference (for transparency)
└── LICENSE
```
---
## Why 3D Structures?
MillerBind is a **structure-based** scoring function: it requires 3D protein-ligand complex structures (PDB + ligand file) as input, not SMILES strings or amino acid sequences.
This is fundamentally different from sequence-based models (e.g., DeepDTA, MolTrans) that predict binding from 1D representations. Structure-based scoring uses the actual 3D atomic coordinates of both the protein and ligand, capturing:
- **Precise interatomic distances** between protein and ligand atoms
- **Binding pocket geometry** and shape complementarity
- **Hydrogen bonds, hydrophobic contacts, and electrostatic interactions** in 3D space
This is why structure-based methods consistently outperform sequence-based methods on binding affinity benchmarks: they score the real physical interaction rather than inferring it from 1D strings.
**CASF-2016** is the gold-standard benchmark specifically designed for evaluating structure-based scoring functions (Su et al., 2019), and is the standard reported by AutoDock Vina, Glide, RF-Score, OnionNet, PIGNet, IGN, HAC-Net, and now MillerBind.
---
## Model Details
| | MillerBind v9 | MillerBind v12 |
|---|---|---|
| **Input** | 3D protein-ligand complex (PDB + ligand file) | 3D protein-ligand complex (PDB + ligand file) |
| **Output** | Predicted pKd | Predicted binding affinity |
| **Use case** | General-purpose scoring | PPI, hard targets, cancer, large proteins |
| **Training data** | PDBbind v2020 (18,438 complexes) | PDBbind v2020 (18,438 complexes) |
| **Test set** | CASF-2016 core set (285, strictly held out) | CASF-2016 core set (285, strictly held out) |
| **Inference** | < 1 second, CPU-only | < 1 second, CPU-only |
| **Architecture** | Proprietary | Proprietary |
---
## Statistical Significance
- **v9 PCC**: p < 10⁻⁹⁸
- **v12 PCC**: p < 10⁻¹³¹
- **v12 vs v9 improvement**: paired t-test, t = 5.30, p = 2.4 × 10⁻⁷
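A paired t-test of this kind compares the two models' per-complex errors on the same 285 test complexes. A sketch with `scipy.stats.ttest_rel` on synthetic paired errors (assuming, as is standard, that errors are paired per complex; the exact error definition used in the report may differ):

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(7)
n = 285  # CASF-2016 core set size

# Synthetic per-complex absolute errors for two models on the same test set
err_v9 = np.abs(rng.normal(0, 1.0, n))
err_v12 = err_v9 * 0.8 + np.abs(rng.normal(0, 0.1, n))  # systematically smaller errors

# Positive t means the first model (v9) has larger mean error
t_stat, p_value = ttest_rel(err_v9, err_v12)
```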
---
## References
1. Huang, K., et al. (2021). Therapeutics Data Commons. *NeurIPS Datasets and Benchmarks*.
2. Su, M., et al. (2019). Comparative Assessment of Scoring Functions: The CASF-2016 Update. *J. Chem. Inf. Model.*, 59(2), 895–913.
3. Wang, R., et al. (2004). The PDBbind Database. *J. Med. Chem.*, 47(12), 2977–2980.
---
## License
Results and predictions are provided for independent verification of benchmark performance.
Model weights, feature engineering, and training code are proprietary.
© 2026 BindStream Technologies. All rights reserved.