MillerBind v9 & v12: TDC Validation
Independent third-party validation of MillerBind scoring functions using the Therapeutics Data Commons (TDC) evaluation framework.
Developed by William Miller, BindStream Technologies
Results Summary
CASF-2016 Scoring Power Benchmark (n = 285, held out)
All metrics computed using tdc.Evaluator from PyTDC v1.1.15.
| Model | PCC | PCC 95% CI | Spearman ρ | MAE (pKd) | MAE 95% CI | RMSE | R² |
|---|---|---|---|---|---|---|---|
| MillerBind v9 | 0.890 | [0.862, 0.912] | 0.877 | 0.780 | [0.708, 0.857] | 1.030 | 0.775 |
| MillerBind v12 | 0.938 | [0.921, 0.950] | 0.960 | 0.637 | [0.571, 0.707] | 0.869 | 0.840 |
95% confidence intervals from 1,000 bootstrap resamples.
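The interval construction can be sketched as follows. This is a minimal illustration of a standard percentile bootstrap, not the report's exact code; `y_true` and `y_pred` stand for the 285 experimental and predicted pKd values, and the names are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr

def bootstrap_pcc_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the Pearson correlation coefficient."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample (true, pred) pairs with replacement
        r, _ = pearsonr(y_true[idx], y_pred[idx])
        stats.append(r)
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

The same resampling loop yields the MAE interval by swapping the statistic.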
CASF-2016 Ranking Power (53 target clusters)
Ranking power measures whether the model correctly ranks ligands by affinity within each target protein cluster.
| Model | Avg Spearman ρ | Avg Kendall τ | Concordance | Top-1 Success |
|---|---|---|---|---|
| X-Score | 0.247 | n/a | n/a | n/a |
| AutoDock Vina | 0.281 | n/a | n/a | n/a |
| RF-Score v3 | 0.464 | n/a | n/a | n/a |
| ΔVinaRF20 | 0.476 | n/a | n/a | n/a |
| OnionNet-2 | 0.488 | n/a | n/a | n/a |
| MillerBind v9 | 0.740 | 0.662 | 82.7% | 60.4% |
| MillerBind v12 | 0.979 | 0.962 | 97.9% | 92.5% |
v12 achieves near-perfect ranking across the 53 protein targets, correctly identifying the strongest binder in 49 of 53 targets.
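The per-cluster metrics above can be sketched like this. Note the assumptions: the column names are hypothetical, and the top-1 check here is a simplified "is the best-predicted ligand also the experimentally strongest?" test, which may differ in detail from the official CASF-2016 definition.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, kendalltau

def ranking_power(df):
    """Average per-target ranking metrics.

    df columns (assumed): 'target', 'y_true' (experimental pKd),
    'y_pred' (predicted pKd).
    """
    rows = []
    for _, g in df.groupby("target"):
        rho, _ = spearmanr(g.y_true, g.y_pred)
        tau, _ = kendalltau(g.y_true, g.y_pred)
        top1 = g.y_pred.idxmax() == g.y_true.idxmax()  # strongest binder found?
        rows.append((rho, tau, top1))
    avg_rho, avg_tau, top1_rate = np.mean(rows, axis=0)
    return avg_rho, avg_tau, top1_rate
```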
Comparison with Published Methods (Scoring Power)
| Method | PCC | MAE (pKd) | Type | Year |
|---|---|---|---|---|
| AutoDock Vina | 0.604 | 2.05 | Physics-based | 2010 |
| RF-Score v3 | 0.800 | 1.40 | Random Forest | 2015 |
| OnionNet-2 | 0.816 | 1.28 | Deep Learning | 2021 |
| PIGNet | 0.830 | 1.21 | GNN | 2022 |
| IGN | 0.850 | 1.15 | GNN | 2021 |
| HAC-Net | 0.860 | 1.10 | DL Ensemble | 2023 |
| MillerBind v9 | 0.890 | 0.780 | Proprietary ML | 2025 |
| MillerBind v12 | 0.938 | 0.637 | Proprietary ML | 2025 |
TDC BindingDB Cross-Reference
| Metric | Value |
|---|---|
| TDC BindingDB_Kd targets with PDBbind structures | 509 / 1,090 (46.7%) |
| PDBbind complexes matching TDC targets | 8,384 |
| TDC dataset structural coverage | 49.5% (25,869 / 52,274) |
| v9 PCC on TDC-overlapping CASF-2016 subset (n=170) | 0.880 |
Full Validation Report
The complete peer-review validation report with scatter plots, bootstrap confidence intervals, residual distributions, per-affinity-range analysis, and statistical significance tests is included in this repository:
View the Full Report (HTML): download and open it in any browser, or print it to PDF.
Verify Results
Option 1: Run TDC Evaluator on predictions (quick)
```shell
pip install PyTDC numpy pandas scipy
python verify_with_tdc.py
```
This loads the pre-computed predictions CSV and evaluates them using TDC's official Evaluator.
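For readers without PyTDC installed, the same scoring metrics can be recomputed with numpy/scipy alone. This is a sketch equivalent to what `tdc.Evaluator` reports, not the contents of `verify_with_tdc.py`:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def score(y, yhat):
    """Recompute the headline scoring metrics from paired pKd arrays."""
    pcc, _ = pearsonr(y, yhat)
    rho, _ = spearmanr(y, yhat)
    mae = float(np.abs(y - yhat).mean())
    rmse = float(np.sqrt(((y - yhat) ** 2).mean()))
    return {"pcc": pcc, "spearman": rho, "mae": mae, "rmse": rmse}
```

To reproduce the table, apply `score` to the experimental and predicted columns of `predictions/casf2016_v9_predictions.csv` (the exact column names are defined in that file, not here).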
Option 2: Docker β full independent validation (comprehensive)
```shell
docker run --rm bindstream/millerbind-v9-validation
```
The Docker image contains:
- AES-256 encrypted model weights (not readable)
- AES-256 encrypted CASF-2016 features (not readable)
- Compiled Python bytecode (no source code)
- Runs predictions and reports metrics, fully offline; no network access needed
Repository Contents
```
├── README.md                                  # This file
├── predictions/
│   ├── casf2016_v9_predictions.csv            # 285 predictions (PDB ID, experimental, predicted pKd)
│   └── casf2016_v12_predictions.csv           # 285 predictions for v12
├── verify_with_tdc.py                         # TDC Evaluator verification script
├── report/
│   └── MillerBind_TDC_Validation_Report.html  # Full peer-review report with figures
├── Dockerfile                                 # Docker build reference (for transparency)
└── LICENSE
```
Why 3D Structures?
MillerBind is a structure-based scoring function: it requires 3D protein-ligand complex structures (PDB + ligand file) as input, not SMILES strings or amino acid sequences.
This is fundamentally different from sequence-based models (e.g., DeepDTA, MolTrans) that predict binding from 1D representations. Structure-based scoring uses the actual 3D atomic coordinates of both the protein and ligand, capturing:
- Precise interatomic distances between protein and ligand atoms
- Binding pocket geometry and shape complementarity
- Hydrogen bonds, hydrophobic contacts, and electrostatic interactions in 3D space
This is why structure-based methods consistently outperform sequence-based methods on binding affinity benchmarks: they score the real physical interaction rather than inferring it from 1D strings.
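As a concrete illustration of the first bullet above, here is one generic distance-based feature that structure-based scoring functions commonly build on: counting protein-ligand atom pairs within a cutoff. The coordinate arrays are hypothetical, and this is not a description of MillerBind's proprietary features.

```python
import numpy as np

def contact_count(protein_xyz, ligand_xyz, cutoff=4.0):
    """Count protein-ligand atom pairs closer than `cutoff` angstroms.

    protein_xyz: (P, 3) array of protein atom coordinates
    ligand_xyz:  (L, 3) array of ligand atom coordinates
    """
    # Pairwise distances via broadcasting: (P, 1, 3) - (1, L, 3) -> (P, L)
    d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    return int((d < cutoff).sum())
```

Real scoring functions extend this idea by typing the atoms, binning the distances, and weighting interaction classes such as hydrogen bonds and hydrophobic contacts.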
CASF-2016 is the gold-standard benchmark specifically designed for evaluating structure-based scoring functions (Su et al., 2019), and is the standard reported by AutoDock Vina, Glide, RF-Score, OnionNet, PIGNet, IGN, HAC-Net, and now MillerBind.
Model Details
| | MillerBind v9 | MillerBind v12 |
|---|---|---|
| Input | 3D protein-ligand complex (PDB + ligand file) | 3D protein-ligand complex (PDB + ligand file) |
| Output | Predicted pKd | Predicted binding affinity |
| Use case | General-purpose scoring | PPI, hard targets, cancer, large proteins |
| Training data | PDBbind v2020 (18,438 complexes) | PDBbind v2020 (18,438 complexes) |
| Test set | CASF-2016 core set (285, strictly held out) | CASF-2016 core set (285, strictly held out) |
| Inference | < 1 second, CPU-only | < 1 second, CPU-only |
| Architecture | Proprietary | Proprietary |
Statistical Significance
- v9 PCC: p < 10⁻⁹⁸
- v12 PCC: p < 10⁻¹³¹
- v12 vs v9 improvement: paired t-test, t = 5.30, p = 2.4 × 10⁻⁷
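The paired comparison can be sketched as follows. One common choice of paired quantity is the per-complex absolute error of each model on the same 285 complexes; the report's exact pairing is not specified here, so treat this as an illustrative assumption.

```python
import numpy as np
from scipy.stats import ttest_rel

def compare_models(y_true, pred_v9, pred_v12):
    """Paired t-test on per-complex absolute errors of two models."""
    err_v9 = np.abs(y_true - pred_v9)
    err_v12 = np.abs(y_true - pred_v12)
    # t > 0 indicates v9's errors are larger, i.e. v12 is more accurate
    t, p = ttest_rel(err_v9, err_v12)
    return t, p
```

A paired test is appropriate here because both models are evaluated on the identical set of complexes, so per-complex errors are naturally matched.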
References
- Huang, K., et al. (2021). Therapeutics Data Commons. NeurIPS Datasets and Benchmarks.
- Su, M., et al. (2019). Comparative Assessment of Scoring Functions: The CASF-2016 Update. J. Chem. Inf. Model., 59(2), 895–913.
- Wang, R., et al. (2004). The PDBbind Database. J. Med. Chem., 47(12), 2977β2980.
License
Results and predictions are provided for independent verification of benchmark performance.
Model weights, feature engineering, and training code are proprietary.
Β© 2026 BindStream Technologies. All rights reserved.