# MillerBind v9 & v12: TDC Validation

**Independent third-party validation of MillerBind scoring functions using the [Therapeutics Data Commons (TDC)](https://tdcommons.ai/) evaluation framework.**

Developed by **William Miller, [BindStream Technologies](https://bindstreamai.com)**

---

## Results Summary

### CASF-2016 Scoring Power Benchmark (n = 285, held out)

All metrics computed using `tdc.Evaluator` from PyTDC v1.1.15.
| Model | PCC | PCC 95% CI | Spearman ρ | MAE (pKd) | MAE 95% CI | RMSE | R² |
|-------|-----|------------|------------|-----------|------------|------|----|
| **MillerBind v9** | **0.890** | [0.862, 0.912] | 0.877 | **0.780** | [0.708, 0.857] | 1.030 | 0.775 |
| **MillerBind v12** | **0.938** | [0.921, 0.950] | 0.960 | **0.637** | [0.571, 0.707] | 0.869 | 0.840 |

95% confidence intervals from 1,000 bootstrap resamples.
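Bootstrap CIs of this kind can be reproduced from the paired experimental/predicted values in the prediction CSVs. The sketch below is illustrative only (it is not code from this repository) and assumes a simple percentile interval over 1,000 pair resamples:

```python
import numpy as np

def bootstrap_pcc_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the Pearson r."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (true, pred) pairs with replacement
        stats.append(np.corrcoef(y_true[idx], y_pred[idx])[0, 1])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

The MAE intervals follow the same recipe with `np.abs(y_true[idx] - y_pred[idx]).mean()` as the resampled statistic.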
### CASF-2016 Ranking Power (53 target clusters)

Ranking power measures whether the model correctly ranks ligands by binding affinity within each target protein cluster.

| Model | Avg Spearman ρ | Avg Kendall τ | Concordance | Top-1 Success |
|-------|---------------|---------------|-------------|---------------|
| X-Score | 0.247 | – | – | – |
| AutoDock Vina | 0.281 | – | – | – |
| RF-Score v3 | 0.464 | – | – | – |
| ΔVinaRF20 | 0.476 | – | – | – |
| OnionNet-2 | 0.488 | – | – | – |
| **MillerBind v9** | **0.740** | **0.662** | **82.7%** | **60.4%** |
| **MillerBind v12** | **0.979** | **0.962** | **97.9%** | **92.5%** |

v12 achieves near-perfect ranking across the 53 protein targets, correctly identifying the strongest binder for 49 of the 53.
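The two headline ranking metrics can be sketched in a few lines. This is an illustrative simplification of the CASF-2016 protocol (no tie handling; `clusters`, `y_true`, `y_pred` are assumed parallel inputs, one entry per complex):

```python
import numpy as np
from collections import defaultdict

def ranking_power(clusters, y_true, y_pred):
    """Average per-cluster Spearman rho and Top-1 success rate."""
    groups = defaultdict(list)
    for c, t, p in zip(clusters, y_true, y_pred):
        groups[c].append((t, p))
    rhos, top1 = [], []
    for pairs in groups.values():
        t = np.array([a for a, _ in pairs])
        p = np.array([b for _, b in pairs])
        rt = t.argsort().argsort()  # ranks of experimental affinities
        rp = p.argsort().argsort()  # ranks of predicted affinities
        rhos.append(np.corrcoef(rt, rp)[0, 1])  # Spearman = Pearson on ranks
        top1.append(float(t.argmax() == p.argmax()))  # strongest binder identified?
    return float(np.mean(rhos)), float(np.mean(top1))
```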
### Comparison with Published Methods (Scoring Power)

| Method | PCC | MAE (pKd) | Type | Year |
|--------|-----|-----------|------|------|
| AutoDock Vina | 0.604 | 2.05 | Physics-based | 2010 |
| RF-Score v3 | 0.800 | 1.40 | Random Forest | 2015 |
| OnionNet-2 | 0.816 | 1.28 | Deep Learning | 2021 |
| PIGNet | 0.830 | 1.21 | GNN | 2022 |
| IGN | 0.850 | 1.15 | GNN | 2021 |
| HAC-Net | 0.860 | 1.10 | DL Ensemble | 2023 |
| **MillerBind v9** | **0.890** | **0.780** | **Proprietary ML** | **2025** |
| **MillerBind v12** | **0.938** | **0.637** | **Proprietary ML** | **2025** |
### TDC BindingDB Cross-Reference

| Metric | Value |
|--------|-------|
| TDC BindingDB_Kd targets with PDBbind structures | 509 / 1,090 (46.7%) |
| PDBbind complexes matching TDC targets | 8,384 |
| TDC dataset structural coverage | 49.5% (25,869 / 52,274) |
| v9 PCC on TDC-overlapping CASF-2016 subset (n = 170) | 0.880 |
---

## Full Validation Report

The complete validation report, including scatter plots, bootstrap confidence intervals, residual distributions, per-affinity-range analysis, and statistical significance tests, is included in this repository:

**[View the Full Report (HTML)](report/MillerBind_TDC_Validation_Report.html)** (download and open in any browser, or print to PDF)
---

## Verify Results

### Option 1: Run the TDC Evaluator on the predictions (quick)

```bash
pip install PyTDC numpy pandas scipy
python verify_with_tdc.py
```

This loads the pre-computed predictions CSV and evaluates it using TDC's official `Evaluator`.
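As a sanity check that needs nothing beyond the Python standard library, the headline metrics can also be recomputed directly from the CSV. The column names `experimental` and `predicted` below are assumptions; adjust them to match the actual header of the prediction files:

```python
import csv
import math

def score_predictions(path):
    """Recompute PCC, MAE, and RMSE from a predictions CSV (stdlib only)."""
    yt, yp = [], []
    with open(path) as f:
        for row in csv.DictReader(f):
            yt.append(float(row["experimental"]))  # assumed column name
            yp.append(float(row["predicted"]))     # assumed column name
    n = len(yt)
    mt, mp = sum(yt) / n, sum(yp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(yt, yp))
    var_t = sum((a - mt) ** 2 for a in yt)
    var_p = sum((b - mp) ** 2 for b in yp)
    pcc = cov / math.sqrt(var_t * var_p)
    mae = sum(abs(a - b) for a, b in zip(yt, yp)) / n
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(yt, yp)) / n)
    return pcc, mae, rmse
```

The numbers should agree with the TDC `Evaluator` output to floating-point precision.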
### Option 2: Docker, full independent validation (comprehensive)

```bash
docker run --rm bindstream/millerbind-v9-validation
```

The Docker image contains:

- AES-256-encrypted model weights (not readable)
- AES-256-encrypted CASF-2016 features (not readable)
- Compiled Python bytecode (no source code)

It runs the predictions and reports the metrics fully offline; no network access is needed.
---

## Repository Contents

```
├── README.md                                   ← This file
├── predictions/
│   ├── casf2016_v9_predictions.csv             ← 285 predictions (PDB ID, experimental, predicted pKd)
│   └── casf2016_v12_predictions.csv            ← 285 predictions for v12
├── verify_with_tdc.py                          ← TDC Evaluator verification script
├── report/
│   └── MillerBind_TDC_Validation_Report.html   ← Full peer-review report with figures
├── Dockerfile                                  ← Docker build reference (for transparency)
└── LICENSE
```
---

## Why 3D Structures?

MillerBind is a **structure-based** scoring function: it requires a 3D protein-ligand complex structure (PDB + ligand file) as input, not SMILES strings or amino acid sequences.

This is fundamentally different from sequence-based models (e.g., DeepDTA, MolTrans) that predict binding from 1D representations. Structure-based scoring uses the actual 3D atomic coordinates of both the protein and the ligand, capturing:

- **Precise interatomic distances** between protein and ligand atoms
- **Binding pocket geometry** and shape complementarity
- **Hydrogen bonds, hydrophobic contacts, and electrostatic interactions** in 3D space

This is why structure-based methods consistently outperform sequence-based methods on binding affinity benchmarks: they score the physical interaction directly rather than inferring it from strings.

**CASF-2016** is the gold-standard benchmark specifically designed for evaluating structure-based scoring functions (Su et al., 2019), and it is the benchmark reported by AutoDock Vina, Glide, RF-Score, OnionNet, PIGNet, IGN, HAC-Net, and now MillerBind.
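As a toy illustration of what distance-based structural featurization means (this is not MillerBind's actual feature set, which is proprietary), pairwise protein-ligand contacts can be binned into distance shells directly from the atomic coordinates:

```python
import numpy as np

def contact_counts(prot_xyz, lig_xyz, cutoffs=(2.5, 3.5, 4.5, 6.0)):
    """Count protein-ligand atom pairs falling in each distance shell (angstroms).

    prot_xyz, lig_xyz: (N, 3) and (M, 3) coordinate arrays.
    Real scoring functions typically bin by element pair as well.
    """
    # All N x M pairwise protein-ligand distances via broadcasting
    d = np.linalg.norm(prot_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)
    counts, prev = [], 0.0
    for c in cutoffs:
        counts.append(int(((d > prev) & (d <= c)).sum()))
        prev = c
    return counts
```

Sequence-based models never see these distances, which is the intuition behind the performance gap described above.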
---

## Model Details

| | MillerBind v9 | MillerBind v12 |
|---|---|---|
| **Input** | 3D protein-ligand complex (PDB + ligand file) | 3D protein-ligand complex (PDB + ligand file) |
| **Output** | Predicted pKd | Predicted binding affinity |
| **Use case** | General-purpose scoring | PPI, hard targets, cancer, large proteins |
| **Training data** | PDBbind v2020 (18,438 complexes) | PDBbind v2020 (18,438 complexes) |
| **Test set** | CASF-2016 core set (285 complexes, strictly held out) | CASF-2016 core set (285 complexes, strictly held out) |
| **Inference** | < 1 second, CPU-only | < 1 second, CPU-only |
| **Architecture** | Proprietary | Proprietary |
---

## Statistical Significance

- **v9 PCC**: p < 10⁻⁹⁸
- **v12 PCC**: p < 10⁻¹³¹
- **v12 vs. v9 improvement**: paired t-test, t = 5.30, p = 2.4 × 10⁻⁷
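A paired t-test of this kind compares the two models' per-complex absolute errors on the same 285 complexes. A minimal stdlib sketch of the statistic (in practice `scipy.stats.ttest_rel` computes the same value along with its p-value):

```python
import math

def paired_t(errors_a, errors_b):
    """Paired t statistic on per-complex absolute errors.

    Positive t means model b has the lower error on average.
    Returns (t, degrees of freedom).
    """
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n), n - 1
```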
---

## References

1. Huang, K., et al. (2021). Therapeutics Data Commons. *NeurIPS Datasets and Benchmarks*.
2. Su, M., et al. (2019). Comparative Assessment of Scoring Functions: The CASF-2016 Update. *J. Chem. Inf. Model.*, 59(2), 895–913.
3. Wang, R., et al. (2004). The PDBbind Database. *J. Med. Chem.*, 47(12), 2977–2980.

---

## License

Results and predictions are provided for independent verification of benchmark performance.
Model weights, feature engineering, and training code are proprietary.

© 2026 BindStream Technologies. All rights reserved.