
MillerBind v9 & v12 – TDC Validation

Independent third-party validation of MillerBind scoring functions using the Therapeutics Data Commons (TDC) evaluation framework.

Developed by William Miller – BindStream Technologies


Results Summary

CASF-2016 Scoring Power Benchmark (n = 285, held out)

All metrics computed using tdc.Evaluator from PyTDC v1.1.15.

Model            PCC     PCC 95% CI       Spearman ρ   MAE (pKd)   MAE 95% CI       RMSE    R²
MillerBind v9    0.890   [0.862, 0.912]   0.877        0.780       [0.708, 0.857]   1.030   0.775
MillerBind v12   0.938   [0.921, 0.950]   0.960        0.637       [0.571, 0.707]   0.869   0.840

95% confidence intervals from 1,000 bootstrap resamples.
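
The resampling procedure behind these intervals is standard; a minimal sketch with NumPy, using invented data (the real values live in the prediction CSVs) and PCC as the metric:

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, seed=0):
    """Percentile-bootstrap 95% CI for a paired metric (resamples pairs)."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample indices with replacement
        stats.append(metric(y_true[idx], y_pred[idx]))
    return np.percentile(stats, [2.5, 97.5])

# Synthetic stand-in data, illustrative only
rng = np.random.default_rng(42)
y_true = rng.normal(6.0, 1.5, 285)           # pKd-like experimental values
y_pred = y_true + rng.normal(0.0, 0.6, 285)  # correlated predictions

pcc = lambda a, b: np.corrcoef(a, b)[0, 1]
lo, hi = bootstrap_ci(y_true, y_pred, pcc)
print(f"PCC 95% CI: [{lo:.3f}, {hi:.3f}]")
```

Resampling complex/prediction pairs (rather than residuals) is the usual choice for correlation metrics, since it preserves the pairing.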

CASF-2016 Ranking Power (53 target clusters)

Ranking power measures whether the model correctly ranks ligands by affinity within each target protein cluster.

Model            Avg Spearman ρ   Avg Kendall τ   Concordance   Top-1 Success
X-Score          0.247            –               –             –
AutoDock Vina    0.281            –               –             –
RF-Score v3      0.464            –               –             –
ΔVinaRF20        0.476            –               –             –
OnionNet-2       0.488            –               –             –
MillerBind v9    0.740            0.662           82.7%         60.4%
MillerBind v12   0.979            0.962           97.9%         92.5%

v12 achieves near-perfect ranking across the 53 protein targets, correctly identifying the strongest binder in 49/53 targets.
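
The per-cluster computation behind these numbers is straightforward: CASF-2016 groups the 285 complexes into 53 clusters of 5 ligands, and ranking power averages the within-cluster rank correlation. A minimal sketch with synthetic data (the cluster structure mimics CASF-2016; the values are invented):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, kendalltau

# Synthetic stand-in: 53 clusters of 5 ligands each
rng = np.random.default_rng(0)
rows = []
for cluster in range(53):
    exp = rng.normal(6.0, 1.5, 5)             # experimental pKd
    pred = exp + rng.normal(0.0, 0.5, 5)      # model predictions
    rows += [{"cluster": cluster, "exp": e, "pred": p} for e, p in zip(exp, pred)]
df = pd.DataFrame(rows)

spearmans, kendalls, top1 = [], [], []
for _, g in df.groupby("cluster"):
    spearmans.append(spearmanr(g["exp"], g["pred"])[0])
    kendalls.append(kendalltau(g["exp"], g["pred"])[0])
    # Top-1 success: is the true strongest binder also ranked first?
    top1.append(g["exp"].idxmax() == g["pred"].idxmax())

print(f"Avg Spearman ρ: {np.mean(spearmans):.3f}")
print(f"Avg Kendall τ:  {np.mean(kendalls):.3f}")
print(f"Top-1 success:  {np.mean(top1):.1%}")
```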

Comparison with Published Methods (Scoring Power)

Method           PCC     MAE (pKd)   Type             Year
AutoDock Vina    0.604   2.05        Physics-based    2010
RF-Score v3      0.800   1.40        Random Forest    2015
OnionNet-2       0.816   1.28        Deep Learning    2021
PIGNet           0.830   1.21        GNN              2022
IGN              0.850   1.15        GNN              2021
HAC-Net          0.860   1.10        DL Ensemble      2023
MillerBind v9    0.890   0.780       Proprietary ML   2025
MillerBind v12   0.938   0.637       Proprietary ML   2025

TDC BindingDB Cross-Reference

Metric                                                 Value
TDC BindingDB_Kd targets with PDBbind structures       509 / 1,090 (46.7%)
PDBbind complexes matching TDC targets                 8,384
TDC dataset structural coverage                        49.5% (25,869 / 52,274)
v9 PCC on TDC-overlapping CASF-2016 subset (n = 170)   0.880

Full Validation Report

The complete validation report, prepared for peer review, with scatter plots, bootstrap confidence intervals, residual distributions, per-affinity-range analysis, and statistical significance tests, is included in this repository:

View the Full Report (HTML) – download and open in any browser, or print to PDF.


Verify Results

Option 1: Run TDC Evaluator on predictions (quick)

pip install PyTDC numpy pandas scipy
python verify_with_tdc.py

This loads the pre-computed predictions CSV and evaluates them using TDC's official Evaluator.

Option 2: Docker – full independent validation (comprehensive)

docker run --rm bindstream/millerbind-v9-validation

The Docker image contains:

  • AES-256 encrypted model weights (not readable)
  • AES-256 encrypted CASF-2016 features (not readable)
  • Compiled Python bytecode (no source code)
  • Runs predictions and reports metrics, fully offline (no network needed)

Repository Contents

├── README.md                          ← This file
├── predictions/
│   ├── casf2016_v9_predictions.csv    ← 285 predictions (PDB ID, experimental, predicted pKd)
│   └── casf2016_v12_predictions.csv   ← 285 predictions for v12
├── verify_with_tdc.py                 ← TDC Evaluator verification script
├── report/
│   └── MillerBind_TDC_Validation_Report.html  ← Full peer-review report with figures
├── Dockerfile                         ← Docker build reference (for transparency)
└── LICENSE

Why 3D Structures?

MillerBind is a structure-based scoring function: it requires 3D protein-ligand complex structures (PDB + ligand file) as input, not SMILES strings or amino acid sequences.

This is fundamentally different from sequence-based models (e.g., DeepDTA, MolTrans) that predict binding from 1D representations. Structure-based scoring uses the actual 3D atomic coordinates of both the protein and ligand, capturing:

  • Precise interatomic distances between protein and ligand atoms
  • Binding pocket geometry and shape complementarity
  • Hydrogen bonds, hydrophobic contacts, and electrostatic interactions in 3D space

This is why structure-based methods consistently outperform sequence-based methods on binding-affinity benchmarks: they score the real physical interaction rather than inferring it from strings.
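
As a toy illustration of the geometric signal available only to structure-based methods, interatomic distances and close contacts can be computed directly from coordinates. The coordinates and the 4 Å cutoff below are invented for the example; real inputs would come from a PDB/ligand parser:

```python
import numpy as np

# Toy coordinates (angstroms), invented for illustration
protein_atoms = np.array([[0.0, 0.0, 0.0],
                          [1.5, 0.2, 0.1],
                          [3.0, 1.0, 0.5]])
ligand_atoms = np.array([[0.5, 0.3, 0.2],
                         [2.8, 1.1, 0.4]])

# Pairwise protein-ligand distance matrix via broadcasting
diff = protein_atoms[:, None, :] - ligand_atoms[None, :, :]
dist = np.linalg.norm(diff, axis=-1)      # shape (n_protein, n_ligand)

# Count contacts within a 4 A cutoff, a common featurization choice
contacts = int((dist < 4.0).sum())
print("min distance:", dist.min().round(3), "A; contacts:", contacts)
```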

CASF-2016 is the gold-standard benchmark specifically designed for evaluating structure-based scoring functions (Su et al., 2019), and is the standard reported by AutoDock Vina, Glide, RF-Score, OnionNet, PIGNet, IGN, HAC-Net, and now MillerBind.


Model Details

                 MillerBind v9                                  MillerBind v12
Input            3D protein-ligand complex (PDB + ligand file)  3D protein-ligand complex (PDB + ligand file)
Output           Predicted pKd                                  Predicted binding affinity
Use case         General-purpose scoring                        PPI, hard targets, cancer, large proteins
Training data    PDBbind v2020 (18,438 complexes)               PDBbind v2020 (18,438 complexes)
Test set         CASF-2016 core set (285, strictly held out)    CASF-2016 core set (285, strictly held out)
Inference        < 1 second, CPU-only                           < 1 second, CPU-only
Architecture     Proprietary                                    Proprietary

Statistical Significance

  • v9 PCC: p < 10⁻⁹⁸
  • v12 PCC: p < 10⁻¹³¹
  • v12 vs v9 improvement: paired t-test, t = 5.30, p = 2.4 × 10⁻⁷
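
The summary above does not state which per-complex quantity the paired t-test compares; assuming it is absolute prediction error on the same 285 complexes, the test can be sketched with SciPy (data synthetic, the real errors come from the two prediction CSVs):

```python
import numpy as np
from scipy.stats import ttest_rel

# Synthetic per-complex absolute errors for two models on the same complexes
rng = np.random.default_rng(1)
err_v9 = np.abs(rng.normal(0.0, 1.0, 285))
err_v12 = np.abs(0.8 * err_v9 + rng.normal(0.0, 0.05, 285))  # smaller errors

t, p = ttest_rel(err_v9, err_v12)   # paired: each complex scored by both models
print(f"t = {t:.2f}, p = {p:.2e}")
```

A paired test is appropriate here because both models are evaluated on identical complexes, so per-complex differences remove shared variance.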

References

  1. Huang, K., et al. (2021). Therapeutics Data Commons. NeurIPS Datasets and Benchmarks.
  2. Su, M., et al. (2019). Comparative Assessment of Scoring Functions: The CASF-2016 Update. J. Chem. Inf. Model., 59(2), 895–913.
  3. Wang, R., et al. (2004). The PDBbind Database. J. Med. Chem., 47(12), 2977–2980.

License

Results and predictions are provided for independent verification of benchmark performance.

Model weights, feature engineering, and training code are proprietary.

© 2026 BindStream Technologies. All rights reserved.