# MillerBind v9 & v12: TDC Validation

**Independent third-party validation of MillerBind scoring functions using the [Therapeutics Data Commons (TDC)](https://tdcommons.ai/) evaluation framework.**

Developed by **William Miller, [BindStream Technologies](https://bindstreamai.com)**

---

## Results Summary

### CASF-2016 Scoring Power Benchmark (n = 285, held out)

All metrics computed using `tdc.Evaluator` from PyTDC v1.1.15.
| Model | PCC | PCC 95% CI | Spearman ρ | MAE (pKd) | MAE 95% CI | RMSE | R² |
|-------|-----|------------|------------|-----------|------------|------|----|
| **MillerBind v9** | **0.890** | [0.862, 0.912] | 0.877 | **0.780** | [0.708, 0.857] | 1.030 | 0.775 |
| **MillerBind v12** | **0.938** | [0.921, 0.950] | 0.960 | **0.637** | [0.571, 0.707] | 0.869 | 0.840 |

95% confidence intervals from 1,000 bootstrap resamples.
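Bootstrap CIs of this kind can be reproduced from the paired experimental/predicted values in the prediction CSVs. The sketch below is illustrative only (it is not code from this repository) and assumes a simple percentile interval over 1,000 pair resamples:

```python
import numpy as np

def bootstrap_pcc_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the Pearson r."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (true, pred) pairs with replacement
        stats.append(np.corrcoef(y_true[idx], y_pred[idx])[0, 1])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

The MAE intervals follow the same recipe with `np.abs(y_true[idx] - y_pred[idx]).mean()` as the resampled statistic.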
### CASF-2016 Ranking Power (53 target clusters)

Ranking power measures whether the model correctly ranks ligands by binding affinity within each target protein cluster.

| Model | Avg Spearman ρ | Avg Kendall τ | Concordance | Top-1 Success |
|-------|---------------|---------------|-------------|---------------|
| X-Score | 0.247 | – | – | – |
| AutoDock Vina | 0.281 | – | – | – |
| RF-Score v3 | 0.464 | – | – | – |
| ΔVinaRF20 | 0.476 | – | – | – |
| OnionNet-2 | 0.488 | – | – | – |
| **MillerBind v9** | **0.740** | **0.662** | **82.7%** | **60.4%** |
| **MillerBind v12** | **0.979** | **0.962** | **97.9%** | **92.5%** |

v12 achieves near-perfect ranking across the 53 protein targets, correctly identifying the strongest binder for 49 of the 53.
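The two headline ranking metrics can be sketched in a few lines. This is an illustrative simplification of the CASF-2016 protocol (no tie handling; `clusters`, `y_true`, `y_pred` are assumed parallel inputs, one entry per complex):

```python
import numpy as np
from collections import defaultdict

def ranking_power(clusters, y_true, y_pred):
    """Average per-cluster Spearman rho and Top-1 success rate."""
    groups = defaultdict(list)
    for c, t, p in zip(clusters, y_true, y_pred):
        groups[c].append((t, p))
    rhos, top1 = [], []
    for pairs in groups.values():
        t = np.array([a for a, _ in pairs])
        p = np.array([b for _, b in pairs])
        rt = t.argsort().argsort()  # ranks of experimental affinities
        rp = p.argsort().argsort()  # ranks of predicted affinities
        rhos.append(np.corrcoef(rt, rp)[0, 1])  # Spearman = Pearson on ranks
        top1.append(float(t.argmax() == p.argmax()))  # strongest binder identified?
    return float(np.mean(rhos)), float(np.mean(top1))
```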
### Comparison with Published Methods (Scoring Power)

| Method | PCC | MAE (pKd) | Type | Year |
|--------|-----|-----------|------|------|
| AutoDock Vina | 0.604 | 2.05 | Physics-based | 2010 |
| RF-Score v3 | 0.800 | 1.40 | Random Forest | 2015 |
| OnionNet-2 | 0.816 | 1.28 | Deep Learning | 2021 |
| PIGNet | 0.830 | 1.21 | GNN | 2022 |
| IGN | 0.850 | 1.15 | GNN | 2021 |
| HAC-Net | 0.860 | 1.10 | DL Ensemble | 2023 |
| **MillerBind v9** | **0.890** | **0.780** | **Proprietary ML** | **2025** |
| **MillerBind v12** | **0.938** | **0.637** | **Proprietary ML** | **2025** |
### TDC BindingDB Cross-Reference

| Metric | Value |
|--------|-------|
| TDC BindingDB_Kd targets with PDBbind structures | 509 / 1,090 (46.7%) |
| PDBbind complexes matching TDC targets | 8,384 |
| TDC dataset structural coverage | 49.5% (25,869 / 52,274) |
| v9 PCC on TDC-overlapping CASF-2016 subset (n = 170) | 0.880 |
---

## Full Validation Report

The complete validation report, including scatter plots, bootstrap confidence intervals, residual distributions, per-affinity-range analysis, and statistical significance tests, is included in this repository:

**[View the Full Report (HTML)](report/MillerBind_TDC_Validation_Report.html)** (download and open in any browser, or print to PDF)
---

## Verify Results

### Option 1: Run the TDC Evaluator on the predictions (quick)

```bash
pip install PyTDC numpy pandas scipy
python verify_with_tdc.py
```

This loads the pre-computed predictions CSV and evaluates it using TDC's official `Evaluator`.
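As a sanity check that needs nothing beyond the Python standard library, the headline metrics can also be recomputed directly from the CSV. The column names `experimental` and `predicted` below are assumptions; adjust them to match the actual header of the prediction files:

```python
import csv
import math

def score_predictions(path):
    """Recompute PCC, MAE, and RMSE from a predictions CSV (stdlib only)."""
    yt, yp = [], []
    with open(path) as f:
        for row in csv.DictReader(f):
            yt.append(float(row["experimental"]))  # assumed column name
            yp.append(float(row["predicted"]))     # assumed column name
    n = len(yt)
    mt, mp = sum(yt) / n, sum(yp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(yt, yp))
    var_t = sum((a - mt) ** 2 for a in yt)
    var_p = sum((b - mp) ** 2 for b in yp)
    pcc = cov / math.sqrt(var_t * var_p)
    mae = sum(abs(a - b) for a, b in zip(yt, yp)) / n
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(yt, yp)) / n)
    return pcc, mae, rmse
```

The numbers should agree with the TDC `Evaluator` output to floating-point precision.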
### Option 2: Docker, full independent validation (comprehensive)

```bash
docker run --rm bindstream/millerbind-v9-validation
```

The Docker image contains:

- AES-256-encrypted model weights (not readable)
- AES-256-encrypted CASF-2016 features (not readable)
- Compiled Python bytecode (no source code)

It runs the predictions and reports the metrics fully offline; no network access is needed.
---

## Repository Contents

```
├── README.md                                   ← This file
├── predictions/
│   ├── casf2016_v9_predictions.csv             ← 285 predictions (PDB ID, experimental, predicted pKd)
│   └── casf2016_v12_predictions.csv            ← 285 predictions for v12
├── verify_with_tdc.py                          ← TDC Evaluator verification script
├── report/
│   └── MillerBind_TDC_Validation_Report.html   ← Full peer-review report with figures
├── Dockerfile                                  ← Docker build reference (for transparency)
└── LICENSE
```
---

## Why 3D Structures?

MillerBind is a **structure-based** scoring function: it requires a 3D protein-ligand complex structure (PDB + ligand file) as input, not SMILES strings or amino acid sequences.

This is fundamentally different from sequence-based models (e.g., DeepDTA, MolTrans) that predict binding from 1D representations. Structure-based scoring uses the actual 3D atomic coordinates of both the protein and the ligand, capturing:

- **Precise interatomic distances** between protein and ligand atoms
- **Binding pocket geometry** and shape complementarity
- **Hydrogen bonds, hydrophobic contacts, and electrostatic interactions** in 3D space

This is why structure-based methods consistently outperform sequence-based methods on binding affinity benchmarks: they score the physical interaction directly rather than inferring it from strings.

**CASF-2016** is the gold-standard benchmark specifically designed for evaluating structure-based scoring functions (Su et al., 2019), and it is the benchmark reported by AutoDock Vina, Glide, RF-Score, OnionNet, PIGNet, IGN, HAC-Net, and now MillerBind.
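As a toy illustration of what distance-based structural featurization means (this is not MillerBind's actual feature set, which is proprietary), pairwise protein-ligand contacts can be binned into distance shells directly from the atomic coordinates:

```python
import numpy as np

def contact_counts(prot_xyz, lig_xyz, cutoffs=(2.5, 3.5, 4.5, 6.0)):
    """Count protein-ligand atom pairs falling in each distance shell (angstroms).

    prot_xyz, lig_xyz: (N, 3) and (M, 3) coordinate arrays.
    Real scoring functions typically bin by element pair as well.
    """
    # All N x M pairwise protein-ligand distances via broadcasting
    d = np.linalg.norm(prot_xyz[:, None, :] - lig_xyz[None, :, :], axis=-1)
    counts, prev = [], 0.0
    for c in cutoffs:
        counts.append(int(((d > prev) & (d <= c)).sum()))
        prev = c
    return counts
```

Sequence-based models never see these distances, which is the intuition behind the performance gap described above.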
---

## Model Details

| | MillerBind v9 | MillerBind v12 |
|---|---|---|
| **Input** | 3D protein-ligand complex (PDB + ligand file) | 3D protein-ligand complex (PDB + ligand file) |
| **Output** | Predicted pKd | Predicted binding affinity |
| **Use case** | General-purpose scoring | PPI, hard targets, cancer, large proteins |
| **Training data** | PDBbind v2020 (18,438 complexes) | PDBbind v2020 (18,438 complexes) |
| **Test set** | CASF-2016 core set (285 complexes, strictly held out) | CASF-2016 core set (285 complexes, strictly held out) |
| **Inference** | < 1 second, CPU-only | < 1 second, CPU-only |
| **Architecture** | Proprietary | Proprietary |
---

## Statistical Significance

- **v9 PCC**: p < 10⁻⁹⁸
- **v12 PCC**: p < 10⁻¹³¹
- **v12 vs. v9 improvement**: paired t-test, t = 5.30, p = 2.4 × 10⁻⁷
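A paired t-test of this kind compares the two models' per-complex absolute errors on the same 285 complexes. A minimal stdlib sketch of the statistic (in practice `scipy.stats.ttest_rel` computes the same value along with its p-value):

```python
import math

def paired_t(errors_a, errors_b):
    """Paired t statistic on per-complex absolute errors.

    Positive t means model b has the lower error on average.
    Returns (t, degrees of freedom).
    """
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n), n - 1
```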
---

## References

1. Huang, K., et al. (2021). Therapeutics Data Commons. *NeurIPS Datasets and Benchmarks*.
2. Su, M., et al. (2019). Comparative Assessment of Scoring Functions: The CASF-2016 Update. *J. Chem. Inf. Model.*, 59(2), 895–913.
3. Wang, R., et al. (2004). The PDBbind Database. *J. Med. Chem.*, 47(12), 2977–2980.

---

## License

Results and predictions are provided for independent verification of benchmark performance.
Model weights, feature engineering, and training code are proprietary.

© 2026 BindStream Technologies. All rights reserved.