# MillerBind v9 & v12 — TDC Validation

**Independent third-party validation of MillerBind scoring functions using the [Therapeutics Data Commons (TDC)](https://tdcommons.ai/) evaluation framework.**

Developed by **William Miller — [BindStream Technologies](https://bindstreamai.com)**

---

## Results Summary

### CASF-2016 Scoring Power Benchmark (n = 285, held out)

All metrics computed using `tdc.Evaluator` from PyTDC v1.1.15.

| Model | PCC | PCC 95% CI | Spearman ρ | MAE (pKd) | MAE 95% CI | RMSE | R² |
|-------|-----|------------|------------|-----------|------------|------|----|
| **MillerBind v9** | **0.890** | [0.862, 0.912] | 0.877 | **0.780** | [0.708, 0.857] | 1.030 | 0.775 |
| **MillerBind v12** | **0.938** | [0.921, 0.950] | 0.960 | **0.637** | [0.571, 0.707] | 0.869 | 0.840 |

95% confidence intervals from 1,000 bootstrap resamples.
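The percentile-bootstrap procedure behind these intervals can be sketched as follows. This is a generic illustration with synthetic stand-in data, not the repository's actual evaluation code; the metric functions and sample size are the only details taken from the tables above.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI: resample (true, pred) pairs with replacement."""
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # one bootstrap resample of indices
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# toy data standing in for the 285 CASF-2016 (experimental, predicted) pKd pairs
y_true = rng.normal(6.0, 1.5, 285)
y_pred = y_true + rng.normal(0.0, 0.8, 285)

pcc = lambda a, b: pearsonr(a, b)[0]
mae = lambda a, b: np.mean(np.abs(a - b))
print("PCC 95% CI:", bootstrap_ci(y_true, y_pred, pcc))
print("MAE 95% CI:", bootstrap_ci(y_true, y_pred, mae))
```

Resampling pairs (rather than residuals) preserves the joint distribution of experimental and predicted values, which is what the reported CIs require.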

### CASF-2016 Ranking Power (53 target clusters)

Ranking power measures whether the model correctly ranks ligands by affinity within each target protein cluster.

| Model | Avg Spearman ρ | Avg Kendall τ | Concordance | Top-1 Success |
|-------|---------------|---------------|-------------|---------------|
| X-Score | 0.247 | — | — | — |
| AutoDock Vina | 0.281 | — | — | — |
| RF-Score v3 | 0.464 | — | — | — |
| ΔVinaRF20 | 0.476 | — | — | — |
| OnionNet-2 | 0.488 | — | — | — |
| **MillerBind v9** | **0.740** | **0.662** | **82.7%** | **60.4%** |
| **MillerBind v12** | **0.979** | **0.962** | **97.9%** | **92.5%** |

v12 achieves near-perfect ranking across 53 protein targets — correctly identifying the strongest binder in 49 of 53 targets.
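Per-cluster ranking metrics of this kind are typically computed by grouping complexes by target and averaging rank correlations. A minimal sketch with synthetic data (the column names `cluster`, `exp_pkd`, and `pred_pkd` are illustrative, not the repository's actual schema):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, kendalltau

# hypothetical frame: 5 target clusters x 5 ligands each
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(5), 5),
    "exp_pkd": rng.normal(6.0, 1.5, 25),
})
df["pred_pkd"] = df["exp_pkd"] + rng.normal(0.0, 0.3, 25)

rows = []
for _, g in df.groupby("cluster"):
    rho = spearmanr(g["exp_pkd"], g["pred_pkd"])[0]
    tau = kendalltau(g["exp_pkd"], g["pred_pkd"])[0]
    # Top-1 success: does the predicted best binder match the true best binder?
    top1 = g["pred_pkd"].idxmax() == g["exp_pkd"].idxmax()
    rows.append((rho, tau, top1))

rho_avg, tau_avg, top1_rate = np.mean(rows, axis=0)
print(f"avg rho={rho_avg:.3f}  avg tau={tau_avg:.3f}  top-1={top1_rate:.1%}")
```

Averaging within-cluster correlations, rather than pooling all 285 complexes, is what distinguishes ranking power from the global scoring-power metrics above.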

### Comparison with Published Methods (Scoring Power)

| Method | PCC | MAE (pKd) | Type | Year |
|--------|-----|-----------|------|------|
| AutoDock Vina | 0.604 | 2.05 | Physics-based | 2010 |
| RF-Score v3 | 0.800 | 1.40 | Random Forest | 2015 |
| OnionNet-2 | 0.816 | 1.28 | Deep Learning | 2021 |
| PIGNet | 0.830 | 1.21 | GNN | 2022 |
| IGN | 0.850 | 1.15 | GNN | 2021 |
| HAC-Net | 0.860 | 1.10 | DL Ensemble | 2023 |
| **MillerBind v9** | **0.890** | **0.780** | **Proprietary ML** | **2025** |
| **MillerBind v12** | **0.938** | **0.637** | **Proprietary ML** | **2025** |

### TDC BindingDB Cross-Reference

| Metric | Value |
|--------|-------|
| TDC BindingDB_Kd targets with PDBbind structures | 509 / 1,090 (46.7%) |
| PDBbind complexes matching TDC targets | 8,384 |
| TDC dataset structural coverage | 49.5% (25,869 / 52,274) |
| v9 PCC on TDC-overlapping CASF-2016 subset (n=170) | 0.880 |

---

## Full Validation Report

The complete peer-review validation report with scatter plots, bootstrap confidence intervals, residual distributions, per-affinity-range analysis, and statistical significance tests is included in this repository:

**[View the Full Report (HTML)](report/MillerBind_TDC_Validation_Report.html)** — download and open in any browser, or print to PDF.

---

## Verify Results

### Option 1: Run TDC Evaluator on predictions (quick)

```bash
pip install PyTDC numpy pandas scipy
python verify_with_tdc.py
```

This loads the pre-computed predictions CSV and evaluates them using TDC's official `Evaluator`.
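For readers who want to spot-check the headline numbers without installing PyTDC, the same metrics can be computed directly with NumPy/SciPy. This sketch uses synthetic stand-in data; the column names `experimental_pkd` and `predicted_pkd` are assumptions about the predictions CSV layout, not its documented schema:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# stand-in for predictions/casf2016_v9_predictions.csv (column names assumed)
rng = np.random.default_rng(0)
exp = rng.normal(6.0, 1.5, 285)
df = pd.DataFrame({"experimental_pkd": exp,
                   "predicted_pkd": exp + rng.normal(0.0, 0.8, 285)})

y = df["experimental_pkd"].to_numpy()
yhat = df["predicted_pkd"].to_numpy()
print("PCC:     ", pearsonr(y, yhat)[0])
print("Spearman:", spearmanr(y, yhat)[0])
print("MAE:     ", np.mean(np.abs(y - yhat)))
print("RMSE:    ", np.sqrt(np.mean((y - yhat) ** 2)))
```

With the real CSV, replace the synthetic frame with `pd.read_csv(...)`; the values should agree with TDC's `Evaluator` output, since these are the same standard definitions.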

### Option 2: Docker β€” full independent validation (comprehensive)

```bash
docker run --rm bindstream/millerbind-v9-validation
```

The Docker image contains:
- AES-256 encrypted model weights (not readable)
- AES-256 encrypted CASF-2016 features (not readable)
- Compiled Python bytecode (no source code)
- Runs predictions and reports metrics — fully offline, no network needed

---

## Repository Contents

```
├── README.md                          ← This file
├── predictions/
│   ├── casf2016_v9_predictions.csv    ← 285 predictions (PDB ID, experimental, predicted pKd)
│   └── casf2016_v12_predictions.csv   ← 285 predictions for v12
├── verify_with_tdc.py                 ← TDC Evaluator verification script
├── report/
│   └── MillerBind_TDC_Validation_Report.html  ← Full peer-review report with figures
├── Dockerfile                         ← Docker build reference (for transparency)
└── LICENSE
```

---

## Why 3D Structures?

MillerBind is a **structure-based** scoring function — it requires 3D protein-ligand complex structures (PDB + ligand file) as input, not SMILES strings or amino acid sequences.

This is fundamentally different from sequence-based models (e.g., DeepDTA, MolTrans) that predict binding from 1D representations. Structure-based scoring uses the actual 3D atomic coordinates of both the protein and ligand, capturing:

- **Precise interatomic distances** between protein and ligand atoms
- **Binding pocket geometry** and shape complementarity
- **Hydrogen bonds, hydrophobic contacts, and electrostatic interactions** in 3D space

This is why structure-based methods consistently outperform sequence-based methods on binding affinity benchmarks — they score the real physical interaction rather than inferring it from strings.
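To make the contrast concrete, features like the ones listed above start from raw interatomic distances. A minimal sketch with random stand-in coordinates (the 3.5 Å and 4.5 Å cutoffs are common illustrative values, not MillerBind's actual feature definitions):

```python
import numpy as np

# toy coordinates standing in for parsed protein and ligand atoms (angstroms)
rng = np.random.default_rng(0)
protein_xyz = rng.uniform(0.0, 30.0, (500, 3))
ligand_xyz = rng.uniform(12.0, 18.0, (30, 3))

# all pairwise protein-ligand distances via broadcasting: shape (500, 30)
d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)

# simple contact counts at illustrative interaction cutoffs
hbond_like = int((d < 3.5).sum())   # roughly hydrogen-bond distance range
contacts = int((d < 4.5).sum())     # generic close-contact shell
print(hbond_like, contacts)
```

None of this geometric information exists in a SMILES string or a sequence, which is the crux of the structure-based vs. sequence-based distinction.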

**CASF-2016** is the gold-standard benchmark specifically designed for evaluating structure-based scoring functions (Su et al., 2019), and is the standard reported by AutoDock Vina, Glide, RF-Score, OnionNet, PIGNet, IGN, HAC-Net, and now MillerBind.

---

## Model Details

| | MillerBind v9 | MillerBind v12 |
|---|---|---|
| **Input** | 3D protein-ligand complex (PDB + ligand file) | 3D protein-ligand complex (PDB + ligand file) |
| **Output** | Predicted pKd | Predicted binding affinity |
| **Use case** | General-purpose scoring | PPI, hard targets, cancer, large proteins |
| **Training data** | PDBbind v2020 (18,438 complexes) | PDBbind v2020 (18,438 complexes) |
| **Test set** | CASF-2016 core set (285, strictly held out) | CASF-2016 core set (285, strictly held out) |
| **Inference** | < 1 second, CPU-only | < 1 second, CPU-only |
| **Architecture** | Proprietary | Proprietary |

---

## Statistical Significance

- **v9 PCC**: p < 10⁻⁹⁸
- **v12 PCC**: p < 10⁻¹³¹
- **v12 vs v9 improvement**: paired t-test, t = 5.30, p = 2.4 × 10⁻⁷
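Tests of this shape are straightforward to reproduce with SciPy. A hedged sketch with synthetic stand-in predictions (the paired t-test here compares per-complex absolute errors, which is one reasonable reading of the v12-vs-v9 comparison above, not a confirmed description of the report's exact procedure):

```python
import numpy as np
from scipy.stats import pearsonr, ttest_rel

rng = np.random.default_rng(3)
y = rng.normal(6.0, 1.5, 285)
pred_v9 = y + rng.normal(0.0, 1.00, 285)    # stand-in for v9 predictions
pred_v12 = y + rng.normal(0.0, 0.85, 285)   # stand-in for v12 predictions

# pearsonr returns the correlation and a two-sided p-value against r = 0
r, p = pearsonr(y, pred_v12)

# paired t-test on per-complex absolute errors: does v12 reduce error vs. v9?
t, p_paired = ttest_rel(np.abs(y - pred_v9), np.abs(y - pred_v12))
print(f"PCC={r:.3f} (p={p:.1e}), paired t={t:.2f} (p={p_paired:.1e})")
```

Pairing by complex matters: both models are scored on the same 285 structures, so the errors are correlated and an unpaired test would understate significance.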

---

## References

1. Huang, K., et al. (2021). Therapeutics Data Commons. *NeurIPS Datasets and Benchmarks*.
2. Su, M., et al. (2019). Comparative Assessment of Scoring Functions: The CASF-2016 Update. *J. Chem. Inf. Model.*, 59(2), 895–913.
3. Wang, R., et al. (2004). The PDBbind Database. *J. Med. Chem.*, 47(12), 2977–2980.

---

## License

Results and predictions are provided for independent verification of benchmark performance.

Model weights, feature engineering, and training code are proprietary.

© 2026 BindStream Technologies. All rights reserved.