# MillerBind v9 & v12 — TDC Validation

**Independent third-party validation of MillerBind scoring functions using the [Therapeutics Data Commons (TDC)](https://tdcommons.ai/) evaluation framework.**

Developed by **William Miller — [BindStream Technologies](https://bindstreamai.com)**

---

## Results Summary

### CASF-2016 Scoring Power Benchmark (n = 285, held out)

All metrics computed using `tdc.Evaluator` from PyTDC v1.1.15.

| Model | PCC | PCC 95% CI | Spearman ρ | MAE (pKd) | MAE 95% CI | RMSE | R² |
|-------|-----|------------|------------|-----------|------------|------|----|
| **MillerBind v9** | **0.890** | [0.862, 0.912] | 0.877 | **0.780** | [0.708, 0.857] | 1.030 | 0.775 |
| **MillerBind v12** | **0.938** | [0.921, 0.950] | 0.960 | **0.637** | [0.571, 0.707] | 0.869 | 0.840 |

95% confidence intervals from 1,000 bootstrap resamples.
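The percentile-bootstrap procedure behind these intervals can be sketched as follows. This is a generic illustration with synthetic stand-in data, not the repository's actual evaluation code; the metric functions and sample size are the only details taken from the tables above.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05):
    """Percentile bootstrap CI: resample (true, pred) pairs with replacement."""
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # one bootstrap resample of indices
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# toy data standing in for the 285 CASF-2016 (experimental, predicted) pKd pairs
y_true = rng.normal(6.0, 1.5, 285)
y_pred = y_true + rng.normal(0.0, 0.8, 285)

pcc = lambda a, b: pearsonr(a, b)[0]
mae = lambda a, b: np.mean(np.abs(a - b))
print("PCC 95% CI:", bootstrap_ci(y_true, y_pred, pcc))
print("MAE 95% CI:", bootstrap_ci(y_true, y_pred, mae))
```

Resampling pairs (rather than residuals) preserves the joint distribution of experimental and predicted values, which is what the reported CIs require.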

### CASF-2016 Ranking Power (53 target clusters)

Ranking power measures whether the model correctly ranks ligands by affinity within each target protein cluster.

| Model | Avg Spearman ρ | Avg Kendall τ | Concordance | Top-1 Success |
|-------|---------------|---------------|-------------|---------------|
| X-Score | 0.247 | — | — | — |
| AutoDock Vina | 0.281 | — | — | — |
| RF-Score v3 | 0.464 | — | — | — |
| ΔVinaRF20 | 0.476 | — | — | — |
| OnionNet-2 | 0.488 | — | — | — |
| **MillerBind v9** | **0.740** | **0.662** | **82.7%** | **60.4%** |
| **MillerBind v12** | **0.979** | **0.962** | **97.9%** | **92.5%** |

v12 achieves near-perfect ranking across 53 protein targets — correctly identifying the strongest binder in 49 of 53 targets.
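Per-cluster ranking metrics of this kind are typically computed by grouping complexes by target and averaging rank correlations. A minimal sketch with synthetic data (the column names `cluster`, `exp_pkd`, and `pred_pkd` are illustrative, not the repository's actual schema):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr, kendalltau

# hypothetical frame: 5 target clusters x 5 ligands each
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(5), 5),
    "exp_pkd": rng.normal(6.0, 1.5, 25),
})
df["pred_pkd"] = df["exp_pkd"] + rng.normal(0.0, 0.3, 25)

rows = []
for _, g in df.groupby("cluster"):
    rho = spearmanr(g["exp_pkd"], g["pred_pkd"])[0]
    tau = kendalltau(g["exp_pkd"], g["pred_pkd"])[0]
    # Top-1 success: does the predicted best binder match the true best binder?
    top1 = g["pred_pkd"].idxmax() == g["exp_pkd"].idxmax()
    rows.append((rho, tau, top1))

rho_avg, tau_avg, top1_rate = np.mean(rows, axis=0)
print(f"avg rho={rho_avg:.3f}  avg tau={tau_avg:.3f}  top-1={top1_rate:.1%}")
```

Averaging within-cluster correlations, rather than pooling all 285 complexes, is what distinguishes ranking power from the global scoring-power metrics above.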

### Comparison with Published Methods (Scoring Power)

| Method | PCC | MAE (pKd) | Type | Year |
|--------|-----|-----------|------|------|
| AutoDock Vina | 0.604 | 2.05 | Physics-based | 2010 |
| RF-Score v3 | 0.800 | 1.40 | Random Forest | 2015 |
| OnionNet-2 | 0.816 | 1.28 | Deep Learning | 2021 |
| PIGNet | 0.830 | 1.21 | GNN | 2022 |
| IGN | 0.850 | 1.15 | GNN | 2021 |
| HAC-Net | 0.860 | 1.10 | DL Ensemble | 2023 |
| **MillerBind v9** | **0.890** | **0.780** | **Proprietary ML** | **2025** |
| **MillerBind v12** | **0.938** | **0.637** | **Proprietary ML** | **2025** |

### TDC BindingDB Cross-Reference

| Metric | Value |
|--------|-------|
| TDC BindingDB_Kd targets with PDBbind structures | 509 / 1,090 (46.7%) |
| PDBbind complexes matching TDC targets | 8,384 |
| TDC dataset structural coverage | 49.5% (25,869 / 52,274) |
| v9 PCC on TDC-overlapping CASF-2016 subset (n=170) | 0.880 |

---

## Full Validation Report

The complete peer-review validation report with scatter plots, bootstrap confidence intervals, residual distributions, per-affinity-range analysis, and statistical significance tests is included in this repository:

**[View the Full Report (HTML)](report/MillerBind_TDC_Validation_Report.html)** — download and open in any browser, or print to PDF.

---

## Verify Results

### Option 1: Run TDC Evaluator on predictions (quick)

```bash
pip install PyTDC numpy pandas scipy
python verify_with_tdc.py
```

This loads the pre-computed predictions CSV and evaluates them using TDC's official `Evaluator`.
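For readers who want to spot-check the headline numbers without installing PyTDC, the same metrics can be computed directly with NumPy/SciPy. This sketch uses synthetic stand-in data; the column names `experimental_pkd` and `predicted_pkd` are assumptions about the predictions CSV layout, not its documented schema:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr

# stand-in for predictions/casf2016_v9_predictions.csv (column names assumed)
rng = np.random.default_rng(0)
exp = rng.normal(6.0, 1.5, 285)
df = pd.DataFrame({"experimental_pkd": exp,
                   "predicted_pkd": exp + rng.normal(0.0, 0.8, 285)})

y = df["experimental_pkd"].to_numpy()
yhat = df["predicted_pkd"].to_numpy()
print("PCC:     ", pearsonr(y, yhat)[0])
print("Spearman:", spearmanr(y, yhat)[0])
print("MAE:     ", np.mean(np.abs(y - yhat)))
print("RMSE:    ", np.sqrt(np.mean((y - yhat) ** 2)))
```

With the real CSV, replace the synthetic frame with `pd.read_csv(...)`; the values should agree with TDC's `Evaluator` output, since these are the same standard definitions.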

### Option 2: Docker β€” full independent validation (comprehensive)

```bash
docker run --rm bindstream/millerbind-v9-validation
```

The Docker image contains:
- AES-256 encrypted model weights (not readable)
- AES-256 encrypted CASF-2016 features (not readable)
- Compiled Python bytecode (no source code)
- Runs predictions and reports metrics — fully offline, no network needed

---

## Repository Contents

```
├── README.md                          ← This file
├── predictions/
│   ├── casf2016_v9_predictions.csv    ← 285 predictions (PDB ID, experimental, predicted pKd)
│   └── casf2016_v12_predictions.csv   ← 285 predictions for v12
├── verify_with_tdc.py                 ← TDC Evaluator verification script
├── report/
│   └── MillerBind_TDC_Validation_Report.html  ← Full peer-review report with figures
├── Dockerfile                         ← Docker build reference (for transparency)
└── LICENSE
```

---

## Why 3D Structures?

MillerBind is a **structure-based** scoring function — it requires 3D protein-ligand complex structures (PDB + ligand file) as input, not SMILES strings or amino acid sequences.

This is fundamentally different from sequence-based models (e.g., DeepDTA, MolTrans) that predict binding from 1D representations. Structure-based scoring uses the actual 3D atomic coordinates of both the protein and ligand, capturing:

- **Precise interatomic distances** between protein and ligand atoms
- **Binding pocket geometry** and shape complementarity
- **Hydrogen bonds, hydrophobic contacts, and electrostatic interactions** in 3D space

This is why structure-based methods consistently outperform sequence-based methods on binding affinity benchmarks — they score the real physical interaction rather than inferring it from strings.
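To make the contrast concrete, features like the ones listed above start from raw interatomic distances. A minimal sketch with random stand-in coordinates (the 3.5 Å and 4.5 Å cutoffs are common illustrative values, not MillerBind's actual feature definitions):

```python
import numpy as np

# toy coordinates standing in for parsed protein and ligand atoms (angstroms)
rng = np.random.default_rng(0)
protein_xyz = rng.uniform(0.0, 30.0, (500, 3))
ligand_xyz = rng.uniform(12.0, 18.0, (30, 3))

# all pairwise protein-ligand distances via broadcasting: shape (500, 30)
d = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)

# simple contact counts at illustrative interaction cutoffs
hbond_like = int((d < 3.5).sum())   # roughly hydrogen-bond distance range
contacts = int((d < 4.5).sum())     # generic close-contact shell
print(hbond_like, contacts)
```

None of this geometric information exists in a SMILES string or a sequence, which is the crux of the structure-based vs. sequence-based distinction.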

**CASF-2016** is the gold-standard benchmark specifically designed for evaluating structure-based scoring functions (Su et al., 2019), and is the standard reported by AutoDock Vina, Glide, RF-Score, OnionNet, PIGNet, IGN, HAC-Net, and now MillerBind.

---

## Model Details

| | MillerBind v9 | MillerBind v12 |
|---|---|---|
| **Input** | 3D protein-ligand complex (PDB + ligand file) | 3D protein-ligand complex (PDB + ligand file) |
| **Output** | Predicted pKd | Predicted binding affinity |
| **Use case** | General-purpose scoring | PPI, hard targets, cancer, large proteins |
| **Training data** | PDBbind v2020 (18,438 complexes) | PDBbind v2020 (18,438 complexes) |
| **Test set** | CASF-2016 core set (285, strictly held out) | CASF-2016 core set (285, strictly held out) |
| **Inference** | < 1 second, CPU-only | < 1 second, CPU-only |
| **Architecture** | Proprietary | Proprietary |

---

## Statistical Significance

- **v9 PCC**: p < 10⁻⁹⁸
- **v12 PCC**: p < 10⁻¹³¹
- **v12 vs v9 improvement**: paired t-test, t = 5.30, p = 2.4 × 10⁻⁷
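Tests of this shape are straightforward to reproduce with SciPy. A hedged sketch with synthetic stand-in predictions (the paired t-test here compares per-complex absolute errors, which is one reasonable reading of the v12-vs-v9 comparison above, not a confirmed description of the report's exact procedure):

```python
import numpy as np
from scipy.stats import pearsonr, ttest_rel

rng = np.random.default_rng(3)
y = rng.normal(6.0, 1.5, 285)
pred_v9 = y + rng.normal(0.0, 1.00, 285)    # stand-in for v9 predictions
pred_v12 = y + rng.normal(0.0, 0.85, 285)   # stand-in for v12 predictions

# pearsonr returns the correlation and a two-sided p-value against r = 0
r, p = pearsonr(y, pred_v12)

# paired t-test on per-complex absolute errors: does v12 reduce error vs. v9?
t, p_paired = ttest_rel(np.abs(y - pred_v9), np.abs(y - pred_v12))
print(f"PCC={r:.3f} (p={p:.1e}), paired t={t:.2f} (p={p_paired:.1e})")
```

Pairing by complex matters: both models are scored on the same 285 structures, so the errors are correlated and an unpaired test would understate significance.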

---

## References

1. Huang, K., et al. (2021). Therapeutics Data Commons. *NeurIPS Datasets and Benchmarks*.
2. Su, M., et al. (2019). Comparative Assessment of Scoring Functions: The CASF-2016 Update. *J. Chem. Inf. Model.*, 59(2), 895–913.
3. Wang, R., et al. (2004). The PDBbind Database. *J. Med. Chem.*, 47(12), 2977–2980.

---

## License

Results and predictions are provided for independent verification of benchmark performance.

Model weights, feature engineering, and training code are proprietary.

© 2026 BindStream Technologies. All rights reserved.