---
license: apache-2.0
tags:
- protein-design
- allosteric
- state-selectivity
- guided-generation
- rfdiffusion
- pxdesign
- proteina
library_name: pytorch
---

# AlloGen


![allogen](https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/et5-pzgiGiAH0uVqvs8tM.png)

State-selectivity scoring + guided generation for allosteric binder design.

🧪 **One-click demo for biology users:**
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/ChatterjeeLab/AlloGen/raw/main/notebooks/AlloGen_CaM_demo.ipynb) — score CaM binders and run Q_θ-guided PXDesign sampling in 5 minutes. Notebook lives at [`notebooks/AlloGen_CaM_demo.ipynb`](notebooks/AlloGen_CaM_demo.ipynb).

AlloGen trains a scorer Q_θ(X, Y) ∈ (0,1) that ranks how well a binder Y discriminates a target's **holo** (active) state X¹ from its **apo** (inactive) state X⁰. The selectivity score is:

    S(Y) = Q_θ(X¹, Y) − Q_θ(X⁰, Y)

Q_θ serves as both a re-ranker (best-of-K) and a gradient signal for guided generation on top of frozen priors (RFdiffusion, PXDesign, Proteina-ComplexA) via Langevin, SMC, TDS, or classifier guidance.
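
The best-of-K use of S(Y) can be sketched as follows. This is a minimal illustration that assumes the holo and apo scores for each candidate are already computed; the `Candidate` container, the `best_of_k` helper, and the example scores are hypothetical and not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One binder design with its two Q_theta scores (hypothetical container)."""
    name: str
    q_holo: float  # Q_theta(X1, Y), in (0, 1)
    q_apo: float   # Q_theta(X0, Y), in (0, 1)

    @property
    def selectivity(self) -> float:
        # S(Y) = Q_theta(X1, Y) - Q_theta(X0, Y), so S lies in (-1, 1)
        return self.q_holo - self.q_apo


def best_of_k(candidates: list[Candidate], k: int = 1) -> list[Candidate]:
    """Return the k candidates with the highest selectivity S(Y)."""
    return sorted(candidates, key=lambda c: c.selectivity, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    pool = [
        Candidate("design_a", q_holo=0.80, q_apo=0.70),
        Candidate("design_b", q_holo=0.60, q_apo=0.10),
        Candidate("design_c", q_holo=0.30, q_apo=0.40),
    ]
    for c in best_of_k(pool, k=2):
        print(f"{c.name}: S = {c.selectivity:+.3f}")
```

In this toy pool, `design_a` has the highest holo score but ranks below `design_b`, because it scores almost as well against the apo state.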

This repository accompanies the paper *AlloGen: Conformation-Selective Binder Generation with Differential State Scoring* (arXiv 2026).

## Installation

```bash
conda env create -f environment.yml
conda activate allogen
```

Or pip-only:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

Python 3.10 and PyTorch 2.x are required. A CUDA GPU is recommended for guided generation; CPU is sufficient for scoring single designs.

## Inference quickstart

```bash
# Score the bundled CaM inference sample against the v4-S2 (target-swap) checkpoint
python code/scripts/evaluate.py \
    --target cam \
    --checkpoint checkpoints/Q_theta_phase2.pt \
    --data_dir data/sample/ \
    --outdir /tmp/cam_inference \
    --no_wandb
```

See [`inference.md`](inference.md) for the scoring API + guidance command lines.

## Repo layout

```
code/
  data/           dataset / graph construction, PDB I/O, target YAMLs
  models/         Q_θ scorer (graph transformer) + differentiable wrapper
  trainers/       two-phase training loop (DockQ regression + selectivity)
  utils/          PDB I/O, backbone frames, SAM optimizer
  scripts/        evaluate, rescore, PXDesign guidance (see scripts/README.md)
checkpoints/      Q_θ paper weights (v4-S2 target-swap split, via Git LFS)
data/sample/      tiny CaM inference sample (test split only)
```

## Checkpoints

Paper weights for the **v4-S2 target-swap** split are bundled via **Git LFS**:

```bash
git lfs install
git lfs pull
```

| File | Use |
|---|---|
| `checkpoints/Q_theta_phase1.pt` | Phase 1 (DockQ regression) intermediate checkpoint |
| `checkpoints/Q_theta_phase2.pt` | Phase 2 (selectivity) — main paper result |
| `checkpoints/Q_theta_train_curve.csv` | Training curve metadata |
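
If a checkpoint fails to load, a common cause with LFS-tracked files is that only the small text pointer was cloned. The sketch below checks for that case; it relies on the fact that Git LFS pointer files begin with a fixed `version` line. The `is_lfs_pointer` helper is hypothetical and not part of the repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` is still an un-pulled Git LFS pointer stub."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


if __name__ == "__main__":
    for name in ("Q_theta_phase1.pt", "Q_theta_phase2.pt"):
        ckpt = Path("checkpoints") / name
        if not ckpt.exists():
            print(f"{ckpt}: missing")
        elif is_lfs_pointer(ckpt):
            print(f"{ckpt}: LFS pointer only, run `git lfs pull`")
        else:
            print(f"{ckpt}: {ckpt.stat().st_size:,} bytes")
```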

## Scoring a single design

```python
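import sys

import torch

sys.path.insert(0, 'code')  # make the repo's code/ packages importable
from models.differentiable_features import DifferentiableQTheta

# Use the GPU when available; CPU is enough for scoring single designs.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

scorer = DifferentiableQTheta(
    checkpoint='checkpoints/Q_theta_phase2.pt',  # Phase 2 (selectivity) weights
    device=device,
)
# Register both conformations of the target receptor.
scorer.load_receptor(
    holo_path='your_holo.pdb', rec_chain='A',
    apo_path='your_apo.pdb',   apo_chain='A',
)
# Score the same binder against each state, then take the difference S(Y).
q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
q_apo  = scorer.score('design.pdb', binder_chain='B', state='apo')
print(f'S = {q_holo - q_apo:.3f}')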
```
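
To score several designs against the same receptor, the two `score` calls can be wrapped in a loop. The sketch below assumes only the `scorer.score(path, binder_chain=..., state=...)` call shown above, and that its return value can be converted with `float()`. The `score_designs` helper is hypothetical and not part of the repository.

```python
from pathlib import Path


def score_designs(scorer, pdb_paths, binder_chain="B"):
    """Score each design in both states and sort by selectivity, highest first.

    `scorer` is any object exposing score(path, binder_chain=..., state=...),
    with the receptor already loaded via load_receptor().
    """
    rows = []
    for path in pdb_paths:
        # float() is an assumption: it works if score() returns a Python
        # float or a 0-dim tensor.
        q_holo = float(scorer.score(str(path), binder_chain=binder_chain, state="holo"))
        q_apo = float(scorer.score(str(path), binder_chain=binder_chain, state="apo"))
        rows.append({
            "design": Path(path).name,
            "q_holo": q_holo,
            "q_apo": q_apo,
            "S": q_holo - q_apo,
        })
    return sorted(rows, key=lambda r: r["S"], reverse=True)
```

With the `scorer` from the snippet above, `score_designs(scorer, sorted(Path('designs').glob('*.pdb')))` would return one row per file, where `designs/` is a placeholder directory name.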

## Guidance methods

The shipped guidance code wraps **PXDesign** as the prior and uses Q_θ as the gradient / classifier signal. All four method variants (Langevin, SMC, TDS, classifier guidance) live in `code/scripts/pxdesign_guidance/`.

See [`inference.md`](inference.md) §3 for command lines.

To deploy Q_θ with **RFdiffusion**, **Proteina-ComplexA**, or any other backbone prior, see [`code/scripts/README.md`](code/scripts/README.md) — Q_θ exposes `DifferentiableQTheta` for `∇_x S(x)`, and the PXDesign code is a worked template to mirror.
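
For intuition about how a selectivity score can steer sampling, here is a generic SMC-style reweighting and resampling step in plain Python. It is a sketch of the general idea only, not the code in `code/scripts/pxdesign_guidance/`: the exponential weighting, the `guidance_scale` parameter, and both helper names are assumptions made for illustration.

```python
import math
import random


def smc_weights(selectivity, guidance_scale=5.0):
    """Normalized weights w_i proportional to exp(guidance_scale * S_i)."""
    m = max(selectivity)  # subtract the max for numerical stability
    unnorm = [math.exp(guidance_scale * (s - m)) for s in selectivity]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def resample(particles, weights, rng):
    """Multinomial resampling: draw len(particles) particles with replacement."""
    return rng.choices(particles, weights=weights, k=len(particles))


if __name__ == "__main__":
    rng = random.Random(0)
    particles = ["p0", "p1", "p2", "p3"]  # stand-ins for partially denoised samples
    scores = [0.40, -0.10, 0.05, 0.25]    # made-up S values, one per particle
    w = smc_weights(scores)
    print([round(x, 3) for x in w])
    print(resample(particles, w, rng))
```

A larger `guidance_scale` concentrates the weights on the most selective particles, at the cost of sample diversity.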

## Citation

```bibtex
@article{cao2026allogen,
  title         = {AlloGen: Conformation-Selective Binder Generation with Differential State Scoring},
  author        = {Cao, Hanqun and Quinn, Zachary and Pal, Aastha and Kimura, Sumi and Zhang, Jingjie and Heng, Pheng Ann and Chatterjee, Pranam},
  year          = {2026},
  eprint        = {2606.05474},
  archivePrefix = {arXiv},
  primaryClass  = {q-bio.BM}
}
```