# Protein Design with Diffusers

A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for protein design, combining structure generation ([RFdiffusion3](https://www.biorxiv.org/content/10.1101/2025.09.18.676967v2)) and sequence design ([ProteinMPNN](https://www.science.org/doi/10.1126/science.add2187) / [LigandMPNN](https://www.nature.com/articles/s41592-025-02626-1)) into composable, swappable pipeline blocks.

All three models (RFD3, ProteinMPNN, and LigandMPNN) rely on [Foundry](https://github.com/RosettaCommons/foundry) for their underlying implementations and [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.

## Getting Started

### Installation

```bash
pip install rc-foundry[all]
pip install diffusers
```

### Running with Diffusers

```python
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = pipe(contigs="100")
print(state.output.xyz.shape)  # [1, 100, 3]
```

## Workflows

The active workflow is selected automatically based on which inputs you provide:

| Workflow | Trigger inputs | What runs |
|----------|---------------|-----------|
| `structure_only` | `contigs` | RFdiffusion3 |
| `structure_and_sequence` | `contigs`, `temperature` | RFdiffusion3 → MPNN |
| `motif_structure_and_sequence` | `contigs`, `input_xyz`, `temperature` | Motif-conditioned RFdiffusion3 → MPNN |

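The dispatch rule in the table can be restated as a small function. This is an illustrative sketch, not the pipeline's own selection code: `select_workflow` is a hypothetical name, and rejecting `input_xyz` without `temperature` is an assumption of this sketch, since the table lists no motif workflow that skips MPNN.

```python
from typing import Optional


def select_workflow(
    contigs: Optional[str] = None,
    temperature: Optional[float] = None,
    input_xyz: Optional[object] = None,
) -> str:
    """Hypothetical helper mirroring the workflow table: map provided inputs to a workflow name."""
    if contigs is None:
        raise ValueError("every workflow requires `contigs`")
    if input_xyz is not None:
        if temperature is None:
            # Assumption: the table lists motif conditioning only together with MPNN
            raise ValueError("`input_xyz` is only listed together with `temperature`")
        return "motif_structure_and_sequence"
    if temperature is not None:
        return "structure_and_sequence"
    return "structure_only"


print(select_workflow(contigs="100", temperature=0.1))  # structure_and_sequence
```
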
### Structure Only

```python
state = pipe(contigs="100")
print(state.output.xyz.shape)  # [1, 100, 3]
```

### Structure + Sequence Design

Passing `temperature` triggers the MPNN sequence design step. Load an MPNN variant first:

```python
from diffusers import AutoModel

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
pipe.update_components(mpnn=mpnn)

state = pipe(contigs="100", temperature=0.1)
print(state.mpnn_output.designed_sequence)  # e.g. "MKVLSEG..."
```

Three MPNN variants are available:

| Subfolder | Variant | Params | Description |
|-----------|---------|--------|-------------|
| `mpnn/` | ProteinMPNN | 1.66M | Standard protein sequence design |
| `mpnn_ligand/` | LigandMPNN | 2.62M | Ligand-aware sequence design |
| `mpnn_soluble/` | SolubleMPNN | 1.66M | Optimized for soluble proteins |

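In ProteinMPNN, `temperature` is the sampling temperature: the per-position amino-acid logits are divided by T before the softmax, so low values such as 0.1 make sampling nearly greedy, while higher values produce more diverse sequences. A minimal stdlib sketch of that step, with made-up logits for a single position; `temperature_softmax` and `sample_residue` are hypothetical helpers, not part of this package.

```python
import math
import random
from typing import Dict


def temperature_softmax(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert per-residue logits to probabilities after dividing by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {aa: value / temperature for aa, value in logits.items()}
    max_logit = max(scaled.values())  # subtract the max for numerical stability
    exps = {aa: math.exp(v - max_logit) for aa, v in scaled.items()}
    total = sum(exps.values())
    return {aa: e / total for aa, e in exps.items()}


def sample_residue(logits: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one amino acid from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    residues = list(probs)
    return rng.choices(residues, weights=[probs[aa] for aa in residues], k=1)[0]


# Made-up logits for one position, restricted to three amino acids for brevity
logits = {"A": 2.0, "L": 1.5, "G": 0.5}
print(temperature_softmax(logits, 1.0))  # probability spread across all three
print(temperature_softmax(logits, 0.1))  # almost all mass on "A"
print(sample_residue(logits, 0.1, random.Random(0)))
```

Dividing by T before the softmax leaves the ranking of amino acids unchanged and only sharpens or flattens the distribution.
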
### Motif-Conditioned Design

Passing `input_xyz` enables motif conditioning, which fixes specific residues in place while designing the rest:

```python
import torch

motif_coords = torch.randn(16, 3)  # [N_motif, 3]; random placeholder for the 16 motif residues A10-25
state = pipe(
    contigs="A10-25/50",
    input_xyz=motif_coords,
    temperature=0.1,
)
```

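In the example, `motif_coords` has 16 rows, which matches the 16 residues spanned by `A10-25`. A hypothetical helper for checking that count, assuming the RFdiffusion-style contig convention in which `/` separates segments, a segment prefixed with a chain letter is a fixed motif range from the input, and a bare number (or `min-max` range) is a designed segment length. RFdiffusion3 may accept contig syntax that this sketch does not handle.

```python
import re
from typing import Tuple

# Assumed segment grammar: "A10-25" (chain + inclusive residue range) or "50" / "40-60" (designed length)
MOTIF_SEGMENT = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")
DESIGNED_SEGMENT = re.compile(r"^(\d+)(?:-(\d+))?$")


def count_contig_residues(contigs: str) -> Tuple[int, Tuple[int, int]]:
    """Return (motif residue count, (min, max) designed residue count) for a contig string."""
    motif = 0
    designed_min = designed_max = 0
    for segment in contigs.split("/"):
        segment = segment.strip()
        if m := MOTIF_SEGMENT.match(segment):
            start, end = int(m.group(2)), int(m.group(3))
            motif += end - start + 1  # residue ranges are inclusive
        elif m := DESIGNED_SEGMENT.match(segment):
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
            designed_min += low
            designed_max += high
        else:
            raise ValueError(f"unsupported contig segment: {segment!r}")
    return motif, (designed_min, designed_max)


print(count_contig_residues("A10-25/50"))  # (16, (50, 50))
```

Under this convention, comparing the first returned value with `motif_coords.shape[0]` before calling `pipe` catches a mismatched motif early.
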
### Full Design Pipeline

The three models can be chained into a complete protein design workflow:

```
RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
```

```python
import torch
from diffusers import AutoModel, ModularPipeline

# 1. Design a backbone + sequence
design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
design_pipe.update_components(mpnn=mpnn)

state = design_pipe(contigs="100", temperature=0.1, output_type="cif.gz", output_path="design")
designed_sequence = state.mpnn_output.designed_sequence

# 2. Validate the fold with RF3
fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
```

> [RF3](https://www.biorxiv.org/content/10.1101/2025.08.14.670328) (RosettaFold3) is available as a separate pipeline at [`dn6/RosettaFold-3`](https://huggingface.co/dn6/RosettaFold-3).

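The snippet above stops after writing the RF3 prediction; how the fold is scored is left open. A common self-consistency check is the C-alpha RMSD between the designed backbone and the predicted structure after optimal rigid superposition (the Kabsch algorithm). A NumPy sketch, assuming you have already extracted matching `[N, 3]` C-alpha coordinate arrays from both outputs (that extraction is not shown, and `kabsch_rmsd` is a hypothetical helper):

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two matched [N, 3] coordinate sets after optimally superposing P onto Q."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays with the same [N, 3] shape")
    # Remove translation by centering both point clouds at the origin
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix yields the optimal rotation
    U, _, Vt = np.linalg.svd(P_c.T @ Q_c)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_aligned = P_c @ R.T
    return float(np.sqrt(((P_aligned - Q_c) ** 2).sum(axis=1).mean()))


# Usage: a rotated and translated copy of the same coordinates superposes exactly
rng = np.random.default_rng(0)
designed_ca = rng.normal(size=(100, 3))
theta = np.pi / 5
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
predicted_ca = designed_ca @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(designed_ca, predicted_ca))  # close to 0.0
```

The determinant check matters because without it the SVD solution can return a mirror image, which would make a chirally wrong prediction look like a perfect match.
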
## Customizing Workflows

Inspect, swap, and extend pipeline blocks at runtime:

```python
# Inspect the pipeline structure
print(pipe.blocks)

# Swap ProteinMPNN for LigandMPNN
from diffusers import AutoModel

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn_ligand", trust_remote_code=True)
pipe.update_components(mpnn=mpnn)

# Select a workflow explicitly
workflow = pipe.get_workflow("structure_and_sequence")

# Add a custom block
from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam

class ScoreDesignStep(ModularPipelineBlocks):
    """Computes the radius of gyration of the generated coordinates."""

    @property
    def inputs(self):
        return [InputParam("xyz", required=True)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("radius_of_gyration")]

    def __call__(self, components, state: PipelineState):
        block_state = self.get_block_state(state)
        xyz = block_state.xyz  # [..., N, 3]
        centroid = xyz.mean(dim=-2, keepdim=True)
        # Mean squared distance to the centroid over the N points, then sqrt: one value per design
        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(dim=-1).mean(dim=-1).sqrt()
        self.set_block_state(state, block_state)
        return components, state

# Insert the scoring block at position 3 of the sub-block sequence (adjust to your pipeline's layout)
pipe._blocks.sub_blocks.insert("score", ScoreDesignStep(), index=3)
```

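For reference, `ScoreDesignStep` computes the radius of gyration with every point weighted equally (no atomic masses), where the centroid is taken over the `N` points along `dim=-2`:

$$
R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert \mathbf{r}_i - \bar{\mathbf{r}} \right\rVert^2},
\qquad
\bar{\mathbf{r}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i
$$

For example, two points at (0, 0, 0) and (2, 0, 0) have centroid (1, 0, 0); each lies at distance 1 from it, so $R_g = \sqrt{(1 + 1)/2} = 1$.
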
## Output Types

| `output_type` | Additional output | Writes to disk |
|---|---|---|
| `"tensor"` | None | No |
| `"pdb"` | `pdb_string` | `.pdb` file |
| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` via AtomWorks |
| `"cif.gz"` | Same as `"cif"` | `.cif.gz` compressed |

CIF outputs use [AtomWorks](https://github.com/RosettaCommons/atomworks) `to_cif_file` and return [biotite](https://www.biotite-python.org/) `AtomArray` / `AtomArrayStack` objects, matching the Foundry output format.

```python
# Save as compressed CIF
state = pipe(contigs="100", output_type="cif.gz", output_path="design_0")

# Access AtomArray directly
atom_array = state.output.atom_array

# Denoising trajectory
trajectory = state.output.trajectory_stack

# PDB output
state = pipe(contigs="100", output_type="pdb", output_path="design_0.pdb")
print(state.output.pdb_string[:200])
```

## Citation

```bibtex
@article{butcher2025_rfdiffusion3,
    author = {Butcher, Jasper and Krishna, Rohith and Mitra, Raktim and Brent, Rafael Isaac and Li, Yanjing and Corley, Nathaniel and Kim, Paul T and Funk, Jonathan and Mathis, Simon Valentin and Salike, Saman and Muraishi, Aiko and Eisenach, Helen and Thompson, Tuscan Rock and Chen, Jie and Politanska, Yuliya and Sehgal, Enisha and Coventry, Brian and Zhang, Odin and Qiang, Bo and Didi, Kieran and Kazman, Maxwell and DiMaio, Frank and Baker, David},
    title = {De novo Design of All-atom Biomolecular Interactions with RFdiffusion3},
    journal = {bioRxiv},
    year = {2025},
    doi = {10.1101/2025.09.18.676967},
}

@article{dauparas2022robust,
    author = {Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
    title = {Robust deep learning--based protein sequence design using ProteinMPNN},
    journal = {Science},
    volume = {378},
    number = {6615},
    pages = {49--56},
    year = {2022},
}

@article{dauparas2025atomic,
    author = {Dauparas, Justas and Lee, Gyu Rie and Pecoraro, Robert and An, Linna and Anishchenko, Ivan and Glasscock, Cameron and Baker, David},
    title = {Atomic context-conditioned protein sequence design using LigandMPNN},
    journal = {Nature Methods},
    year = {2025},
}
```