# Protein Design with Diffusers

A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for protein design, combining structure generation ([RFdiffusion3](https://www.biorxiv.org/content/10.1101/2025.09.18.676967v2)) and sequence design ([ProteinMPNN](https://www.science.org/doi/10.1126/science.add2187) / [LigandMPNN](https://www.nature.com/articles/s41592-025-02626-1)) into composable, swappable pipeline blocks.

All three models — RFD3, ProteinMPNN, and LigandMPNN — rely on [Foundry](https://github.com/RosettaCommons/foundry) for their underlying implementations and [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.

## Getting Started

### Installation

```bash
pip install "rc-foundry[all]"  # quoted so shells like zsh do not expand the brackets
pip install diffusers
```

### Running with Diffusers

```python
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = pipe(contigs="100")
print(state.output.xyz.shape)  # [1, 100, 3]
```

## Workflows

The active workflow is selected automatically based on which inputs you provide:

| Workflow | Trigger inputs | What runs |
|----------|---------------|-----------|
| `structure_only` | `contigs` | RFdiffusion3 |
| `structure_and_sequence` | `contigs`, `temperature` | RFdiffusion3 → MPNN |
| `motif_structure_and_sequence` | `contigs`, `input_xyz`, `temperature` | Motif-conditioned RFdiffusion3 → MPNN |
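
The selection rule in the table can be pictured as "the most specific workflow whose trigger inputs are all present wins". The snippet below is a minimal sketch of that idea only. `WORKFLOW_TRIGGERS` and `select_workflow` are hypothetical names that mirror the table; they are not the pipeline's actual dispatch code, and input combinations the table does not list may be resolved differently by the real pipeline.

```python
# Hypothetical mirror of the trigger table above; not the pipeline's real dispatch code.
WORKFLOW_TRIGGERS = [
    # Most specific workflow first, so it wins when several would match.
    ("motif_structure_and_sequence", {"contigs", "input_xyz", "temperature"}),
    ("structure_and_sequence", {"contigs", "temperature"}),
    ("structure_only", {"contigs"}),
]


def select_workflow(**inputs):
    """Return the first workflow whose trigger inputs were all provided."""
    provided = {name for name, value in inputs.items() if value is not None}
    for workflow, triggers in WORKFLOW_TRIGGERS:
        if triggers <= provided:
            return workflow
    raise ValueError("no workflow matches; `contigs` is required by every workflow")


print(select_workflow(contigs="100"))                   # structure_only
print(select_workflow(contigs="100", temperature=0.1))  # structure_and_sequence
```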

### Structure Only

```python
state = pipe(contigs="100")
print(state.output.xyz.shape)  # [1, 100, 3]
```

### Structure + Sequence Design

Passing `temperature` triggers the MPNN sequence design step. Load an MPNN variant first:

```python
from diffusers import AutoModel

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
pipe.update_components(mpnn=mpnn)

state = pipe(contigs="100", temperature=0.1)
print(state.mpnn_output.designed_sequence)  # e.g. "MKVLSEG..."
```
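
For intuition about what `temperature` controls: in MPNN-style sampling the per-residue logits are divided by the temperature before the softmax, so low values concentrate probability on the top-scoring amino acid and higher values yield more diverse sequences. The sketch below shows that scaling on toy numbers in plain Python. It assumes the pipeline forwards `temperature` to the sampler unchanged, and `softmax_with_temperature` is a hypothetical helper, not part of this package.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


logits = [2.0, 1.0, 0.0]  # toy scores for three residue types
sharp = softmax_with_temperature(logits, 0.1)  # nearly all mass on the first entry
flat = softmax_with_temperature(logits, 1.0)   # mass spread across all three
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```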

Three MPNN variants are available:

| Subfolder | Variant | Params | Description |
|-----------|---------|--------|-------------|
| `mpnn/` | ProteinMPNN | 1.66M | Standard protein sequence design |
| `mpnn_ligand/` | LigandMPNN | 2.62M | Ligand-aware sequence design |
| `mpnn_soluble/` | SolubleMPNN | 1.66M | Optimized for soluble proteins |

### Motif-Conditioned Design

Passing `input_xyz` enables motif conditioning: the motif residues named in `contigs` are held fixed at the supplied coordinates while the rest of the protein is designed.

```python
import torch

# Placeholder coordinates; replace with the real motif. A10-25 spans 16 residues.
motif_coords = torch.randn(16, 3)  # [N_motif, 3]
state = pipe(
    contigs="A10-25/50",
    input_xyz=motif_coords,
    temperature=0.1,
)
```
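
The contig string determines how many motif rows `input_xyz` needs. The sketch below parses only the two segment forms that appear in this README: a chain letter with an inclusive residue range (`A10-25`) and a plain integer for the number of designed residues (`50`). The grammar is inferred from those examples and the real parser may accept more forms; `parse_contigs` and `motif_length` are hypothetical helpers.

```python
import re


def parse_contigs(contigs):
    """Split a contig string into ("motif", chain, start, end) and ("design", length) segments."""
    segments = []
    for token in contigs.split("/"):
        motif = re.fullmatch(r"([A-Za-z])(\d+)-(\d+)", token)
        if motif:
            chain = motif.group(1)
            start, end = int(motif.group(2)), int(motif.group(3))
            if end < start:
                raise ValueError(f"range end precedes start in {token!r}")
            segments.append(("motif", chain, start, end))
        elif token.isdigit():
            segments.append(("design", int(token)))
        else:
            raise ValueError(f"unsupported contig segment: {token!r}")
    return segments


def motif_length(contigs):
    """Count motif residues; residue ranges are inclusive on both ends."""
    return sum(seg[3] - seg[2] + 1 for seg in parse_contigs(contigs) if seg[0] == "motif")


print(parse_contigs("A10-25/50"))  # [('motif', 'A', 10, 25), ('design', 50)]
print(motif_length("A10-25/50"))   # 16
```

Under this reading, `"A10-25/50"` describes a 16-residue motif followed by 50 designed residues, which is why the example above passes 16 coordinate rows.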

### Full Design Pipeline

The three models can be chained into a complete protein design workflow: RFD3 and MPNN run inside the design pipeline, and a separate RF3 pipeline validates the fold:

```
RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
```

```python
import torch
from diffusers import AutoModel, ModularPipeline

# 1. Design a backbone + sequence
design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
design_pipe.update_components(mpnn=mpnn)

state = design_pipe(contigs="100", temperature=0.1, output_type="cif.gz", output_path="design")
designed_sequence = state.mpnn_output.designed_sequence

# 2. Validate the fold with RF3
fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
```

> [RF3](https://www.biorxiv.org/content/10.1101/2025.08.14.670328) (RosettaFold3) is available as a separate pipeline at [`dn6/RosettaFold-3`](https://huggingface.co/dn6/RosettaFold-3).
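
Before handing a designed sequence to the folding pipeline, a quick sanity check can catch unexpected characters or a length mismatch early. This is a minimal sketch, assuming `designed_sequence` is a plain string of one-letter codes as in the example above; `check_sequence` is a hypothetical helper and not part of either pipeline.

```python
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(sequence, expected_length=None):
    """Return the sequence unchanged, or raise ValueError if it looks malformed."""
    unknown = sorted(set(sequence) - CANONICAL_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-canonical residue letters: {unknown}")
    if expected_length is not None and len(sequence) != expected_length:
        raise ValueError(f"expected {expected_length} residues, got {len(sequence)}")
    return sequence


print(check_sequence("MKVLSEG", expected_length=7))  # MKVLSEG
```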

## Customizing Workflows

Inspect, swap, and extend pipeline blocks at runtime:

```python
# Inspect the pipeline structure
print(pipe.blocks)

# Swap ProteinMPNN for LigandMPNN
from diffusers import AutoModel

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn_ligand", trust_remote_code=True)
pipe.update_components(mpnn=mpnn)

# Select a workflow explicitly
workflow = pipe.get_workflow("structure_and_sequence")

# Add a custom block
from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam

class ScoreDesignStep(ModularPipelineBlocks):
    @property
    def inputs(self):
        return [InputParam("xyz", required=True)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("radius_of_gyration")]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        xyz = block_state.xyz
        centroid = xyz.mean(dim=-2, keepdim=True)
        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(-1).mean().sqrt()
        self.set_block_state(state, block_state)
        return components, state

pipe._blocks.sub_blocks.insert("score", ScoreDesignStep(), index=3)
```
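
The quantity computed by `ScoreDesignStep` is the radius of gyration: the root-mean-square distance of the coordinates from their centroid. As a check on the tensor expression, here is the same calculation in plain Python on a toy set of points. `radius_of_gyration` is a hypothetical standalone helper written for illustration.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of 3D points from their centroid."""
    n = len(coords)
    centroid = [sum(point[axis] for point in coords) / n for axis in range(3)]
    mean_sq = sum(
        sum((point[axis] - centroid[axis]) ** 2 for axis in range(3))
        for point in coords
    ) / n
    return math.sqrt(mean_sq)


# Four points at unit distance from the origin, which is also their centroid.
corners = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
print(radius_of_gyration(corners))  # 1.0
```

Compact folds give smaller values than extended chains of the same length, so the score is a cheap first filter rather than a measure of design quality.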

## Output Types

| `output_type` | Additional output | Writes to disk |
|---|---|---|
| `"tensor"` | — | — |
| `"pdb"` | `pdb_string` | `.pdb` file |
| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` via AtomWorks |
| `"cif.gz"` | Same as `"cif"` | `.cif.gz` compressed |

CIF outputs are written with [AtomWorks](https://github.com/RosettaCommons/atomworks) `to_cif_file` and returned as [biotite](https://www.biotite-python.org/) `AtomArray` / `AtomArrayStack` objects, matching the Foundry output format.

```python
# Save as compressed CIF
state = pipe(contigs="100", output_type="cif.gz", output_path="design_0")

# Access AtomArray directly
atom_array = state.output.atom_array

# Denoising trajectory
trajectory = state.output.trajectory_stack

# PDB output
state = pipe(contigs="100", output_type="pdb", output_path="design_0.pdb")
print(state.output.pdb_string[:200])
```
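
Files written with `output_type="cif.gz"` are gzip-compressed text, so they can be inspected with the standard library alone. The sketch below reads either a plain or a gzipped structure file. `read_structure_text` is a hypothetical helper, and the demo writes its own stand-in file because the exact filename the pipeline derives from `output_path` is not stated here.

```python
import gzip
import tempfile
from pathlib import Path


def read_structure_text(path):
    """Return the text of a structure file, decompressing it if it ends in .gz."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return handle.read()


# Round-trip demo with a stand-in file; a real run would read the pipeline's output.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.cif.gz"
    with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
        handle.write("data_demo\n")
    print(read_structure_text(demo_path).strip())  # data_demo
```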

## Citation

```bibtex
@article{butcher2025_rfdiffusion3,
    author = {Butcher, Jasper and Krishna, Rohith and Mitra, Raktim and Brent, Rafael Isaac and Li, Yanjing and Corley, Nathaniel and Kim, Paul T and Funk, Jonathan and Mathis, Simon Valentin and Salike, Saman and Muraishi, Aiko and Eisenach, Helen and Thompson, Tuscan Rock and Chen, Jie and Politanska, Yuliya and Sehgal, Enisha and Coventry, Brian and Zhang, Odin and Qiang, Bo and Didi, Kieran and Kazman, Maxwell and DiMaio, Frank and Baker, David},
    title = {De novo Design of All-atom Biomolecular Interactions with RFdiffusion3},
    journal = {bioRxiv},
    year = {2025},
    doi = {10.1101/2025.09.18.676967},
}

@article{dauparas2022robust,
    author = {Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
    title = {Robust deep learning--based protein sequence design using ProteinMPNN},
    journal = {Science},
    volume = {378},
    number = {6615},
    pages = {49--56},
    year = {2022},
}

@article{dauparas2025atomic,
    author = {Dauparas, Justas and Lee, Gyu Rie and Pecoraro, Robert and An, Linna and Anishchenko, Ivan and Glasscock, Cameron and Baker, David},
    title = {Atomic context-conditioned protein sequence design using LigandMPNN},
    journal = {Nature Methods},
    year = {2025},
}
```