# Protein Structure Prediction with Diffusers

A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for [RosettaFold3](https://doi.org/10.1101/2025.08.14.670328) (RF3), a diffusion-based model that predicts 3D atomic coordinates from amino acid sequences.

RF3 relies on [Foundry](https://github.com/RosettaCommons/foundry) for its underlying implementation and [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.

## Getting Started

### Installation

```bash
pip install "rc-foundry[all]"  # quoted so shells such as zsh do not expand the brackets
pip install diffusers
```

### Running with Diffusers

```python
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = pipe(sequence="MKVLSEGDPWRK...")
print(state.output.xyz.shape)  # [D, L, 3]
```
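
The `...` in the example sequence stands for the rest of your protein; a literal `...` is not an amino acid code. Before calling the pipeline, it can help to check the input string. The sketch below assumes you intend to fold only the 20 canonical residues (whether RF3 accepts other symbols is not stated here), and `check_sequence` is a hypothetical helper, not part of diffusers, Foundry, or AtomWorks.

```python
# Hypothetical helper for illustration; not part of diffusers, Foundry, or AtomWorks.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 canonical residues


def check_sequence(sequence: str) -> str:
    """Return the stripped, upper-cased sequence, or raise ValueError on unexpected characters."""
    cleaned = sequence.strip().upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    # Collect every character that is not a canonical one-letter code.
    bad = sorted(set(cleaned) - CANONICAL_AA)
    if bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    return cleaned


print(check_sequence("mkvlsegdpwrk"))  # MKVLSEGDPWRK
```

The returned string can then be passed as `sequence=` to the pipeline call shown above.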

## Workflows

| Workflow | Trigger inputs | What runs |
|----------|---------------|-----------|
| `fold` | `sequence` | Full structure prediction (recycling trunk + diffusion) |

### Fold a Sequence

```python
state = pipe(sequence="MKVLSEGDPWRK...", output_type="cif.gz", output_path="prediction")
print(state.output.atom_array)
```

### Full Design Pipeline

RF3 is typically used as a validation step after backbone design with [RFdiffusion3](https://huggingface.co/dn6/RFDiffusion-3):

```
RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
```

```python
import torch
from diffusers import AutoModel, ModularPipeline

# 1. Design a backbone + sequence
design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
design_pipe.update_components(mpnn=mpnn)

state = design_pipe(contigs="100", temperature=0.1)
designed_sequence = state.mpnn_output.designed_sequence

# 2. Validate the fold
fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
```

## Customizing Workflows

```python
# Inspect the pipeline structure
print(pipe.blocks)

# Add a custom block
from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam

class ComputeRadiusOfGyration(ModularPipelineBlocks):
    @property
    def inputs(self):
        return [InputParam("xyz", required=True)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("radius_of_gyration")]

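    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        xyz = block_state.xyz  # coordinates; the last two dims are (points, 3)
        # Centroid over the points axis, kept as a dim so it broadcasts against xyz
        centroid = xyz.mean(dim=-2, keepdim=True)
        # RMS distance from the centroid; .mean() pools over all leading dims, giving one scalar
        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(-1).mean().sqrt()
        self.set_block_state(state, block_state)
        return components, state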

pipe._blocks.sub_blocks.insert("rog", ComputeRadiusOfGyration(), index=3)
```
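
For reference, the quantity computed by the block above is the root-mean-square distance of the points from their centroid. The sketch below reproduces that arithmetic in plain Python for a single structure given as a list of `(x, y, z)` tuples, so it can be checked by hand. `radius_of_gyration` is an illustrative name, and like the block it is unweighted (no atomic masses).

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    if n == 0:
        raise ValueError("coords is empty")
    # Centroid: per-axis mean over all points.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the centroid, then the square root.
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / n
    return math.sqrt(msd)


# Four points at unit distance from the origin: the centroid is the origin, so Rg is 1.
square = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
print(radius_of_gyration(square))  # 1.0
```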

## Output Types

| `output_type` | Additional output | Writes to disk |
|---|---|---|
| `"tensor"` | — | — |
| `"pdb"` | `pdb_string` | `.pdb` file |
| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` via AtomWorks |
| `"cif.gz"` | Same as `"cif"` | `.cif.gz` compressed |

```python
# CIF output with AtomArray
state = pipe(sequence="MKVLSEG...", output_type="cif.gz", output_path="fold_0")
atom_array = state.output.atom_array

# Denoising trajectory
trajectory = state.output.trajectory_stack

# PDB output
state = pipe(sequence="MKVLSEG...", output_type="pdb", output_path="fold_0.pdb")
```
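
A compressed CIF is ordinary mmCIF text behind gzip, so it can be inspected with the standard library alone. This is a minimal sketch, assuming the call above wrote `fold_0.cif.gz` (the exact filename the pipeline derives from `output_path` is an assumption here) and that atom records appear as `ATOM`/`HETATM` rows, as is usual in mmCIF files. `count_atom_records` is a hypothetical helper.

```python
import gzip
from pathlib import Path


def count_atom_records(path):
    """Count lines starting with ATOM or HETATM in a gzipped CIF/mmCIF file."""
    count = 0
    # "rt" decompresses on the fly and yields text lines.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                count += 1
    return count


cif_path = Path("fold_0.cif.gz")  # assumed filename; adjust to what was actually written
if cif_path.exists():
    print(count_atom_records(cif_path))
```

If the file holds several models, the count covers all of them.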

## Model Architecture

RF3 is a diffusion model that uses the same EDM noise schedule as RFdiffusion3 (200 steps), but is conditioned on sequence, MSA, and template representations produced by a large recycling trunk:

| Component | Subfolder | Description |
|-----------|-----------|-------------|
| `transformer` | `transformer/` | `RF3TransformerModel` (366M params) — FeatureInitializer + Recycler (48 pairformer blocks) + DiffusionModule (24 transformer blocks) + DistogramHead |
| `scheduler` | `scheduler/` | `RF3Scheduler` (EDM schedule, gamma_0=0.8) |
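
To make "EDM schedule" concrete, the sketch below builds a 200-step sequence of noise levels using the power-law interpolation from the EDM formulation of Karras et al. The values of `sigma_max`, `sigma_min`, and `rho` are illustrative placeholders, not RF3's configuration (the real values live in the `scheduler/` config), and `edm_sigmas` is a hypothetical function rather than an `RF3Scheduler` method.

```python
def edm_sigmas(num_steps=200, sigma_max=160.0, sigma_min=4e-4, rho=7.0):
    """Karras-style noise levels, decreasing from sigma_max to sigma_min.

    All default values are illustrative assumptions, not RF3's settings.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    sigmas = []
    for i in range(num_steps):
        t = i / (num_steps - 1)  # runs from 0 to 1 inclusive
        # Interpolate linearly in sigma^(1/rho) space, then map back.
        sigmas.append((hi + t * (lo - hi)) ** rho)
    return sigmas


sigmas = edm_sigmas()
print(len(sigmas), sigmas[0], sigmas[-1])
```

With `rho > 1`, the steps are concentrated at low noise levels, where fine structural detail is resolved.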

## Citation

```bibtex
@article{corley2025accelerating,
    author = {Corley, Nathaniel and Mathis, Simon and Krishna, Rohith and Bauer, Magnus S and Thompson, Tuscan R and Ahern, Woody and Kazman, Maxwell W and Brent, Rafael I and Didi, Kieran and Kubaney, Andrew and others},
    title = {Accelerating biomolecular modeling with AtomWorks and RF3},
    journal = {bioRxiv},
    year = {2025},
    doi = {10.1101/2025.08.14.670328},
}
```