# Protein Structure Prediction with Diffusers
A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for [RosettaFold3](https://doi.org/10.1101/2025.08.14.670328) (RF3), a diffusion-based model that predicts 3D atomic coordinates from amino acid sequences.
RF3 relies on [Foundry](https://github.com/RosettaCommons/foundry) for its underlying implementation and [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.
## Getting Started
### Installation
```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "rc-foundry[all]"
pip install diffusers
```
### Running with Diffusers
```python
import torch
from diffusers import ModularPipeline

# Load the pipeline definition, then the model weights
pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = pipe(sequence="MKVLSEGDPWRK...")
print(state.output.xyz.shape)  # [D, L, 3]
```
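Before spending GPU time on a fold, it can help to sanity-check the input string. The following is a minimal sketch, assuming the pipeline expects one-letter codes for the 20 standard amino acids (this README does not state which alphabet RF3 accepts). `clean_sequence` is a hypothetical helper and not part of diffusers, Foundry, or AtomWorks.

```python
# Hypothetical pre-flight check; not part of diffusers, Foundry, or AtomWorks.
# Assumption: only the 20 standard one-letter residue codes are accepted.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove whitespace, upper-case, and reject non-standard residue codes."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"unsupported residue codes: {', '.join(bad)}")
    return seq
```

The cleaned string could then be passed as `pipe(sequence=clean_sequence(raw))`. If RF3 turns out to accept additional codes, the alphabet above would need to be widened accordingly.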
## Workflows
| Workflow | Trigger inputs | What runs |
|----------|---------------|-----------|
| `fold` | `sequence` | Full structure prediction (recycling trunk + diffusion) |
### Fold a Sequence
```python
state = pipe(sequence="MKVLSEGDPWRK...", output_type="cif.gz", output_path="prediction")
print(state.output.atom_array)
```
### Full Design Pipeline
RF3 is typically used as a validation step after backbone design with [RFdiffusion3](https://huggingface.co/dn6/RFDiffusion-3):
```
RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
```
```python
import torch
from diffusers import AutoModel, ModularPipeline

# 1. Design a backbone + sequence
design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

# Attach the MPNN sequence-design model to the design pipeline
mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
design_pipe.update_components(mpnn=mpnn)

state = design_pipe(contigs="100", temperature=0.1)
designed_sequence = state.mpnn_output.designed_sequence

# 2. Validate the fold
fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)
state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
```
## Customizing Workflows
```python
# Inspect the pipeline structure
print(pipe.blocks)

# Add a custom block
from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam


class ComputeRadiusOfGyration(ModularPipelineBlocks):
    @property
    def inputs(self):
        return [InputParam("xyz", required=True)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("radius_of_gyration")]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        xyz = block_state.xyz
        # Centroid over the atom dimension, then RMS distance from it
        centroid = xyz.mean(dim=-2, keepdim=True)
        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(-1).mean().sqrt()
        self.set_block_state(state, block_state)
        return components, state


# Register the block under the name "rog" at position 3
pipe._blocks.sub_blocks.insert("rog", ComputeRadiusOfGyration(), index=3)
```
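The quantity stored by the block above is the root-mean-square distance of the atoms from their centroid. The following is a dependency-free sketch of the same formula for a single structure, assuming unweighted (not mass-weighted) coordinates. `radius_of_gyration` is an illustrative helper and not part of any library used here.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    # Centroid of the point cloud
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the centroid, then square root
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords) / n
    return math.sqrt(msd)
```

Note that the block takes `.mean()` over every remaining dimension, so if `xyz` has shape `[D, L, 3]` it yields a single value pooled across all `D` samples. Reducing only over the last remaining dimension (`.mean(-1)`) would give one value per sample instead.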
## Output Types
| `output_type` | Additional output | Writes to disk |
|---|---|---|
| `"tensor"` | — | — |
| `"pdb"` | `pdb_string` | `.pdb` file |
| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` via AtomWorks |
| `"cif.gz"` | Same as `"cif"` | `.cif.gz` compressed |
```python
# CIF output with AtomArray
state = pipe(sequence="MKVLSEG...", output_type="cif.gz", output_path="fold_0")
atom_array = state.output.atom_array
# Denoising trajectory
trajectory = state.output.trajectory_stack
# PDB output
state = pipe(sequence="MKVLSEG...", output_type="pdb", output_path="fold_0.pdb")
```
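A `.cif.gz` file is an ordinary gzip-compressed text file, so its header can be inspected without AtomWorks. The following is a minimal sketch using only the Python standard library. `head_cif_gz` is a hypothetical helper, and the file name in the usage note is an assumption about how `output_path` is expanded, which this README does not specify.

```python
import gzip
from itertools import islice


def head_cif_gz(path, n_lines=5):
    """Return the first n_lines of a gzip-compressed CIF file, without newlines."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]
```

For example, `head_cif_gz("fold_0.cif.gz")` would show the first few lines of the file, provided the pipeline wrote it under that name.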
## Model Architecture
RF3 is a diffusion model with the same EDM noise schedule as RFdiffusion3 (200 steps), but conditioned on sequence/MSA/template representations from a large recycling trunk:
| Component | Subfolder | Description |
|-----------|-----------|-------------|
| `transformer` | `transformer/` | `RF3TransformerModel` (366M params) — FeatureInitializer + Recycler (48 pairformer blocks) + DiffusionModule (24 transformer blocks) + DistogramHead |
| `scheduler` | `scheduler/` | `RF3Scheduler` (EDM schedule, gamma_0=0.8) |
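
To give a feel for what an EDM-style noise schedule looks like, the following sketch computes the noise levels from the generic formula of Karras et al. (2022) over 200 steps. The default `sigma_min`, `sigma_max`, and `rho` values are illustrative placeholders, not values read from `RF3Scheduler`, and `karras_sigmas` is a hypothetical helper rather than the scheduler's actual implementation.

```python
def karras_sigmas(num_steps=200, sigma_min=4e-4, sigma_max=160.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min (generic EDM schedule).

    Defaults other than num_steps are assumptions chosen for illustration.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then map back with the power rho
    return [
        (hi + i / (num_steps - 1) * (lo - hi)) ** rho
        for i in range(num_steps)
    ]
```

With `rho > 1` the steps are concentrated at low noise levels, which is where fine structural detail is resolved.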
## Citation
```bibtex
@article{corley2025accelerating,
  author  = {Corley, Nathaniel and Mathis, Simon and Krishna, Rohith and Bauer, Magnus S and Thompson, Tuscan R and Ahern, Woody and Kazman, Maxwell W and Brent, Rafael I and Didi, Kieran and Kubaney, Andrew and others},
  title   = {Accelerating biomolecular modeling with AtomWorks and RF3},
  journal = {bioRxiv},
  year    = {2025},
}
```