# Protein Structure Prediction with Diffusers

A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for [RosettaFold3](https://doi.org/10.1101/2025.08.14.670328) (RF3), a diffusion-based protein structure prediction model that predicts 3D atomic coordinates from amino acid sequences.

RF3 relies on [Foundry](https://github.com/RosettaCommons/foundry) for its underlying implementation and [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.

## Getting Started

### Installation

```bash
pip install rc-foundry[all]
pip install diffusers
```

### Running with Diffusers

```python
import torch
from diffusers import ModularPipeline

pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = pipe(sequence="MKVLSEGDPWRK...")
print(state.output.xyz.shape)  # [D, L, 3]
```

## Workflows

| Workflow | Trigger inputs | What runs |
|----------|---------------|-----------|
| `fold` | `sequence` | Full structure prediction (recycling trunk + diffusion) |

### Fold a Sequence

```python
state = pipe(sequence="MKVLSEGDPWRK...", output_type="cif.gz", output_path="prediction")
print(state.output.atom_array)
```

### Full Design Pipeline

RF3 is typically used as a validation step after backbone design with [RFdiffusion3](https://huggingface.co/dn6/RFDiffusion-3):

```
RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
```

```python
import torch
from diffusers import AutoModel, ModularPipeline

# 1. Design a backbone + sequence
design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)
mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
design_pipe.update_components(mpnn=mpnn)

state = design_pipe(contigs="100", temperature=0.1)
designed_sequence = state.mpnn_output.designed_sequence

# 2. Validate the fold
fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
```

## Customizing Workflows

```python
# Inspect the pipeline structure
print(pipe.blocks)

# Add a custom block
from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam


class ComputeRadiusOfGyration(ModularPipelineBlocks):
    @property
    def inputs(self):
        return [InputParam("xyz", required=True)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("radius_of_gyration")]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        xyz = block_state.xyz
        centroid = xyz.mean(dim=-2, keepdim=True)
        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(-1).mean().sqrt()
        self.set_block_state(state, block_state)
        return components, state


pipe._blocks.sub_blocks.insert("rog", ComputeRadiusOfGyration(), index=3)
```

## Output Types

| `output_type` | Additional output | Writes to disk |
|---|---|---|
| `"tensor"` | — | — |
| `"pdb"` | `pdb_string` | `.pdb` file |
| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` via AtomWorks |
| `"cif.gz"` | Same as `"cif"` | `.cif.gz` compressed |

```python
# CIF output with AtomArray
state = pipe(sequence="MKVLSEG...", output_type="cif.gz", output_path="fold_0")
atom_array = state.output.atom_array

# Denoising trajectory
trajectory = state.output.trajectory_stack

# PDB output
state = pipe(sequence="MKVLSEG...", output_type="pdb", output_path="fold_0.pdb")
```

## Model Architecture

RF3 is a diffusion model with the same EDM noise schedule as RFdiffusion3 (200 steps), but conditioned on sequence/MSA/template representations from a large recycling trunk:

| Component | Subfolder | Description |
|-----------|-----------|-------------|
| `transformer` | `transformer/` | `RF3TransformerModel` (366M params): FeatureInitializer + Recycler (48 pairformer blocks) + DiffusionModule (24 transformer blocks) + DistogramHead |
| `scheduler` | `scheduler/` | `RF3Scheduler` (EDM schedule, gamma_0=0.8) |

## Citation

```bibtex
@article{corley2025accelerating,
  author  = {Corley, Nathaniel and Mathis, Simon and Krishna, Rohith and Bauer, Magnus S and Thompson, Tuscan R and Ahern, Woody and Kazman, Maxwell W and Brent, Rafael I and Didi, Kieran and Kubaney, Andrew and others},
  title   = {Accelerating biomolecular modeling with AtomWorks and RF3},
  journal = {bioRxiv},
  year    = {2025},
}
```
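
## Appendix: Checking the Radius of Gyration Formula

The `ComputeRadiusOfGyration` block under Customizing Workflows computes an unweighted radius of gyration: the root-mean-square distance of the coordinates from their centroid. The sketch below reproduces that formula for a single structure in plain Python so it can be sanity-checked without a GPU or the pipeline. The `radius_of_gyration` helper and the toy `square` coordinates are hypothetical and not part of this package.

```python
import math


def radius_of_gyration(coords):
    """Unweighted RMS distance of points from their centroid.

    coords: list of (x, y, z) tuples for one structure (hypothetical helper).
    """
    n = len(coords)
    # Centroid: per-axis mean over all points
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    # Mean squared distance from the centroid
    mean_sq = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


# Toy input: four corners of a square with side 2, centered at the origin.
# Every point is sqrt(2) from the centroid, so the result should be sqrt(2).
square = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
rg = radius_of_gyration(square)
print(rg)
```

Note that the pipeline block takes the mean over every leading dimension of `xyz`, so for an output of shape `[D, L, 3]` it returns one value pooled across all `D` samples rather than one value per sample.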