
	# Protein Structure Prediction with Diffusers

	A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for [RosettaFold3](https://doi.org/10.1101/2025.08.14.670328) (RF3), a diffusion-based model that predicts 3D atomic coordinates from amino acid sequences.

	RF3 relies on [Foundry](https://github.com/RosettaCommons/foundry) for its underlying implementation and [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.

	## Getting Started

	### Installation

	```bash
	pip install "rc-foundry[all]"  # quotes keep shells such as zsh from globbing the brackets
	pip install diffusers
	```

	### Running with Diffusers

	```python
	import torch
	from diffusers import ModularPipeline

	pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
	pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

	state = pipe(sequence="MKVLSEGDPWRK...")
	print(state.output.xyz.shape) # [D, L, 3]
	```

	## Workflows

	| Workflow | Trigger inputs | What runs |
	|----------|----------------|-----------|
	| `fold` | `sequence` | Full structure prediction (recycling trunk + diffusion) |

	### Fold a Sequence

	```python
	state = pipe(sequence="MKVLSEGDPWRK...", output_type="cif.gz", output_path="prediction")
	print(state.output.atom_array)
	```
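
Because the `fold` workflow is triggered by a plain `sequence` string, a small pre-flight check can catch malformed input before any model weights are loaded. The sketch below is not part of the pipeline: `clean_sequence` is a hypothetical helper, and it assumes the input should contain only one-letter codes for the 20 standard amino acids. Whether RF3 accepts other codes is not stated here.

```python
# Hypothetical helper, not part of the RF3 pipeline.
# Assumption: only the 20 standard one-letter amino acid codes are valid.
CANONICAL_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove whitespace, uppercase, and reject non-canonical residue codes."""
    seq = "".join(raw.split()).upper()
    if not seq:
        raise ValueError("sequence is empty")
    bad = sorted(set(seq) - CANONICAL_AA)
    if bad:
        raise ValueError(f"unsupported residue codes: {''.join(bad)}")
    return seq


print(clean_sequence("mkvl segd\nPWRK"))  # MKVLSEGDPWRK
```

The cleaned string can then be passed as `sequence=` to the pipeline call shown above.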

	### Full Design Pipeline

	RF3 is typically used as a validation step after backbone design with [RFdiffusion3](https://huggingface.co/dn6/RFDiffusion-3):

	```
	RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
	```

	```python
	import torch
	from diffusers import AutoModel, ModularPipeline

	# 1. Design a backbone + sequence
	design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
	design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

	mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
	design_pipe.update_components(mpnn=mpnn)

	state = design_pipe(contigs="100", temperature=0.1)
	designed_sequence = state.mpnn_output.designed_sequence

	# 2. Validate the fold
	fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
	fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

	state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
	```
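
The snippet above stops at the RF3 prediction and does not prescribe a validation metric. One common choice is a self-consistency RMSD: superimpose the predicted coordinates onto the designed backbone and measure the remaining deviation. The sketch below implements the Kabsch superposition in NumPy on synthetic data. `kabsch_rmsd` is a hypothetical helper, and extracting matching atom coordinates (for example C-alpha atoms) from each pipeline's output is left out because those accessors are not documented here.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two [N, 3] point sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape [N, 3]")
    # Remove translation by centering both sets on their centroids
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)
    # Cross-covariance matrix and its SVD
    h = pc.T @ qc
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = pc @ rot.T - qc
    return float(np.sqrt((diff ** 2).sum(axis=-1).mean()))


# Synthetic check: a rotated and translated copy should give an RMSD near zero
rng = np.random.default_rng(0)
designed = rng.normal(size=(50, 3))
theta = 0.7
rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
predicted = designed @ rz.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(designed, predicted))
```

Both inputs must list the same atoms in the same order. What RMSD counts as a successful design is a project-specific choice.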

	## Customizing Workflows

	```python
	# Inspect the pipeline structure
	print(pipe.blocks)

	# Add a custom block
	from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
	from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam

	class ComputeRadiusOfGyration(ModularPipelineBlocks):
	    @property
	    def inputs(self):
	        return [InputParam("xyz", required=True)]

	    @property
	    def intermediate_outputs(self):
	        return [OutputParam("radius_of_gyration")]

	    def __call__(self, components, state):
	        block_state = self.get_block_state(state)
	        xyz = block_state.xyz
	        centroid = xyz.mean(dim=-2, keepdim=True)
	        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(-1).mean().sqrt()
	        self.set_block_state(state, block_state)
	        return components, state

	pipe._blocks.sub_blocks.insert("rog", ComputeRadiusOfGyration(), index=3)
	```
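
The block above calls `.mean()` over every remaining dimension, so for coordinates shaped `[D, L, 3]` it returns one scalar pooled across all `D` samples. If one value per sample is preferred, the reduction can be restricted to the atom axis. The following is a minimal standalone sketch of that variant, written in NumPy so it runs without a GPU. It is unweighted (no atomic masses) and is not part of the pipeline.

```python
import numpy as np


def radius_of_gyration(xyz):
    """Unweighted radius of gyration per structure for coordinates [D, L, 3]."""
    xyz = np.asarray(xyz, dtype=float)
    centroid = xyz.mean(axis=-2, keepdims=True)      # [D, 1, 3]
    sq_dist = ((xyz - centroid) ** 2).sum(axis=-1)   # [D, L]
    return np.sqrt(sq_dist.mean(axis=-1))            # [D]


# One structure: four points on the corners of a unit square centered at the
# origin, each at distance sqrt(0.5) from the centroid.
square = np.array([[
    [0.5, 0.5, 0.0],
    [-0.5, 0.5, 0.0],
    [-0.5, -0.5, 0.0],
    [0.5, -0.5, 0.0],
]])
print(radius_of_gyration(square))
```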

	## Output Types

	| `output_type` | Additional output | Writes to disk |
	|---|---|---|
	| `"tensor"` | none | none |
	| `"pdb"` | `pdb_string` | `.pdb` file |
	| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` via AtomWorks |
	| `"cif.gz"` | Same as `"cif"` | `.cif.gz` compressed |

	```python
	# CIF output with AtomArray
	state = pipe(sequence="MKVLSEG...", output_type="cif.gz", output_path="fold_0")
	atom_array = state.output.atom_array

	# Denoising trajectory
	trajectory = state.output.trajectory_stack

	# PDB output
	state = pipe(sequence="MKVLSEG...", output_type="pdb", output_path="fold_0.pdb")
	```

	## Model Architecture

	RF3 is a diffusion model with the same EDM noise schedule as RFdiffusion3 (200 steps), but conditioned on sequence/MSA/template representations from a large recycling trunk:

	| Component | Subfolder | Description |
	|-----------|-----------|-------------|
	| `transformer` | `transformer/` | `RF3TransformerModel` (366M params): FeatureInitializer + Recycler (48 pairformer blocks) + DiffusionModule (24 transformer blocks) + DistogramHead |
	| `scheduler` | `scheduler/` | `RF3Scheduler` (EDM schedule, gamma_0=0.8) |
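
For readers unfamiliar with EDM schedules, the sketch below generates noise levels using the interpolation formula from Karras et al. (2022), with the 200 steps mentioned above. It is an illustration only and not the `RF3Scheduler` implementation: `edm_sigmas` is a hypothetical function, and the `sigma_min`, `sigma_max`, and `rho` values are placeholders rather than RF3's configuration.

```python
def edm_sigmas(num_steps=200, sigma_min=4e-4, sigma_max=160.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min (Karras et al., 2022).

    Assumption: the default values are illustrative placeholders,
    not taken from the RF3 scheduler config.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    inv_rho = 1.0 / rho
    hi = sigma_max ** inv_rho
    lo = sigma_min ** inv_rho
    # Interpolate linearly in sigma^(1/rho) space, then raise back to rho
    return [
        (hi + i / (num_steps - 1) * (lo - hi)) ** rho
        for i in range(num_steps)
    ]


sigmas = edm_sigmas()
print(len(sigmas), sigmas[0], sigmas[-1])
```

A larger `rho` concentrates more of the steps at low noise levels, where fine structural detail is resolved.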

	## Citation

	```bibtex
	@article{corley2025accelerating,
	  author  = {Corley, Nathaniel and Mathis, Simon and Krishna, Rohith and Bauer, Magnus S and Thompson, Tuscan R and Ahern, Woody and Kazman, Maxwell W and Brent, Rafael I and Didi, Kieran and Kubaney, Andrew and others},
	  title   = {Accelerating biomolecular modeling with AtomWorks and RF3},
	  journal = {bioRxiv},
	  year    = {2025},
	}
	```