	# Protein Design with Diffusers

	A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for protein design, combining structure generation ([RFdiffusion3](https://www.biorxiv.org/content/10.1101/2025.09.18.676967v2)) and sequence design ([ProteinMPNN](https://www.science.org/doi/10.1126/science.add2187) / [LigandMPNN](https://www.nature.com/articles/s41592-025-02626-1)) into composable, swappable pipeline blocks.

	All three models — RFD3, ProteinMPNN, and LigandMPNN — rely on [Foundry](https://github.com/RosettaCommons/foundry) for their underlying implementations and [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.

	## Getting Started

	### Installation

```bash
# Quote the extras specifier so shells such as zsh do not expand the brackets
pip install "rc-foundry[all]"
pip install diffusers
```

	### Running with Diffusers

	```python
	import torch
	from diffusers import ModularPipeline

	pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
	pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

	state = pipe(contigs="100")
	print(state.output.xyz.shape) # [1, 100, 3]
	```

	## Workflows

	The active workflow is selected automatically based on which inputs you provide:

| Workflow | Trigger inputs | What runs |
|----------|----------------|-----------|
| `structure_only` | `contigs` | RFdiffusion3 |
| `structure_and_sequence` | `contigs`, `temperature` | RFdiffusion3 → MPNN |
| `motif_structure_and_sequence` | `contigs`, `input_xyz`, `temperature` | Motif-conditioned RFdiffusion3 → MPNN |
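
The selection rule in the table can be sketched as a small function. This is an illustration only: `select_workflow` is a hypothetical helper, not part of diffusers or this repository, and the pipeline resolves the workflow internally. The table does not cover `contigs` plus `input_xyz` without `temperature`, so the sketch's handling of that case is an assumption.

```python
def select_workflow(**inputs):
    """Return the workflow name implied by the provided inputs (hypothetical helper)."""
    # Treat inputs passed as None the same as inputs that were not passed at all.
    provided = {name for name, value in inputs.items() if value is not None}

    if "contigs" not in provided:
        raise ValueError("every workflow in the table requires `contigs`")
    if "temperature" not in provided:
        # Assumption: without `temperature`, no sequence design step runs,
        # even if `input_xyz` was given (the table does not list this case).
        return "structure_only"
    if "input_xyz" in provided:
        return "motif_structure_and_sequence"
    return "structure_and_sequence"


print(select_workflow(contigs="100"))
print(select_workflow(contigs="100", temperature=0.1))
print(select_workflow(contigs="A10-25/50", input_xyz=[[0.0, 0.0, 0.0]], temperature=0.1))
```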

	### Structure Only

	```python
	state = pipe(contigs="100")
	print(state.output.xyz.shape) # [1, 100, 3]
	```

	### Structure + Sequence Design

	Passing `temperature` triggers the MPNN sequence design step. Load an MPNN variant first:

	```python
	from diffusers import AutoModel

	mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
	pipe.update_components(mpnn=mpnn)

	state = pipe(contigs="100", temperature=0.1)
	print(state.mpnn_output.designed_sequence) # e.g. "MKVLSEG..."
	```
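
In sequence-design models, a temperature usually scales the per-residue logits before sampling, so lower values make the sampled sequence more deterministic. The sketch below illustrates that general idea in plain Python. It is not the code used by this pipeline, and `softmax_with_temperature` is a hypothetical name; how the wrapper consumes `temperature` internally is an assumption here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by a sampling temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy logits over three amino-acid choices at one position.
logits = [2.0, 1.0, 0.0]
for temperature in (1.0, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At the lower temperature nearly all probability mass concentrates on the highest-scoring choice, which is why small values such as `0.1` yield conservative designs.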

	Three MPNN variants are available:

| Subfolder | Variant | Params | Description |
|-----------|---------|--------|-------------|
| `mpnn/` | ProteinMPNN | 1.66M | Standard protein sequence design |
| `mpnn_ligand/` | LigandMPNN | 2.62M | Ligand-aware sequence design |
| `mpnn_soluble/` | SolubleMPNN | 1.66M | Optimized for soluble proteins |

	### Motif-Conditioned Design

Passing `input_xyz` enables motif conditioning: the residues you supply are held fixed in place while the rest of the structure is designed around them.

```python
import torch

# Random placeholder coordinates for illustration; use real motif coordinates in practice
motif_coords = torch.randn(16, 3)  # [N_motif, 3]
state = pipe(
    contigs="A10-25/50",
    input_xyz=motif_coords,
    temperature=0.1,
)
```
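
The example pairs the contig `A10-25/50` with 16 motif coordinates. One way to read that contig is as a motif spanning residues 10 to 25 of chain A (16 residues) followed by 50 designed residues. The sketch below checks that arithmetic. The grammar it accepts is an assumption based only on this one example, `parse_contig` and `motif_length` are hypothetical helpers, and the contig syntax actually supported by RFdiffusion3 may be richer.

```python
import re


def parse_contig(contig):
    """Split a contig string into motif and design segments (assumed grammar)."""
    segments = []
    for part in contig.split("/"):
        # A chain letter followed by an inclusive residue range, e.g. "A10-25".
        motif = re.fullmatch(r"([A-Za-z])(\d+)-(\d+)", part)
        if motif:
            chain, start, end = motif.group(1), int(motif.group(2)), int(motif.group(3))
            segments.append(("motif", chain, start, end))
        elif part.isdigit():
            # A bare number is taken to mean that many newly designed residues.
            segments.append(("design", int(part)))
        else:
            raise ValueError(f"unrecognized contig segment: {part!r}")
    return segments


def motif_length(segments):
    """Count motif residues, treating each range as inclusive."""
    return sum(seg[3] - seg[2] + 1 for seg in segments if seg[0] == "motif")


segments = parse_contig("A10-25/50")
print(segments)                # [('motif', 'A', 10, 25), ('design', 50)]
print(motif_length(segments))  # 16
```

A check like this can catch a mismatch between the contig and the first dimension of `input_xyz` before the pipeline is called.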

	### Full Design Pipeline

The three models can be chained into a complete protein design workflow:

	```
	RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
	```

	```python
	import torch
	from diffusers import AutoModel, ModularPipeline

	# 1. Design a backbone + sequence
	design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
	design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

	mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
	design_pipe.update_components(mpnn=mpnn)

	state = design_pipe(contigs="100", temperature=0.1, output_type="cif.gz", output_path="design")
	designed_sequence = state.mpnn_output.designed_sequence

	# 2. Validate the fold with RF3
	fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
	fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

	state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
	```

	> [RF3](https://www.biorxiv.org/content/10.1101/2025.08.14.670328) (RosettaFold3) is available as a separate pipeline at [`dn6/RosettaFold-3`](https://huggingface.co/dn6/RosettaFold-3).

	## Customizing Workflows

	Inspect, swap, and extend pipeline blocks at runtime:

```python
from diffusers import AutoModel

# Inspect the pipeline structure
print(pipe.blocks)

# Swap ProteinMPNN for LigandMPNN
mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn_ligand", trust_remote_code=True)
pipe.update_components(mpnn=mpnn)

# Select a workflow explicitly
workflow = pipe.get_workflow("structure_and_sequence")

# Add a custom block
from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam

class ScoreDesignStep(ModularPipelineBlocks):
    @property
    def inputs(self):
        return [InputParam("xyz", required=True)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("radius_of_gyration")]

    def __call__(self, components, state):
        block_state = self.get_block_state(state)
        xyz = block_state.xyz
        # Radius of gyration: root-mean-square distance of the coordinates from their centroid
        centroid = xyz.mean(dim=-2, keepdim=True)
        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(-1).mean().sqrt()
        self.set_block_state(state, block_state)
        return components, state

pipe._blocks.sub_blocks.insert("score", ScoreDesignStep(), index=3)
```
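
The custom block computes a radius of gyration, the root-mean-square distance of the coordinates from their centroid. A dependency-free sketch of the same formula makes it easy to sanity-check on a toy input. `radius_of_gyration` below is a hypothetical standalone helper, not part of the pipeline.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of 3D points from their centroid."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    mean_sq = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in points
    ) / n
    return math.sqrt(mean_sq)


# Four points at unit distance from the origin, which is also their centroid.
points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
print(radius_of_gyration(points))  # 1.0
```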

	## Output Types

| `output_type` | Additional output | Writes to disk |
|---|---|---|
| `"tensor"` | None | Nothing |
| `"pdb"` | `pdb_string` | `.pdb` file |
| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` via AtomWorks |
| `"cif.gz"` | Same as `"cif"` | `.cif.gz` compressed |

CIF outputs are written with the [AtomWorks](https://github.com/RosettaCommons/atomworks) `to_cif_file` function and returned as [biotite](https://www.biotite-python.org/) `AtomArray` / `AtomArrayStack` objects, matching the Foundry output format.

	```python
	# Save as compressed CIF
	state = pipe(contigs="100", output_type="cif.gz", output_path="design_0")

	# Access AtomArray directly
	atom_array = state.output.atom_array

	# Denoising trajectory
	trajectory = state.output.trajectory_stack

	# PDB output
	state = pipe(contigs="100", output_type="pdb", output_path="design_0.pdb")
	print(state.output.pdb_string[:200])
	```
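
To inspect a written file without a structure library, a compressed CIF can be read as text with the standard library. This is a minimal sketch: `read_structure_text` is a hypothetical helper, and the file name in the usage comment assumes the pipeline appends the extension to `output_path`, which is not confirmed here.

```python
import gzip
from pathlib import Path


def read_structure_text(path):
    """Return the text content of a .cif, .pdb, or gzip-compressed .cif.gz file."""
    path = Path(path)
    if path.suffix == ".gz":
        # Open in text mode so the caller gets str rather than bytes.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            return handle.read()
    return path.read_text(encoding="utf-8")


# Usage (assumed file name; adjust to whatever the pipeline actually wrote):
# text = read_structure_text("design_0.cif.gz")
# print(text[:200])
```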

	## Citation

	```bibtex
	@article{butcher2025_rfdiffusion3,
	author = {Butcher, Jasper and Krishna, Rohith and Mitra, Raktim and Brent, Rafael Isaac and Li, Yanjing and Corley, Nathaniel and Kim, Paul T and Funk, Jonathan and Mathis, Simon Valentin and Salike, Saman and Muraishi, Aiko and Eisenach, Helen and Thompson, Tuscan Rock and Chen, Jie and Politanska, Yuliya and Sehgal, Enisha and Coventry, Brian and Zhang, Odin and Qiang, Bo and Didi, Kieran and Kazman, Maxwell and DiMaio, Frank and Baker, David},
	title = {De novo Design of All-atom Biomolecular Interactions with RFdiffusion3},
	journal = {bioRxiv},
	year = {2025},
	doi = {10.1101/2025.09.18.676967},
	}

	@article{dauparas2022robust,
	author = {Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
	title = {Robust deep learning--based protein sequence design using ProteinMPNN},
	journal = {Science},
	volume = {378},
	number = {6615},
	pages = {49--56},
	year = {2022},
	}

	@article{dauparas2025atomic,
	author = {Dauparas, Justas and Lee, Gyu Rie and Pecoraro, Robert and An, Linna and Anishchenko, Ivan and Glasscock, Cameron and Baker, David},
	title = {Atomic context-conditioned protein sequence design using LigandMPNN},
	journal = {Nature Methods},
	year = {2025},
	}
	```