# Protein Design with Diffusers

A [diffusers](https://github.com/huggingface/diffusers) `ModularPipeline` wrapper for protein design. It combines structure generation ([RFdiffusion3](https://www.biorxiv.org/content/10.1101/2025.09.18.676967v2)) and sequence design ([ProteinMPNN](https://www.science.org/doi/10.1126/science.add2187) / [LigandMPNN](https://www.nature.com/articles/s41592-025-02626-1)) into composable, swappable pipeline blocks.

All three models (RFD3, ProteinMPNN, and LigandMPNN) rely on [Foundry](https://github.com/RosettaCommons/foundry) for their underlying implementations and on [AtomWorks](https://github.com/RosettaCommons/atomworks) for structure I/O. This package adds only the thin wrappers needed for diffusers integration.

## Getting Started

### Installation

```bash
# Quote the extras so shells like zsh don't expand the brackets
pip install "rc-foundry[all]"
pip install diffusers
```

### Running with Diffusers

```python
import torch
from diffusers import ModularPipeline

# Load the pipeline definition, then its model weights onto the GPU in bfloat16
pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

state = pipe(contigs="100")
print(state.output.xyz.shape)  # [1, 100, 3]
```

## Workflows

The active workflow is selected automatically based on which inputs you provide:

| Workflow | Trigger inputs | What runs |
|----------|----------------|-----------|
| `structure_only` | `contigs` | RFdiffusion3 |
| `structure_and_sequence` | `contigs`, `temperature` | RFdiffusion3 → MPNN |
| `motif_structure_and_sequence` | `contigs`, `input_xyz`, `temperature` | Motif-conditioned RFdiffusion3 → MPNN |

### Structure Only

```python
state = pipe(contigs="100")
print(state.output.xyz.shape)  # [1, 100, 3]
```

### Structure + Sequence Design

Passing `temperature` (the MPNN sampling temperature) triggers the MPNN sequence design step.
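
Because adding `temperature` is what moves a call from `structure_only` to `structure_and_sequence`, the trigger table above can be read as a "most specific match wins" rule. The sketch below is a hypothetical, standalone illustration of that reading, not the pipeline's actual dispatch code: `WORKFLOW_TRIGGERS` and `select_workflow` are illustrative names, and treating `None` values as "not provided" is an assumption.

```python
from typing import Any

# Hypothetical mirror of the Workflows table, ordered from most to least specific.
WORKFLOW_TRIGGERS: list[tuple[str, frozenset[str]]] = [
    ("motif_structure_and_sequence", frozenset({"contigs", "input_xyz", "temperature"})),
    ("structure_and_sequence", frozenset({"contigs", "temperature"})),
    ("structure_only", frozenset({"contigs"})),
]


def select_workflow(**inputs: Any) -> str:
    """Return the first workflow whose trigger inputs are all present.

    Illustrative sketch only; inputs passed as None count as not provided.
    """
    provided = {name for name, value in inputs.items() if value is not None}
    for workflow, triggers in WORKFLOW_TRIGGERS:
        if triggers <= provided:  # every trigger input was supplied
            return workflow
    raise ValueError("`contigs` is required by every workflow")


print(select_workflow(contigs="100"))                   # structure_only
print(select_workflow(contigs="100", temperature=0.1))  # structure_and_sequence
```

Under this assumed rule, a combination the table does not list (for example `input_xyz` without `temperature`) falls back to the largest trigger set it satisfies, here `structure_only`; the real pipeline may handle that case differently.
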
Before calling the pipeline with `temperature`, load an MPNN variant and attach it with `update_components`:

```python
from diffusers import AutoModel

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
pipe.update_components(mpnn=mpnn)

state = pipe(contigs="100", temperature=0.1)
print(state.mpnn_output.designed_sequence)  # e.g. "MKVLSEG..."
```

Three MPNN variants are available, each in its own subfolder of `dn6/RFDiffusion-3`:

| `subfolder` | Variant | Params | Description |
|-------------|---------|--------|-------------|
| `mpnn` | ProteinMPNN | 1.66M | Standard protein sequence design |
| `mpnn_ligand` | LigandMPNN | 2.62M | Ligand-aware sequence design |
| `mpnn_soluble` | SolubleMPNN | 1.66M | Optimized for soluble proteins |

### Motif-Conditioned Design

Passing `input_xyz` enables motif conditioning: the specified residues are held fixed in place while the rest of the structure is designed.

```python
import torch

# [N_motif, 3]; random placeholder, replace with real motif coordinates.
# A10-25 spans 16 residues (10 through 25), matching N_motif = 16.
motif_coords = torch.randn(16, 3)

state = pipe(
    contigs="A10-25/50",
    input_xyz=motif_coords,
    temperature=0.1,
)
```

### Full Design Pipeline

The three models can be chained into a complete protein design workflow:

```
RFD3 (design backbone) → MPNN (design sequence) → RF3 (validate fold)
```

```python
import torch
from diffusers import AutoModel, ModularPipeline

# 1. Design a backbone + sequence
design_pipe = ModularPipeline.from_pretrained("dn6/RFDiffusion-3", trust_remote_code=True)
design_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn", trust_remote_code=True)
design_pipe.update_components(mpnn=mpnn)

design_state = design_pipe(contigs="100", temperature=0.1, output_type="cif.gz", output_path="design")
designed_sequence = design_state.mpnn_output.designed_sequence

# 2. Validate the fold with RF3
fold_pipe = ModularPipeline.from_pretrained("dn6/RosettaFold-3", trust_remote_code=True)
fold_pipe.load_components(device_map="cuda", torch_dtype=torch.bfloat16, trust_remote_code=True)

fold_state = fold_pipe(sequence=designed_sequence, output_type="cif.gz", output_path="prediction")
```

> [RF3](https://www.biorxiv.org/content/10.1101/2025.08.14.670328) (RoseTTAFold3) is available as a separate pipeline at [`dn6/RosettaFold-3`](https://huggingface.co/dn6/RosettaFold-3).

## Customizing Workflows

Inspect, swap, and extend pipeline blocks at runtime:

```python
from diffusers import AutoModel
from diffusers.modular_pipelines import ModularPipelineBlocks, PipelineState
from diffusers.modular_pipelines.modular_pipeline_utils import InputParam, OutputParam

# Inspect the pipeline structure
print(pipe.blocks)

# Swap ProteinMPNN for LigandMPNN
mpnn = AutoModel.from_pretrained("dn6/RFDiffusion-3", subfolder="mpnn_ligand", trust_remote_code=True)
pipe.update_components(mpnn=mpnn)

# Select a workflow explicitly
workflow = pipe.get_workflow("structure_and_sequence")


# Add a custom block that scores each generated structure
class ScoreDesignStep(ModularPipelineBlocks):
    @property
    def inputs(self):
        return [InputParam("xyz", required=True)]

    @property
    def intermediate_outputs(self):
        return [OutputParam("radius_of_gyration")]

    def __call__(self, components, state: PipelineState):
        block_state = self.get_block_state(state)
        xyz = block_state.xyz  # [B, N, 3]
        centroid = xyz.mean(dim=-2, keepdim=True)  # [B, 1, 3]
        # Radius of gyration per structure: sqrt of mean squared distance to the centroid -> [B]
        block_state.radius_of_gyration = ((xyz - centroid) ** 2).sum(-1).mean(-1).sqrt()
        self.set_block_state(state, block_state)
        return components, state


pipe._blocks.sub_blocks.insert("score", ScoreDesignStep(), index=3)
```

## Output Types

| `output_type` | Additional outputs | Writes to disk |
|---|---|---|
| `"tensor"` | None | No |
| `"pdb"` | `pdb_string` | `.pdb` file |
| `"cif"` | `atom_array`, `atom_array_stack`, `trajectory_stack` | `.cif` file (via AtomWorks) |
| `"cif.gz"` | Same as `"cif"` | Gzip-compressed `.cif.gz` file |

CIF outputs use
[AtomWorks](https://github.com/RosettaCommons/atomworks) `to_cif_file` and return [biotite](https://www.biotite-python.org/) `AtomArray` / `AtomArrayStack` objects, matching the Foundry output format.

```python
# Save as compressed CIF
state = pipe(contigs="100", output_type="cif.gz", output_path="design_0")

# Access the AtomArray directly
atom_array = state.output.atom_array

# Denoising trajectory
trajectory = state.output.trajectory_stack

# PDB output
state = pipe(contigs="100", output_type="pdb", output_path="design_0.pdb")
print(state.output.pdb_string[:200])
```

## Citation

```bibtex
@article{butcher2025_rfdiffusion3,
  author  = {Butcher, Jasper and Krishna, Rohith and Mitra, Raktim and Brent, Rafael Isaac and Li, Yanjing and Corley, Nathaniel and Kim, Paul T and Funk, Jonathan and Mathis, Simon Valentin and Salike, Saman and Muraishi, Aiko and Eisenach, Helen and Thompson, Tuscan Rock and Chen, Jie and Politanska, Yuliya and Sehgal, Enisha and Coventry, Brian and Zhang, Odin and Qiang, Bo and Didi, Kieran and Kazman, Maxwell and DiMaio, Frank and Baker, David},
  title   = {De novo Design of All-atom Biomolecular Interactions with RFdiffusion3},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.09.18.676967},
}

@article{dauparas2022robust,
  author  = {Dauparas, Justas and Anishchenko, Ivan and Bennett, Nathaniel and Bai, Hua and Ragotte, Robert J and Milles, Lukas F and Wicky, Basile IM and Courbet, Alexis and de Haas, Rob J and Bethel, Neville and others},
  title   = {Robust deep learning--based protein sequence design using ProteinMPNN},
  journal = {Science},
  volume  = {378},
  number  = {6615},
  pages   = {49--56},
  year    = {2022},
  doi     = {10.1126/science.add2187},
}

@article{dauparas2025atomic,
  author  = {Dauparas, Justas and Lee, Gyu Rie and Pecoraro, Robert and An, Linna and Anishchenko, Ivan and Glasscock, Cameron and Baker, David},
  title   = {Atomic context-conditioned protein sequence design using LigandMPNN},
  journal = {Nature Methods},
  year    = {2025},
  doi     = {10.1038/s41592-025-02626-1},
}
```