---
license: apache-2.0
tags:
- cryo-em
- flow-matching
- 3d-density-maps
- foundation-model
- conditional-sampling
---
# CryoFM2: A Generative Foundation Model for Cryo-EM Densities
<div align="center">
[Paper](https://doi.org/10.64898/2025.12.29.696802)
[Code](https://github.com/ByteDance-Seed/cryofm)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Documentation](https://bytedance-seed.github.io/cryofm/docs/)
</div>
<div align="center">
<img src="./assets/cryofm2_overview.jpg" alt="CryoFM2 Overview" style="max-width: 100%; height: auto; width: 800px;"/>
</div>
## Overview
**CryoFM2** is a flow-based generative foundation model for cryo-EM density maps.
It is pretrained on curated EMDB half maps to learn general priors of high-quality cryo-EM densities and can be fine-tuned for downstream tasks.
The model learns a continuous mapping from a simple Gaussian distribution to the complex distribution of cryo-EM densities, enabling stable generation and flexible adaptation. CryoFM2 can also act as a **Bayesian prior**, integrating naturally with task-specific likelihoods to support applications such as anisotropy-aware refinement, non-uniform reconstruction, and controlled density modification.
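To make the idea of a continuous mapping from Gaussian noise to data concrete, here is a minimal, self-contained sketch of Euler integration along a velocity field. It is a toy under stated assumptions, not CryoFM2's sampler: the time convention (noise at t=0, data at t=1), the straight-line path, and the names `euler_sample`, `toy_velocity`, and `TARGET` are all hypothetical, and the learned network is replaced by a closed-form velocity that transports every point to one fixed target.

```python
import random


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy "data distribution": a single fixed point (hypothetical stand-in for a density map).
TARGET = [2.0, -1.0, 0.5]


def toy_velocity(x, t):
    # For straight-line paths x_t = (1 - t) * x0 + t * TARGET,
    # the velocity at (x_t, t) is (TARGET - x_t) / (1 - t).
    return [(c - xi) / (1.0 - t) for c, xi in zip(TARGET, x)]


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in TARGET]  # draw from a standard Gaussian
sample = euler_sample(toy_velocity, noise, num_steps=200)
print(sample)
```

In a real flow-matching model the closed-form `toy_velocity` is replaced by a neural network evaluated at each step, which is why the number of steps trades off speed against accuracy.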
## Model Details
The pretrained model serves as the base for fine-tuning on downstream tasks such as density map enhancement and post-processing.
**Pre-training Architecture:**
<div align="center">
<img src="./assets/cryofm2_arch-pretrain.jpg" alt="CryoFM2 architecture for pre-training." style="max-width: 100%; height: auto; width: 800px;"/>
</div>
**Fine-tuning Architecture (for EMhancer/EMReady style post-processing):**
<div align="center">
<img src="./assets/cryofm2_arch-finetune.jpg" alt="CryoFM2 architecture for fine-tuning." style="max-width: 100%; height: auto; width: 800px;"/>
</div>
### Architecture
- **Architecture Type**: 3D UNet
- **Input Size**: 64×64×64 voxels
- **Input Channels**: 2 for the pretrained model, 3 for the fine-tuned models
- **Output Channels**: 1
- **Down Blocks**: DownBlock3D, DownBlock3D, AttnDownBlock3D, AttnDownBlock3D
- **Up Blocks**: AttnUpBlock3D, AttnUpBlock3D, UpBlock3D, UpBlock3D
- **Block Output Channels**: (64, 128, 256, 512)
- **Layers per Block**: 2
- **Attention Head Dimension**: 8
- **Normalization**: GroupNorm (32 groups)
- **Activation**: SiLU
- **Time Embedding**: Positional encoding
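The hyperparameters above can be collected into a small spec object and sanity-checked. This is an illustrative sketch only: the class and field names are hypothetical and are not the keys used in `config.yaml`, and the head-count calculation assumes that "Attention Head Dimension" means the width of a single head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UNet3DSpec:
    """Values copied from the list above; field names are hypothetical."""
    sample_size: int = 64
    in_channels: int = 2  # 3 for the fine-tuned variants
    out_channels: int = 1
    block_out_channels: tuple = (64, 128, 256, 512)
    layers_per_block: int = 2
    attention_head_dim: int = 8
    norm_num_groups: int = 32
    attn_blocks: tuple = (False, False, True, True)  # which down blocks use attention


def check_spec(spec):
    """Validate GroupNorm divisibility and return heads per attention block."""
    heads = []
    for channels, has_attn in zip(spec.block_out_channels, spec.attn_blocks):
        # GroupNorm needs the channel count to be divisible by the group count.
        if channels % spec.norm_num_groups != 0:
            raise ValueError(f"{channels} channels not divisible by {spec.norm_num_groups} groups")
        if has_attn:
            # Assumption: attention_head_dim is the per-head width.
            heads.append(channels // spec.attention_head_dim)
    return heads


spec = UNet3DSpec()
heads = check_spec(spec)
print(heads)
```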
### Model Variants
1. **cryofm2-pretrain**: Unconditional pretrained model for general density map generation
2. **cryofm2-emhancer**: Fine-tuned model for density map enhancement (EMhancer style)
3. **cryofm2-emready**: Fine-tuned model for density map enhancement (EMReady style)
## Play with CryoFM2
### Installation
Set up the environment and install the package before using CryoFM2:
```bash
# Clone the repository
git clone https://github.com/ByteDance-Seed/cryofm.git
cd cryofm
# Create a new conda environment for CryoFM (recommended)
conda create -n cryofm python=3.10 -y
conda activate cryofm
# Install CryoFM
pip install .
```
### Unconditional Generation (Explore Training Data Distribution)
Generate samples from the pretrained model to explore the learned data distribution:
**Pretrained Model:**
```python
import torch
from mmengine import Config

from cryofm.core.utils.mrc_io import save_mrc
from cryofm.core.utils.sampling_fm import sample_from_fm
from cryofm.projects.cryofm2.lit_modules import CryoFM2Uncond

# Update the path to your model directory
model_dir = "path/to/cryofm-v2/cryofm2-pretrain"
cfg = Config.fromfile(f"{model_dir}/config.yaml")
lit_model = CryoFM2Uncond.load_from_safetensors(f"{model_dir}/model.safetensors", cfg=cfg)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
lit_model = lit_model.to(device)
lit_model.eval()


# Wrap the model as the function v(x_t, t) passed to the sampler
def v_xt_t(_xt, _t):
    return lit_model(_xt, _t)


# Enable bfloat16 for faster inference if your GPU supports it
with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    out = sample_from_fm(
        v_xt_t,
        lit_model.noise_scheduler,
        method="euler",
        num_steps=200,
        num_samples=3,
        device=lit_model.device,
        side_shape=64,
    )

# Undo the z-score normalization if it is configured
if hasattr(lit_model.cfg, "z_scale") and lit_model.cfg.z_scale.mean is not None:
    out = out * lit_model.cfg.z_scale.std + lit_model.cfg.z_scale.mean

# Save generated samples
for i in range(3):
    save_mrc(out[i].float().cpu().numpy(), f"sample-{i}.mrc", voxel_size=1.5)
```
**Fine-tuned Models (EMhancer/EMReady):**
```python
import torch
from mmengine import Config

from cryofm.core.utils.mrc_io import save_mrc
from cryofm.core.utils.sampling_fm import sample_from_fm
from cryofm.projects.cryofm2.lit_modules import CryoFM2Cond

# Choose style: "emhancer" or "emready"
style = "emhancer"
model_dir = f"path/to/cryofm-v2/cryofm2-{style}"
cfg = Config.fromfile(f"{model_dir}/config.yaml")
lit_model = CryoFM2Cond.load_from_safetensors(f"{model_dir}/model.safetensors", cfg=cfg)
output_tag = 1 if style == "emhancer" else 0

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
lit_model = lit_model.to(device)
lit_model.eval()


# Wrap the model as v(x_t, t); only the output style tag is set, with no input map
def v_xt_t(_xt, _t):
    bs = _xt.shape[0]
    unconditional_generation_conds = {
        "input_cond": None,
        "output_cond": torch.tensor([output_tag] * bs).to(device),
        "vol_cond": None,  # dimension should be [bs, d, h, w]
    }
    return lit_model(_xt, _t, generation_conds=unconditional_generation_conds)


# Enable bfloat16 for faster inference if your GPU supports it
with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    out = sample_from_fm(
        v_xt_t,
        lit_model.noise_scheduler,
        method="euler",
        num_steps=200,
        num_samples=3,
        device=lit_model.device,
        side_shape=64,
    )

# Undo the z-score normalization if it is configured
if hasattr(lit_model.cfg, "z_scale") and lit_model.cfg.z_scale.mean is not None:
    out = out * lit_model.cfg.z_scale.std + lit_model.cfg.z_scale.mean

# Save generated samples
for i in range(3):
    save_mrc(out[i].float().cpu().numpy(), f"{style}-sample-{i}.mrc", voxel_size=1.5)
```
### Density Map Modification
CryoFM2 supports various density map modification operations using the pretrained model as a Bayesian prior. Supported operators include:
- **denoise**: Remove noise from density maps
- **inpaint**: Fill missing regions (e.g., missing wedge)
- **denoise inpaint**: Combined denoising and inpainting
- **non-uniform weight**: Apply non-uniform weighting during reconstruction
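Using a model as a Bayesian prior means combining it with a likelihood for the observed data, so that the posterior is proportional to likelihood times prior. The one-dimensional Gaussian toy below illustrates only that principle for the denoising case. It is not CryoFM2's sampling algorithm, and the function name and numbers are hypothetical.

```python
def gaussian_posterior(y, noise_var, prior_mean, prior_var):
    """Posterior of x given y = x + n, with n ~ N(0, noise_var) and x ~ N(prior_mean, prior_var)."""
    precision = 1.0 / prior_var + 1.0 / noise_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + y / noise_var)
    return post_mean, post_var


# The same observation is pulled toward the prior mean more strongly when the data are noisier.
mean_low_noise, _ = gaussian_posterior(y=4.0, noise_var=0.1, prior_mean=0.0, prior_var=1.0)
mean_high_noise, _ = gaussian_posterior(y=4.0, noise_var=10.0, prior_mean=0.0, prior_var=1.0)
print(mean_low_noise, mean_high_noise)
```

The operators listed above differ in which likelihood is paired with the prior; the prior itself stays the same pretrained model.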
**Basic Usage:**
```bash
python -m cryofm.projects.cryofm2.uncond_sampling \
-i1 half_map_1.mrc \
-i2 half_map_2.mrc \
-o ./output \
--model-dir path/to/cryofm-v2/cryofm2-pretrain \
--op denoise \
--norm-grad \
--use-lamb-w
```
**For inpainting tasks**, also provide the path to a RELION STAR file via `--data-starfile-path`:
```bash
python -m cryofm.projects.cryofm2.uncond_sampling \
-i1 half_map_1.mrc \
-i2 half_map_2.mrc \
-o ./output \
--model-dir path/to/cryofm-v2/cryofm2-pretrain \
--op inpaint \
--data-starfile-path path/to/relion_data.star \
--norm-grad \
--use-lamb-w
```
### Density Map Post-Processing
CryoFM2 provides fine-tuned models for density map enhancement in different styles, similar to EMhancer and EMReady.
#### EMhancer Style Enhancement
```bash
python -m cryofm.projects.cryofm2.cond_sampling \
-i input_map.mrc \
-o ./output_emhancer \
--model-dir path/to/cryofm-v2/cryofm2-emhancer \
--output-tag 1
```
#### EMReady Style Enhancement
```bash
python -m cryofm.projects.cryofm2.cond_sampling \
-i input_map.mrc \
-o ./output_emready \
--model-dir path/to/cryofm-v2/cryofm2-emready \
--output-tag 0 \
--cfg-weight 0.5
```
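The `--cfg-weight` option in the command above refers to classifier-free guidance, which blends a conditional and an unconditional prediction. The sketch below shows one common convention on plain Python lists. Which convention `cond_sampling` actually uses is not stated here, so the formula and the name `cfg_combine` are assumptions for illustration.

```python
def cfg_combine(v_uncond, v_cond, weight):
    """Blend unconditional and conditional predictions elementwise.

    Assumed convention: v = v_uncond + weight * (v_cond - v_uncond).
    Some codebases instead use (1 + weight) * v_cond - weight * v_uncond.
    """
    return [u + weight * (c - u) for u, c in zip(v_uncond, v_cond)]


v_uncond = [0.0, 1.0, 2.0]
v_cond = [1.0, 1.0, 0.0]
blended = cfg_combine(v_uncond, v_cond, weight=0.5)
print(blended)
```

Under this assumed convention a weight of 0 ignores the condition and a weight of 1 returns the conditional prediction unchanged.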
**Parameters:**
- `-i`: Input density map file (MRC format)
- `-o`: Output directory
- `--model-dir`: Path to the model directory containing `config.yaml` and `model.safetensors`
- `--output-tag`: Style tag (1 for EMhancer, 0 for EMReady)
- `--cfg-weight`: Classifier-free guidance weight (optional, default varies by model)
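When post-processing many maps, it can help to assemble the command line programmatically. The helper below is a hypothetical convenience wrapper, not part of the `cryofm` package. It uses only the flags listed above and assumes the model directories follow the `cryofm2-{style}` naming shown in the examples.

```python
import shlex

# Style tags as documented in the parameter list above
STYLE_TAGS = {"emhancer": 1, "emready": 0}


def build_cond_sampling_cmd(input_map, output_dir, model_root, style, cfg_weight=None):
    """Return the argv list for one post-processing run (hypothetical helper)."""
    if style not in STYLE_TAGS:
        raise ValueError(f"unknown style: {style!r}")
    cmd = [
        "python", "-m", "cryofm.projects.cryofm2.cond_sampling",
        "-i", input_map,
        "-o", output_dir,
        "--model-dir", f"{model_root}/cryofm2-{style}",
        "--output-tag", str(STYLE_TAGS[style]),
    ]
    if cfg_weight is not None:
        # Optional; the default is left to the script when omitted
        cmd += ["--cfg-weight", str(cfg_weight)]
    return cmd


cmd = build_cond_sampling_cmd(
    "input_map.mrc", "./output_emready", "path/to/cryofm-v2", "emready", cfg_weight=0.5
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`.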
## Performance Tips
- **Multi-GPU Inference**: Use `accelerate launch` for faster inference on multiple GPUs:
```bash
NCCL_DEBUG=ERROR accelerate launch --num_processes=${NUM_GPUS} --main_process_port=8881 \
python -m cryofm.projects.cryofm2.cond_sampling ...
```
- **Mixed Precision**: Use the `--bf16` flag when available to reduce memory usage and speed up inference.
- **Batch Processing**: Adjust the batch size to fit your GPU memory.
## Limitations
- Input size is fixed at 64×64×64 voxels
- Model performance may vary depending on the input density map quality
- Fine-tuned models are optimized for specific enhancement styles
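Because the network operates on 64×64×64 inputs, a larger map would need to be cropped, resampled, or processed in patches. The helper below is a hypothetical sketch of computing overlapping window positions along one axis. Whether and how the CryoFM2 scripts tile larger maps is not stated here, so the stride and the function name are assumptions.

```python
def window_starts(axis_len, window=64, stride=48):
    """Start indices of windows of size `window` that together cover [0, axis_len)."""
    if axis_len <= window:
        return [0]  # a single window; the caller must pad up to `window`
    starts = list(range(0, axis_len - window, stride))
    starts.append(axis_len - window)  # final window sits flush with the end
    return starts


starts = window_starts(200)
print(starts)
```

Applying the same function to each of the three axes gives the corner coordinates of every 64³ patch; overlapping regions would then need to be merged, for example by averaging.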
## Ethical Considerations
This model is intended for scientific research and structural biology applications. Users should:
- Ensure proper attribution when using generated density maps
- Validate generated density maps through experimental verification
- Be aware of potential biases in the training data
- Use the model responsibly and in accordance with scientific best practices
## Citation
If you find CryoFM2 useful, please cite:
```bibtex
@article{Li2025.12.29.696802,
author={Li, Yilai and Yuan, Jing and Zhou, Yi and Wang, Zhenghua and Chen, Suyi and Yang, Fengyu and Ling, Haibin and Kovalsky, Shahar Z and Zheng, Xiaoqing and Gu, Quanquan},
title={A Generative Foundation Model for Cryo-EM Densities},
elocation-id={2025.12.29.696802},
year={2025},
doi={10.64898/2025.12.29.696802},
publisher={Cold Spring Harbor Laboratory},
URL={https://www.biorxiv.org/content/early/2025/12/29/2025.12.29.696802},
eprint={https://www.biorxiv.org/content/early/2025/12/29/2025.12.29.696802.full.pdf},
journal={bioRxiv}
}
```
## License
This model is released under the Apache 2.0 License. See the [LICENSE](https://github.com/ByteDance-Seed/cryofm/blob/main/LICENSE) file for details.
## Acknowledgments
This work was developed by the ByteDance Seed Team. For more information, visit:
- [Project Repository](https://github.com/ByteDance-Seed/cryofm)
- [ByteDance Seed Team](https://seed.bytedance.com/)