---
license: apache-2.0
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- zoomldm
- cdm
- dit
- histopathology
- brca
- custom-pipeline
widget:
- src: demo_images/input.jpeg
prompt: Sample BRCA conditioning embedding (magnification class 0)
output:
url: demo_images/output.jpeg
---
> [!WARNING]
> The checkpoint conversion has not been fully validated. If you encounter pipeline loading failures or unexpected output, please contact bili_sakura@zju.edu.cn.
# BiliSakura/ZoomLDM-CDM-brca
Diffusers-style wrapped **CDM (DiT)** checkpoint for BRCA, converted from ZoomLDM `cdm_dit` training outputs.
## Model Description
- **Architecture:** DiT-B style conditioning diffusion model (CDM)
- **Domain:** BRCA conditioning space used by ZoomLDM
- **Output:** conditioning tokens/embeddings (`(B, 512, 65)`)
- **Format:** custom diffusers pipeline (`pipeline.py`)
## Intended Use
Use this model to sample BRCA conditioning embeddings that can be consumed by downstream ZoomLDM workflows.
## Out-of-Scope Use
- Not a complete pixel-space generator by itself.
- Not intended for clinical or diagnostic use.
- Not validated for non-BRCA domains without adaptation.
## Files
- `pipeline.py`: custom `DiffusionPipeline` implementation (`CDMDiTPipeline`)
- `model_index.json`: diffusers metadata
- `cdm/`: active model weights and config used by the pipeline
- `scheduler/`: DDIM scheduler config
- `model_raw.safetensors`: non-EMA training weights (optional)
- `optimizer.pt`: optimizer state (optional)
- `config.json`: conversion metadata
## Usage
```python
import torch
from diffusers import DiffusionPipeline

# Load the custom CDMDiTPipeline defined in pipeline.py.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/ZoomLDM-CDM-brca",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
).to("cuda")

# Sample conditioning embeddings for two images at magnification class 0.
out = pipe(
    batch_size=2,
    magnification=torch.tensor([0, 0], device="cuda"),  # class labels 0..7
    num_inference_steps=50,
    guidance_scale=1.0,
)
samples = out.samples  # (B, 512, 65) conditioning embeddings
```
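Sampled embeddings can be cached to disk and fed to downstream ZoomLDM workflows without re-running the diffusion sampler. A minimal sketch, assuming `samples` is a plain `torch.Tensor` with the documented `(B, 512, 65)` shape (the random tensor below stands in for real pipeline output):

```python
import torch

# Stand-in for pipeline output: a batch of two BRCA conditioning
# embeddings with the documented shape (B, 512, 65).
samples = torch.randn(2, 512, 65)

# Persist the embeddings for later reuse in downstream workflows.
torch.save(samples, "brca_conditioning.pt")

# Reload and verify the shape survives the round trip.
loaded = torch.load("brca_conditioning.pt")
assert loaded.shape == (2, 512, 65)
```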
## Limitations
- Produces conditioning embeddings, not final images.
- Requires correct class/magnification label conventions.
- Inherits data biases and quality limits from the original training data.
## Citation
```bibtex
@InProceedings{Yellapragada_2025_CVPR,
author = {Yellapragada, Srikar and Graikos, Alexandros and Triaridis, Kostas and Prasanna, Prateek and Gupta, Rajarsi and Saltz, Joel and Samaras, Dimitris},
title = {ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {23453-23463}
}
@inproceedings{Peebles2023DiT,
title={Scalable Diffusion Models with Transformers},
author={Peebles, William and Xie, Saining},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2023}
}
```