---
license: apache-2.0
tags:
- medical-imaging
- ct-generation
- flow-matching
- diffusion
- text-to-3d
- auto-regressive
---

# CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis

**ICCV 2025 Workshop on Vision-Language Models for 3D Understanding (VLM3D)**

[[Paper]](https://openaccess.thecvf.com/content/ICCV2025W/VLM3D/papers/Wang_CTFlow_Video-Inspired_Latent_Flow_Matching_for_3D_CT_Synthesis_ICCVW_2025_paper.pdf) | [[GitHub]](https://github.com/WongJiayi/CTFlow)

---

## Overview

CTFlow is a **0.5B latent flow matching transformer** for generating entire 3D CT volumes conditioned on clinical reports.

Key ideas:
- Uses the **FLUX A-VAE** as the latent space encoder/decoder
- Encodes clinical reports with the **CT-CLIP text encoder**
- Generates CT volumes **auto-regressively block-by-block**, keeping memory tractable while maintaining temporal coherence across slices
- Trained on **CT-RATE**, a large-scale dataset of 3D CT volumes paired with clinical reports
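
The block-by-block generation listed above can be illustrated with a toy sketch. This is not the CTFlow implementation: `toy_velocity`, `sample_block`, `generate_volume`, the scalar `cond`, the block size, and the step count are all hypothetical, and a flat list of floats stands in for a block of slice latents. It only shows the control flow under those assumptions: the first block is sampled from noise using the report conditioning alone, and each later block is sampled from fresh noise while also conditioning on the previously generated block. Sampling uses plain Euler integration of a velocity field, which is one common way to sample from a flow matching model.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned velocity network.

    For the straight path x_t = (1 - t) * noise + t * target, the velocity
    (target - noise) equals (target - x_t) / (1 - t).
    """
    return [(target - xi) / (1.0 - t) for xi in x]


def sample_block(block_len, target, num_steps, rng):
    """Euler-integrate dx/dt = v(x, t) from noise at t=0 to a sample at t=1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(block_len)]
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # largest t is 1 - dt, so (1 - t) is never zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def generate_volume(num_blocks, block_len, cond=1.0, num_steps=10, seed=0):
    """Generate blocks one at a time, conditioning each on the previous one."""
    rng = random.Random(seed)
    volume, previous = [], None
    for _ in range(num_blocks):
        if previous is None:
            target = cond  # first block: report conditioning only
        else:
            # later blocks: report conditioning plus the previous block
            target = 0.5 * (cond + previous[-1])
        block = sample_block(block_len, target, num_steps, rng)
        volume.extend(block)
        previous = block
    return volume


volume = generate_volume(num_blocks=3, block_len=4)
print(len(volume))
```

Each call to `sample_block` only ever touches one block, which is the property that keeps memory bounded; continuity between blocks comes entirely from the conditioning passed in from the previous block.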

---

## Checkpoint

This repository contains the pretrained **STDiT-L2** checkpoint (512M parameters, trained for 680,000 steps):

```
checkpoint-680000/
└── denoiser_ema/   ← use this for inference
```
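
As a small convenience, the layout above can be checked before launching inference. The helper below is a hypothetical sketch (the name `resolve_ema_dir` is not part of the CTFlow code base); it assumes only the directory structure shown above and returns the `denoiser_ema` path to pass as the checkpoint argument.

```python
from pathlib import Path


def resolve_ema_dir(checkpoint_root):
    """Return the denoiser_ema directory inside a downloaded checkpoint folder.

    Hypothetical helper: it relies only on the documented layout
    checkpoint-680000/denoiser_ema/ and fails early if that folder is missing.
    """
    ema_dir = Path(checkpoint_root) / "denoiser_ema"
    if not ema_dir.is_dir():
        raise FileNotFoundError(f"Expected EMA weights at {ema_dir}")
    return ema_dir
```

For example, `resolve_ema_dir("/path/to/checkpoint-680000")` returns the EMA directory when it exists and raises `FileNotFoundError` otherwise.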

---

## Usage

See the [GitHub repository](https://github.com/WongJiayi/CTFlow) for full installation instructions, training configs, and inference scripts.

**Quick inference:**

```bash
git clone https://github.com/WongJiayi/CTFlow
cd CTFlow

python auto_regressive_generate/main.py \
  --config /path/to/config.yaml \
  --ckpt /path/to/checkpoint-680000/denoiser_ema \
  --embedding /path/to/ct_embedding.pt \
  --output output_frames/ \
  --type full-body
```
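
When inference needs to be launched from another Python program, the same command can be assembled as an argument list. This is a minimal sketch: the script path and the five flags are taken from the command above, while the function names `build_inference_command` and `run_inference` and the `repo_dir` default are hypothetical. `full-body` is the only `--type` value shown here, so the sketch does not assume any others.

```python
import subprocess
import sys


def build_inference_command(config, ckpt, embedding, output, volume_type="full-body"):
    """Assemble the quick-inference command as an argv list (no shell quoting needed)."""
    return [
        sys.executable, "auto_regressive_generate/main.py",
        "--config", str(config),
        "--ckpt", str(ckpt),
        "--embedding", str(embedding),
        "--output", str(output),
        "--type", volume_type,
    ]


def run_inference(config, ckpt, embedding, output, repo_dir="CTFlow"):
    """Run the script from the cloned repository; raises CalledProcessError on failure."""
    cmd = build_inference_command(config, ckpt, embedding, output)
    subprocess.run(cmd, cwd=repo_dir, check=True)
```

Because the script path is relative and `cwd` points at the clone, the file paths passed in should be absolute, or relative to the repository root.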

---

## Citation

```bibtex
@InProceedings{Wang_2025_ICCVW,
    author    = {Wang, Jiayi and Reynaud, Hadrien and Erick, Franciskus Xaverius and Kainz, Bernhard},
    title     = {CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    year      = {2025},
}
```