---
license: apache-2.0
tags:
- medical-imaging
- ct-generation
- flow-matching
- diffusion
- text-to-3d
- auto-regressive
---

# CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis

**ICCV 2025 Workshop on Vision-Language Models for 3D Understanding (VLM3D)**

[[Paper]](https://openaccess.thecvf.com/content/ICCV2025W/VLM3D/papers/Wang_CTFlow_Video-Inspired_Latent_Flow_Matching_for_3D_CT_Synthesis_ICCVW_2025_paper.pdf) | [[GitHub]](https://github.com/WongJiayi/CTFlow)

---

## Overview

CTFlow is a **0.5B latent flow matching transformer** for generating entire 3D CT volumes conditioned on clinical reports.

Key ideas:

- Uses the **FLUX A-VAE** as the latent space encoder/decoder
- Encodes clinical reports with the **CT-CLIP text encoder**
- Generates CT volumes **auto-regressively block-by-block**, keeping memory tractable while maintaining temporal coherence across slices
- Trained on **CT-RATE**, a large-scale dataset of 3D CT volumes paired with clinical reports

---

## Checkpoint

This repository contains the pretrained **STDiT-L2** checkpoint (512M parameters, trained for 680,000 steps):

```
checkpoint-680000/
└── denoiser_ema/    ← use this for inference
```

---

## Usage

See the [GitHub repository](https://github.com/WongJiayi/CTFlow) for full installation instructions, training configs, and inference scripts.

**Quick inference:**

```bash
git clone https://github.com/WongJiayi/CTFlow
cd CTFlow

python auto_regressive_generate/main.py \
    --config /path/to/config.yaml \
    --ckpt /path/to/checkpoint-680000/denoiser_ema \
    --embedding /path/to/ct_embedding.pt \
    --output output_frames/ \
    --type full-body
```

---

## Citation

```bibtex
@InProceedings{Wang_2025_ICCVW,
    author    = {Wang, Jiayi and Reynaud, Hadrien and Erick, Franciskus Xaverius and Kainz, Bernhard},
    title     = {CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    year      = {2025},
}
```
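
---

## Illustration: block-by-block planning

The block-by-block generation mentioned in the Overview can be pictured with a small planning sketch. This is not the repository's implementation: `plan_blocks`, `block_size`, and `context` are hypothetical names, and conditioning each block on the tail of the previously generated slices is an assumption made purely for illustration. The sketch only shows why memory stays bounded: at any step the model would need at most `context + block_size` slices.

```python
from typing import List, Tuple


def plan_blocks(num_slices: int, block_size: int, context: int) -> List[Tuple[int, int, int]]:
    """Plan an auto-regressive pass over a volume of `num_slices` slices.

    Returns one (context_start, block_start, block_end) tuple per block.
    Slices [context_start, block_start) are assumed to be already generated
    and used as conditioning; slices [block_start, block_end) are new.
    All names and the conditioning scheme are hypothetical.
    """
    if num_slices < 0 or block_size <= 0 or context < 0:
        raise ValueError("num_slices and context must be >= 0, block_size must be > 0")

    plan = []
    start = 0
    while start < num_slices:
        end = min(start + block_size, num_slices)   # last block may be shorter
        context_start = max(0, start - context)     # first block has no context
        plan.append((context_start, start, end))
        start = end
    return plan


if __name__ == "__main__":
    for ctx_start, blk_start, blk_end in plan_blocks(num_slices=10, block_size=4, context=2):
        print(f"condition on [{ctx_start}, {blk_start}), generate [{blk_start}, {blk_end})")
```

For the actual block sizes and conditioning logic, refer to `auto_regressive_generate/main.py` and the configs in the GitHub repository.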