---
license: apache-2.0
tags:
  - medical-imaging
  - ct-generation
  - flow-matching
  - diffusion
  - text-to-3d
  - auto-regressive
---

# CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis

**ICCV 2025 Workshop on Vision-Language Models for 3D Understanding (VLM3D)**

[[Paper]](https://openaccess.thecvf.com/content/ICCV2025W/VLM3D/papers/Wang_CTFlow_Video-Inspired_Latent_Flow_Matching_for_3D_CT_Synthesis_ICCVW_2025_paper.pdf) | [[GitHub]](https://github.com/WongJiayi/CTFlow)

---

## Overview

CTFlow is a **0.5B-parameter latent flow matching transformer** that generates entire 3D CT volumes conditioned on clinical reports.

Key ideas:
- Uses the **FLUX A-VAE** as the latent space encoder/decoder
- Encodes clinical reports with the **CT-CLIP text encoder**
- Generates CT volumes **auto-regressively, block by block**, which keeps memory use tractable while maintaining temporal coherence across slices
- Trained on **CT-RATE**, a large-scale dataset of 3D CT volumes paired with clinical reports
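
The block-by-block generation idea can be illustrated with a small, self-contained sketch. This is not the CTFlow implementation: `toy_velocity` is a hypothetical stand-in for the trained denoiser, the Euler integration from noise (`t = 0`) to data (`t = 1`) is one common flow matching convention assumed here, and the block shape, step count, and embedding are made-up values. It only shows the control flow: the first block is sampled from text alone, and each later block is sampled from the text plus the previously generated block.

```python
import numpy as np


def toy_velocity(x, t, text_emb, prev_block):
    """Hypothetical stand-in for the trained denoiser (NOT the CTFlow model).

    Returns the velocity of a straight-line path that ends at a target
    built from the text embedding and, if present, the previous block.
    """
    target = np.full_like(x, text_emb.mean())
    if prev_block is not None:
        target = 0.5 * target + 0.5 * prev_block
    # On a linear path, the remaining displacement divided by the remaining time.
    return (target - x) / (1.0 - t)


def sample_block(velocity_fn, shape, text_emb, prev_block, num_steps, rng):
    """Integrate the velocity field from noise (t=0) to a sample (t=1) with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so (1 - t) is never zero
        x = x + dt * velocity_fn(x, t, text_emb, prev_block)
    return x


def generate_volume(velocity_fn, text_emb, num_blocks, block_shape, num_steps=8, seed=0):
    """Generate blocks of slices one after another and stack them along the slice axis."""
    rng = np.random.default_rng(seed)
    blocks = []
    prev_block = None  # the first block is conditioned on text only
    for _ in range(num_blocks):
        block = sample_block(velocity_fn, block_shape, text_emb, prev_block, num_steps, rng)
        blocks.append(block)
        prev_block = block  # the next block also sees this one
    return np.concatenate(blocks, axis=0)


# Made-up text embedding and latent block shape (slices, height, width).
text_emb = np.full(16, 0.25)
volume = generate_volume(toy_velocity, text_emb, num_blocks=3, block_shape=(4, 8, 8))
print(volume.shape)  # (12, 8, 8)
```

Only one block of latents is held by the sampler at a time, which is why peak memory depends on the block size rather than on the full volume length.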

---

## Checkpoint

This repository contains the pretrained **STDiT-L2** checkpoint (512M parameters, trained for 680,000 steps):

```
checkpoint-680000/
└── denoiser_ema/     ← use this for inference
```

---

## Usage

See the [GitHub repository](https://github.com/WongJiayi/CTFlow) for full installation instructions, training configs, and inference scripts.

**Quick inference:**

```bash
git clone https://github.com/WongJiayi/CTFlow
cd CTFlow

python auto_regressive_generate/main.py \
    --config /path/to/config.yaml \
    --ckpt /path/to/checkpoint-680000/denoiser_ema \
    --embedding /path/to/ct_embedding.pt \
    --output output_frames/ \
    --type full-body
```
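
To run the command above for several report embeddings, a small Python wrapper can assemble the argument list. This is a minimal sketch that uses only the flags shown in the command; the helper name `build_generate_command` and the embedding file names are hypothetical, and the paths remain placeholders to be replaced with your own.

```python
import shlex


def build_generate_command(config, ckpt, embedding, output, volume_type="full-body"):
    """Assemble the argument list for the inference script shown above."""
    return [
        "python", "auto_regressive_generate/main.py",
        "--config", str(config),
        "--ckpt", str(ckpt),
        "--embedding", str(embedding),
        "--output", str(output),
        "--type", volume_type,
    ]


# Hypothetical embedding files, one per clinical report.
embeddings = ["report_001.pt", "report_002.pt"]

for index, embedding in enumerate(embeddings):
    cmd = build_generate_command(
        config="/path/to/config.yaml",
        ckpt="/path/to/checkpoint-680000/denoiser_ema",
        embedding=embedding,
        output=f"output_frames/{index:03d}/",
    )
    # Print the command; it is not executed here.
    print(shlex.join(cmd))
```

To actually launch each run, pass `cmd` to `subprocess.run(cmd, check=True)` from inside the cloned `CTFlow` directory.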

---

## Citation

```bibtex
@InProceedings{Wang_2025_ICCVW,
    author    = {Wang, Jiayi and Reynaud, Hadrien and Erick, Franciskus Xaverius and Kainz, Bernhard},
    title     = {CTFlow: Video-Inspired Latent Flow Matching for 3D CT Synthesis},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    year      = {2025},
}
```