---
license: apache-2.0
pipeline_tag: text-to-image
---

# RecTok: Reconstruction Distillation along Rectified Flow

<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-RecTok-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2512.13421)
[![Project Page](https://img.shields.io/badge/Project-Page-blue?style=flat-square)](https://shi-qingyu.github.io/rectok.github.io/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg?style=flat-square)](https://github.com/Shi-qingyu/RecTok/blob/main/LICENSE)

</div>

This repository contains the official PyTorch implementation for the paper [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421).

RecTok addresses the fundamental trade-off between latent space dimensionality and generation quality in visual tokenizers for diffusion models. It proposes two key innovations: flow semantic distillation and reconstruction-alignment distillation. This approach enriches the semantic information in forward flow trajectories, which serve as the training space for diffusion transformers, rather than focusing solely on the latent space. As a result, RecTok achieves superior image reconstruction, generation quality, and discriminative performance, setting state-of-the-art results on gFID-50K and demonstrating consistent improvements with increasing latent dimensionality.

-   **Paper**: [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421)
-   **Project Page**: [https://shi-qingyu.github.io/rectok.github.io/](https://shi-qingyu.github.io/rectok.github.io/)
-   **Code**: [https://github.com/Shi-qingyu/RecTok](https://github.com/Shi-qingyu/RecTok)
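The forward flow trajectories mentioned above follow the standard rectified-flow construction: straight-line interpolation between a sample and noise. A minimal sketch (illustrative only, not the repository's code; `rectified_flow_point` is a hypothetical helper name):

```python
import numpy as np

def rectified_flow_point(x0, x1, t):
    """Point on the straight-line (rectified-flow) path between x0 and x1.

    x_t = (1 - t) * x0 + t * x1, with t in [0, 1]. Conventions differ on
    whether x0 is the data and x1 the noise or vice versa; the path is the
    same straight line either way.
    """
    return (1.0 - t) * x0 + t * x1

# A small batch of latents interpolated halfway along the flow:
x0 = np.zeros((2, 4))        # e.g. clean latents
x1 = np.random.randn(2, 4)   # e.g. Gaussian noise
x_half = rectified_flow_point(x0, x1, 0.5)
```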

<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/pipeline.png" width="720" alt="RecTok Pipeline">
</p>

## Usage

For detailed instructions on setting up the environment, downloading models, and performing evaluation or training, please refer to the [official GitHub repository](https://github.com/Shi-qingyu/RecTok).

### Installation

Set up the environment and install dependencies:

```bash
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```

### Download Models

Download pretrained models and necessary data assets:
```bash
# Download from HuggingFace
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models
# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```

### Generative Model Evaluation

Evaluate generation quality (FID, Inception Score, etc.). The evaluation results are written to the directory `./work_dirs/gen_model_training/RecTok_eval`:

```bash
# Args: RecTok checkpoint, DiT^DH-XL checkpoint, autoguidance model checkpoint
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \
    pretrained_models/ditdhxl_epoch_0599.pth \
    pretrained_models/ditdhs_epoch_0029.pth
```

Selected examples of class-conditional generation results on ImageNet-1K 256x256:
<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/qualitative.png" width="1080" alt="RecTok Qualitative Results">
</p>

FID-50K and Inception Score without CFG (scale 1.0) and with CFG (scale 1.29):
| CFG | Model | Epochs | FID-50K | Inception Score | #Params |
|------|------------------------------|--------|---------|-----------------|---------|
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 2.09 | 198.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 1.48 | 223.8 | 839M |
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.34 | 254.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.13 | 289.2 | 839M |
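The CFG column is the guidance scale. With guidance of this kind, the sampler extrapolates from a guiding model's prediction toward the main model's (in autoguidance-style setups, the guiding model is the weaker checkpoint, e.g. the third checkpoint above). An illustrative sketch of the standard combination rule, not the repository's exact sampler (`guided_velocity` is a hypothetical helper name):

```python
import numpy as np

def guided_velocity(v_main, v_guide, scale):
    """Combine main and guiding predictions with guidance scale `scale`.

    scale = 1.0 reduces to the main model alone (no guidance);
    scale > 1.0 pushes the prediction further away from the guiding model.
    """
    return v_guide + scale * (v_main - v_guide)

v_main = np.array([2.0, 4.0])   # main model's velocity prediction
v_guide = np.array([1.0, 1.0])  # guiding model's velocity prediction
v = guided_velocity(v_main, v_guide, 1.29)
```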

## Citation

If you find this work useful for your research, please consider citing:

```bibtex
@article{shi2025rectok,
  title={RecTok: Reconstruction Distillation along Rectified Flow},
  author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
  journal={arXiv preprint arXiv:2512.13421},
  year={2025}
}
```

## Acknowledgements

We thank the authors of [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), [MAE](https://github.com/facebookresearch/mae), [DiT](https://github.com/facebookresearch/DiT), and [LightningDiT](https://github.com/hustvl/LightningDiT) for their foundational work.

Our codebase builds upon several excellent open-source projects, including [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), and [torch_fidelity](https://github.com/toshas/torch-fidelity). We are grateful to the communities behind them.

We sincerely thank [Jiawei Yang](https://jiawei-yang.github.io/) and [Boyang Zheng](https://bytetriper.github.io/) for providing insightful feedback.