---
license: apache-2.0
pipeline_tag: text-to-image
---
# RecTok: Reconstruction Distillation along Rectified Flow
This repository contains the official PyTorch implementation for the paper RecTok: Reconstruction Distillation along Rectified Flow.
RecTok addresses the trade-off between latent-space dimensionality and generation quality in visual tokenizers for diffusion models. It introduces two key innovations: flow semantic distillation and reconstruction-alignment distillation. Rather than focusing solely on the latent space, RecTok enriches the semantic information of the forward flow trajectories, which serve as the training space for diffusion transformers. As a result, RecTok achieves strong image reconstruction, generation quality, and discriminative performance, sets state-of-the-art results on gFID-50K, and improves consistently as latent dimensionality increases.
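To make the trajectory-level idea concrete, here is a minimal sketch of a forward rectified-flow interpolation and a feature-matching distillation loss evaluated along it. This is an illustrative assumption, not the paper's implementation: the function names (`student_features`, `teacher_features`, `flow_distillation_loss`) and the squared-error loss form are placeholders standing in for the tokenizer encoder, a frozen semantic teacher, and the actual distillation objective.

```python
import numpy as np

def forward_flow(x0, noise, t):
    # Rectified-flow interpolation: a straight line from data x0 (t = 0)
    # to Gaussian noise (t = 1).
    return (1.0 - t) * x0 + t * noise

# Hypothetical feature extractors: stand-ins for the tokenizer encoder
# (student) and a frozen semantic teacher. The real models are deep networks.
def student_features(x):
    return x.mean(axis=-1)

def teacher_features(x):
    return x.mean(axis=-1)

def flow_distillation_loss(x0, noise, ts):
    # Match student and teacher features at sampled points along the forward
    # flow, so the trajectory itself (not only the latent at t = 0) is
    # encouraged to carry semantic information.
    losses = []
    for t in ts:
        xt = forward_flow(x0, noise, t)
        diff = student_features(xt) - teacher_features(xt)
        losses.append(np.mean(diff ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 16))     # a toy batch of "latents"
noise = rng.standard_normal((4, 16))
loss = flow_distillation_loss(x0, noise, ts=[0.25, 0.5, 0.75])
```

Because the toy student and teacher are identical here, the loss is zero; in training, only the student would be updated to close the gap.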
- Paper: RecTok: Reconstruction Distillation along Rectified Flow
- Project Page: https://shi-qingyu.github.io/rectok.github.io/
- Code: https://github.com/Shi-qingyu/RecTok
## Usage
For detailed instructions on setting up the environment, downloading models, and performing evaluation or training, please refer to the official GitHub repository.
### Installation
Set up the environment and install dependencies:
```shell
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```
### Download Models
Download pretrained models and necessary data assets:
```shell
# Download from HuggingFace
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models

# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```
### Generative Model Evaluation
Evaluate generation quality (FID, Inception Score, etc.). The evaluation results are written to `./work_dirs/gen_model_training/RecTok_eval`:
```shell
# Arguments, in order: RecTok checkpoint, DiT^DH-XL checkpoint,
# autoguidance model checkpoint
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \
    pretrained_models/ditdhxl_epoch_0599.pth \
    pretrained_models/ditdhs_epoch_0029.pth
```
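The third checkpoint is a smaller model used for autoguidance. As a hedged illustration of how the two predictions are typically combined (assuming the standard autoguidance formulation; the script's actual combination rule may differ), the guided prediction extrapolates from the guiding model's output toward the main model's:

```python
import numpy as np

def autoguided_prediction(v_main, v_guide, w):
    # Guidance weight w extrapolates from the weaker guiding model's
    # prediction toward the main model's. With w = 1.0 this reduces to the
    # unguided main model, matching the cfg = 1.0 rows in the results table.
    return v_guide + w * (v_main - v_guide)

v_main = np.array([1.0, 2.0])    # toy velocity prediction of the main model
v_guide = np.array([0.5, 1.5])   # toy prediction of the smaller guiding model
guided = autoguided_prediction(v_main, v_guide, w=1.29)
```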
Selected examples of class-conditional generation results on ImageNet-1K 256×256:
FID-50K and Inception Score, without CFG (cfg = 1.0) and with CFG:

| CFG | Model | Epochs | FID-50K | Inception Score | #Params |
|---|---|---|---|---|---|
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 2.09 | 198.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 1.48 | 223.8 | 839M |
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.34 | 254.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.13 | 289.2 | 839M |
## Citation
If you find this work useful for your research, please consider citing:
```bibtex
@article{shi2025rectok,
  title={RecTok: Reconstruction Distillation along Rectified Flow},
  author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
  journal={arXiv preprint arXiv:2512.13421},
  year={2025}
}
```
## Acknowledgements
We thank the authors of lDeTok, RAE, MAE, DiT, and LightningDiT for their foundational work.
Our codebase builds upon several excellent open-source projects, including lDeTok, RAE, and torch_fidelity. We are grateful to the communities behind them.
We sincerely thank Jiawei Yang and Boyang Zheng for providing insightful feedback.