---
license: apache-2.0
pipeline_tag: text-to-image
---
|
|
|
|
|
# RecTok: Reconstruction Distillation along Rectified Flow
|
|
|
|
|
<div align="center">

[arXiv](https://arxiv.org/abs/2512.13421) | [Project Page](https://shi-qingyu.github.io/rectok.github.io/) | [License](https://github.com/Shi-qingyu/RecTok/blob/main/LICENSE)

</div>
|
|
|
|
|
This repository contains the official PyTorch implementation for the paper [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421).
|
|
|
|
|
RecTok addresses the fundamental trade-off between latent-space dimensionality and generation quality in visual tokenizers for diffusion models. It introduces two key techniques: flow semantic distillation and reconstruction-alignment distillation. Rather than enriching the latent space alone, RecTok injects semantic information into the forward flow trajectories, which serve as the training space for diffusion transformers. As a result, RecTok achieves strong image reconstruction, generation quality, and discriminative performance, setting state-of-the-art gFID-50K results and improving consistently as latent dimensionality increases.
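To make the "forward flow trajectory" concrete: in rectified flow, a noisy sample at time `t` lies on the straight line between data and noise, `x_t = (1 - t) * x0 + t * eps`. The following is a minimal, library-free sketch of that trajectory (illustrative only, not the repository's implementation):

```python
# Minimal sketch of a rectified-flow forward trajectory (illustrative only).
# A point at time t is the straight-line interpolation between a data sample
# x0 (t = 0) and a noise sample eps (t = 1).
def forward_flow_point(x0, eps, t):
    """Element-wise interpolation x_t = (1 - t) * x0 + t * eps, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]

def forward_trajectory(x0, eps, num_steps):
    """Sample the trajectory at evenly spaced timesteps from data to noise."""
    return [forward_flow_point(x0, eps, k / (num_steps - 1)) for k in range(num_steps)]

x0 = [1.0, 2.0]   # toy stand-in for a tokenizer latent
eps = [0.0, 0.0]  # toy noise sample (zeros for readability)
traj = forward_trajectory(x0, eps, 3)  # endpoints are x0 and eps
```

RecTok's distillation operates along such trajectories rather than on the `t = 0` latents alone.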
|
|
|
|
|
- **Paper**: [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421)
- **Project Page**: [https://shi-qingyu.github.io/rectok.github.io/](https://shi-qingyu.github.io/rectok.github.io/)
- **Code**: [https://github.com/Shi-qingyu/RecTok](https://github.com/Shi-qingyu/RecTok)
|
|
|
|
|
<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/pipeline.png" width="720" alt="RecTok Pipeline">
</p>
|
|
|
|
|
## Usage
|
|
|
|
|
For detailed instructions on setting up the environment, downloading models, and running evaluation or training, please refer to the [official GitHub repository](https://github.com/Shi-qingyu/RecTok).
|
|
|
|
|
### Installation
|
|
|
|
|
Set up the environment and install dependencies:
|
|
|
|
|
```bash
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate the conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```
|
|
|
|
|
### Download Models
|
|
|
|
|
Download the pretrained models and required data assets:

```bash
# Download from Hugging Face
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models

# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```
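After running the commands above, the expected layout can be sanity-checked with a short script. The paths below simply mirror the commands in this README; the checkpoint file names are assumptions taken from the evaluation command further down and may differ from the actual release:

```python
from pathlib import Path

# Paths mirroring the download/organize commands above. The .pth file names
# are assumptions based on this README's evaluation command.
EXPECTED = [
    "pretrained_models/RecTok_decft.pth",
    "pretrained_models/ditdhxl_epoch_0599.pth",
    "pretrained_models/ditdhs_epoch_0029.pth",
    "data",
    "offline_models",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]
```

Running `missing_paths()` from the repository root should return an empty list once the downloads are in place.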
|
|
|
|
|
### Generative Model Evaluation
|
|
|
|
|
Evaluate generation quality (FID, Inception Score, etc.). The evaluation results are written to `./work_dirs/gen_model_training/RecTok_eval`:
|
|
|
|
|
```bash
# Arguments: (1) RecTok checkpoint, (2) DiT^DH-XL checkpoint,
# (3) autoguidance model checkpoint
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \
    pretrained_models/ditdhxl_epoch_0599.pth \
    pretrained_models/ditdhs_epoch_0029.pth
```
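For intuition about the FID metric reported below: FID is the Fréchet distance between Gaussian fits of Inception features for real and generated images. In one dimension it reduces to a closed form; the toy sketch below shows only that scalar special case (the actual evaluation uses multivariate feature statistics, e.g. via `torch_fidelity`):

```python
import math

# Toy 1-D special case of the Frechet (FID) distance between two Gaussians:
#   d^2 = (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)
# Real FID uses multivariate Inception-feature statistics; this is for intuition only.
def frechet_distance_1d(mu1, var1, mu2, var2):
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)
```

Identical distributions give a distance of zero; lower is better, which is why the 600-epoch rows in the table below improve on the 80-epoch rows.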
|
|
|
|
|
Selected examples of class-conditional generation results on ImageNet-1K 256×256:

<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/qualitative.png" width="1080" alt="RecTok Qualitative Results">
</p>
|
|
|
|
|
FID-50K and Inception Score with and without CFG:

| CFG  | Model                                       | Epochs | FID-50K | Inception Score | #Params |
|------|---------------------------------------------|--------|---------|-----------------|---------|
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80     | 2.09    | 198.6           | 839M    |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80     | 1.48    | 223.8           | 839M    |
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600    | 1.34    | 254.6           | 839M    |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600    | 1.13    | 289.2           | 839M    |
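For reference on the CFG column: at sampling time, guidance extrapolates from a weaker prediction toward the conditional one, `v = v_weak + w * (v_cond - v_weak)`, so `w = 1.0` reduces to the plain conditional prediction (here the weaker prediction comes from the smaller autoguidance model checkpoint). A minimal sketch, with names that are illustrative rather than the repository's API:

```python
# Guidance blending at sampling time (illustrative; not the repo's sampler).
# v_guided = v_weak + w * (v_cond - v_weak); w = 1.0 yields v_cond unchanged,
# while w > 1.0 pushes the prediction further from the weak model's output.
def guided_velocity(v_cond, v_weak, w):
    """Extrapolate from the weak prediction toward the conditional one."""
    return [vw + w * (vc - vw) for vc, vw in zip(v_cond, v_weak)]
```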
|
|
|
|
|
## Citation
|
|
|
|
|
If you find this work useful for your research, please consider citing:
|
|
|
|
|
```bibtex
@article{shi2025rectok,
  title={RecTok: Reconstruction Distillation along Rectified Flow},
  author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
  journal={arXiv preprint arXiv:2512.13421},
  year={2025}
}
```
|
|
|
|
|
## Acknowledgements
|
|
|
|
|
We thank the authors of [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), [MAE](https://github.com/facebookresearch/mae), [DiT](https://github.com/facebookresearch/DiT), and [LightningDiT](https://github.com/hustvl/LightningDiT) for their foundational work.
|
|
|
|
|
Our codebase builds upon several excellent open-source projects, including [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), and [torch_fidelity](https://github.com/toshas/torch-fidelity). We are grateful to the communities behind them.
|
|
|
|
|
We sincerely thank [Jiawei Yang](https://jiawei-yang.github.io/) and [Boyang Zheng](https://bytetriper.github.io/) for providing insightful feedback. |