Add model card for RecTok
#1 by nielsr (HF Staff) - opened

README.md ADDED
@@ -0,0 +1,104 @@
---
license: apache-2.0
pipeline_tag: text-to-image
---

# RecTok: Reconstruction Distillation along Rectified Flow

<div align="center">

[arXiv](https://arxiv.org/abs/2512.13421)
[Project Page](https://shi-qingyu.github.io/rectok.github.io/)
[License](https://github.com/Shi-qingyu/RecTok/blob/main/LICENSE)

</div>

This repository contains the official PyTorch implementation of the paper [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421).

RecTok addresses the fundamental trade-off between latent-space dimensionality and generation quality in visual tokenizers for diffusion models. It introduces two key techniques: flow semantic distillation and reconstruction-alignment distillation. Rather than acting on the latent space alone, these enrich the semantic information of the forward flow trajectories that diffusion transformers are trained on. As a result, RecTok achieves strong image reconstruction, generation quality, and discriminative performance, sets state-of-the-art gFID-50K results, and improves consistently as latent dimensionality grows.

- **Paper**: [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421)
- **Project Page**: [https://shi-qingyu.github.io/rectok.github.io/](https://shi-qingyu.github.io/rectok.github.io/)
- **Code**: [https://github.com/Shi-qingyu/RecTok](https://github.com/Shi-qingyu/RecTok)

<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/pipeline.png" width="720" alt="RecTok Pipeline">
</p>

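RecTok's distillation operates along rectified-flow trajectories, which linearly interpolate between clean latents and Gaussian noise. As a quick illustration of that forward flow, here is a minimal sketch in plain PyTorch; it shows the generic rectified-flow interpolation, not the repository's API, and all names are illustrative.

```python
import torch

def rectified_flow_forward(x0: torch.Tensor, t: torch.Tensor):
    """Forward rectified flow: x_t = (1 - t) * x0 + t * eps.

    The regression target for a flow model is the constant velocity
    v = eps - x0, which is independent of t.
    """
    eps = torch.randn_like(x0)
    t = t.view(-1, *([1] * (x0.dim() - 1)))  # broadcast t over non-batch dims
    x_t = (1 - t) * x0 + t * eps
    v_target = eps - x0
    return x_t, v_target

# example: a batch of 4 latent maps with 16 channels
x0 = torch.randn(4, 16, 16, 16)
x_t, v = rectified_flow_forward(x0, torch.rand(4))
```

At `t = 0` the sample equals the clean latent and at `t = 1` it is pure noise; the intermediate points form the trajectory whose semantics RecTok's distillation enriches.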
## Usage

For detailed instructions on setting up the environment, downloading models, and performing evaluation or training, please refer to the [official GitHub repository](https://github.com/Shi-qingyu/RecTok).

### Installation

Set up the environment and install dependencies:

```bash
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate the conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```

### Download Models

Download the pretrained models and necessary data assets:

```bash
# Download from Hugging Face
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models

# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```

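The downloaded `.pth` files are regular PyTorch checkpoints loadable with `torch.load`. The sketch below is hedged: whether the weights sit at the top level or under a `state_dict` key is an assumption for illustration, so treat the repository's own loading code as authoritative. The demo saves a dummy checkpoint so the snippet runs without the download.

```python
import torch

def load_checkpoint_state(path: str) -> dict:
    """Load a checkpoint and unwrap a possible {'state_dict': ...} container.

    The container key is an assumption for illustration; RecTok's loading
    code in the GitHub repository defines the actual checkpoint layout.
    """
    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        return ckpt["state_dict"]
    return ckpt

# demo with a dummy checkpoint so this runs without the real download
torch.save({"state_dict": {"encoder.weight": torch.zeros(2, 2)}}, "dummy.pth")
state = load_checkpoint_state("dummy.pth")
print(sorted(state))  # ['encoder.weight']
```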
### Generative Model Evaluation

Evaluate generation quality (FID, etc.). The evaluation results are written to the directory `./work_dirs/gen_model_training/RecTok_eval`:

```bash
# Arguments: RecTok checkpoint, DiT-DH-XL checkpoint, autoguidance model checkpoint
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \
    pretrained_models/ditdhxl_epoch_0599.pth \
    pretrained_models/ditdhs_epoch_0029.pth
```

Selected examples of class-conditional generation results on ImageNet-1K 256x256:
<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/qualitative.png" width="1080" alt="RecTok Qualitative Results">
</p>

FID-50K and Inception Score with and without CFG:

| cfg  | Model | Epochs | FID-50K | Inception Score | #params |
|------|-------|--------|---------|-----------------|---------|
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80  | 2.09 | 198.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80  | 1.48 | 223.8 | 839M |
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.34 | 254.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.13 | 289.2 | 839M |

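The evaluation script takes a third, smaller checkpoint as a guidance model, and the `cfg` column above is a guidance scale. As a hedged illustration of generic guided sampling (not the script's internals), guidance extrapolates the main model's velocity prediction away from the guiding model's; `w = 1.0` disables guidance, and `w = 1.29` matches the stronger rows of the table:

```python
import torch

def guided_velocity(v_main: torch.Tensor, v_guide: torch.Tensor, w: float):
    """Generic guidance: extrapolate the main prediction away from the
    guiding prediction. w = 1.0 returns v_main unchanged."""
    return v_guide + w * (v_main - v_guide)

v_main = torch.tensor([1.0, 2.0])
v_guide = torch.tensor([0.5, 0.5])
print(guided_velocity(v_main, v_guide, 1.29))  # tensor([1.1450, 2.4350])
```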
## Citation

If you find this work useful for your research, please consider citing:

```bibtex
@article{shi2025rectok,
  title={RecTok: Reconstruction Distillation along Rectified Flow},
  author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
  journal={arXiv preprint arXiv:2512.13421},
  year={2025}
}
```

## Acknowledgements

We thank the authors of [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), [MAE](https://github.com/facebookresearch/mae), [DiT](https://github.com/facebookresearch/DiT), and [LightningDiT](https://github.com/hustvl/LightningDiT) for their foundational work.

Our codebase builds upon several excellent open-source projects, including [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), and [torch_fidelity](https://github.com/toshas/torch-fidelity). We are grateful to the communities behind them.

We sincerely thank [Jiawei Yang](https://jiawei-yang.github.io/) and [Boyang Zheng](https://bytetriper.github.io/) for providing insightful feedback.