Add model card for RecTok

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
---
license: apache-2.0
pipeline_tag: text-to-image
---

# RecTok: Reconstruction Distillation along Rectified Flow

<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-RecTok-b31b1b.svg?style=flat-square)](https://arxiv.org/abs/2512.13421)
[![Project Page](https://img.shields.io/badge/Project-Page-blue?style=flat-square)](https://shi-qingyu.github.io/rectok.github.io/)
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg?style=flat-square)](https://github.com/Shi-qingyu/RecTok/blob/main/LICENSE)

</div>

This repository contains the official PyTorch implementation of the paper [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421).

RecTok addresses the fundamental trade-off between latent space dimensionality and generation quality in visual tokenizers for diffusion models. It introduces two key innovations: flow semantic distillation and reconstruction-alignment distillation. Rather than focusing solely on the latent space, this approach enriches the semantic information in the forward flow trajectories that serve as the training space for diffusion transformers. As a result, RecTok achieves superior image reconstruction, generation quality, and discriminative performance, setting state-of-the-art gFID-50K results and improving consistently as latent dimensionality grows.

- **Paper**: [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421)
- **Project Page**: [https://shi-qingyu.github.io/rectok.github.io/](https://shi-qingyu.github.io/rectok.github.io/)
- **Code**: [https://github.com/Shi-qingyu/RecTok](https://github.com/Shi-qingyu/RecTok)
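
For background, the forward flow trajectory in rectified flow is a straight-line interpolation between a clean latent and Gaussian noise, and RecTok's distillation targets live along this trajectory rather than only at its endpoint. A minimal NumPy sketch of that interpolation, using one common convention (noise at `t = 1`); this is illustrative only, not the repository's code:

```python
import numpy as np

def forward_flow(x0: np.ndarray, eps: np.ndarray, t: float) -> np.ndarray:
    """Rectified-flow forward trajectory x_t = (1 - t) * x0 + t * eps,
    moving linearly from the clean latent (t=0) to pure noise (t=1)."""
    return (1.0 - t) * x0 + t * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 16))   # stand-in for a clean image latent
eps = rng.normal(size=(4, 16))  # Gaussian noise sample

# Endpoints of the trajectory recover the data and the noise exactly.
assert np.allclose(forward_flow(x0, eps, 0.0), x0)
assert np.allclose(forward_flow(x0, eps, 1.0), eps)
```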

<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/pipeline.png" width="720" alt="RecTok Pipeline">
</p>

## Usage

For detailed instructions on setting up the environment, downloading models, and running evaluation or training, please refer to the [official GitHub repository](https://github.com/Shi-qingyu/RecTok).

### Installation

Set up the environment and install dependencies:

```bash
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate the conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```

### Download Models

Download the pretrained models and necessary data assets:

```bash
# Download from the Hugging Face Hub
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models

# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```

### Generative Model Evaluation

Evaluate generation quality (FID, Inception Score, etc.). The evaluation results are written to the directory `./work_dirs/gen_model_training/RecTok_eval`:

```bash
# Arguments: RecTok checkpoint, DiT-DH-XL checkpoint, autoguidance model checkpoint
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \
    pretrained_models/ditdhxl_epoch_0599.pth \
    pretrained_models/ditdhs_epoch_0029.pth
```
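
FID measures the Fréchet distance between Gaussian fits of Inception features for real and generated images. A minimal NumPy sketch of that distance, simplified to diagonal covariances for readability (illustrative only; the repository relies on torch_fidelity for the actual metric):

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances:
    ||mu1 - mu2||^2 + sum(var1 + var2 - 2 * sqrt(var1 * var2))."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)

# Identical distributions have distance 0; shifting the mean increases it.
mu, var = np.zeros(4), np.ones(4)
print(frechet_distance_diag(mu, var, mu, var))        # 0.0
print(frechet_distance_diag(mu, var, mu + 1.0, var))  # 4.0
```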

Selected examples of class-conditional generation results on ImageNet-1K 256×256:

<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/qualitative.png" width="1080" alt="RecTok Qualitative Results">
</p>

FID-50K and Inception Score without and with CFG:

| CFG  | Model                                       | Epochs | FID-50K | Inception Score | #Params |
|------|---------------------------------------------|--------|---------|-----------------|---------|
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80     | 2.09    | 198.6           | 839M    |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80     | 1.48    | 223.8           | 839M    |
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600    | 1.34    | 254.6           | 839M    |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600    | 1.13    | 289.2           | 839M    |

85
+ ## Citation
86
+
87
+ If you find this work useful for your research, please consider citing:
88
+
89
+ ```bibtex
90
+ @article{shi2025rectok,
91
+ title={RecTok: Reconstruction Distillation along Rectified Flow},
92
+ author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
93
+ journal={arXiv preprint arXiv:2512.13421},
94
+ year={2025}
95
+ }
96
+ ```

## Acknowledgements

We thank the authors of [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), [MAE](https://github.com/facebookresearch/mae), [DiT](https://github.com/facebookresearch/DiT), and [LightningDiT](https://github.com/hustvl/LightningDiT) for their foundational work.

Our codebase builds upon several excellent open-source projects, including [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), and [torch_fidelity](https://github.com/toshas/torch-fidelity). We are grateful to the communities behind them.

We sincerely thank [Jiawei Yang](https://jiawei-yang.github.io/) and [Boyang Zheng](https://bytetriper.github.io/) for providing insightful feedback.