---
license: apache-2.0
pipeline_tag: text-to-image
---

# RecTok: Reconstruction Distillation along Rectified Flow


This repository contains the official PyTorch implementation for the paper RecTok: Reconstruction Distillation along Rectified Flow.

RecTok addresses the fundamental trade-off between latent-space dimensionality and generation quality in visual tokenizers for diffusion models. It introduces two key innovations: flow semantic distillation and reconstruction-alignment distillation. Rather than focusing solely on the latent space, RecTok enriches the semantic information in the forward flow trajectories, which serve as the training space for diffusion transformers. As a result, it achieves superior image reconstruction, generation quality, and discriminative performance, setting state-of-the-art results on gFID-50K and showing consistent improvements as latent dimensionality increases.
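To make the "forward flow trajectory" notion concrete, here is a minimal sketch of the standard rectified-flow forward interpolation, which draws a straight path from a data sample to noise. This is the generic convention, not RecTok's code; the paper's exact parameterization and distillation losses live in the official repository.

```python
def forward_flow(x0, eps, t):
    """Rectified-flow forward interpolation along the straight path from
    data x0 to noise eps: x_t = (1 - t) * x0 + t * eps.

    Plain Python lists are used here for illustration only; the actual
    implementation operates on latent tensors, and RecTok's exact
    parameterization may differ.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]

# Midpoint of the path between a data point and pure noise
print(forward_flow([0.0, 0.0], [1.0, 1.0], 0.5))  # [0.5, 0.5]
```

RecTok's distillation targets intermediate points `x_t` along this trajectory, not just the endpoint latent `x0`.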

*(Figure: RecTok pipeline overview.)*

## Usage

For detailed instructions on setting up the environment, downloading models, and performing evaluation or training, please refer to the official GitHub repository.

### Installation

Set up the environment and install dependencies:

```bash
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```

### Download Models

Download pretrained models and necessary data assets:

```bash
# Download from HuggingFace
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models
# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```
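After running the steps above, a quick sanity check of the working directory can catch a failed download or move early. This is an illustrative sketch, not part of the official repo; it assumes the zip unpacks into a directory named `offline_models`, which may not match the actual archive layout.

```python
from pathlib import Path

def check_layout(root=".", expected=("pretrained_models", "data", "offline_models")):
    """Return the expected directories (per the download steps above) that
    are missing under `root`. The directory names are assumptions based on
    the commands shown, not guarantees from the official repository."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]

missing = check_layout(".")
print("missing:", missing or "none -- layout looks complete")
```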

### Generative Model Evaluation

Evaluate generation quality (FID, Inception Score, etc.). The evaluation results are written to `./work_dirs/gen_model_training/RecTok_eval`:

```bash
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \        # path to RecTok checkpoint
    pretrained_models/ditdhxl_epoch_0599.pth \  # path to DiTDH-XL checkpoint
    pretrained_models/ditdhs_epoch_0029.pth     # path to autoguidance model checkpoint
```

Selected examples of class-conditional generation on ImageNet-1K at 256×256 resolution:

*(Figure: RecTok qualitative results.)*

FID-50K and Inception Score with and without classifier-free guidance (CFG):

| CFG | Model | Epochs | FID-50K | Inception Score | #params |
|------|-------|--------|---------|-----------------|---------|
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 2.09 | 198.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 1.48 | 223.8 | 839M |
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.34 | 254.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.13 | 289.2 | 839M |

## Citation

If you find this work useful for your research, please consider citing:

```bibtex
@article{shi2025rectok,
  title={RecTok: Reconstruction Distillation along Rectified Flow},
  author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
  journal={arXiv preprint arXiv:2512.13421},
  year={2025}
}
```

## Acknowledgements

We thank the authors of lDeTok, RAE, MAE, DiT, and LightningDiT for their foundational work.

Our codebase builds upon several excellent open-source projects, including lDeTok, RAE, and torch_fidelity. We are grateful to the communities behind them.

We sincerely thank Jiawei Yang and Boyang Zheng for providing insightful feedback.