---
license: apache-2.0
pipeline_tag: text-to-image
---
# RecTok: Reconstruction Distillation along Rectified Flow
This repository contains the official PyTorch implementation for the paper RecTok: Reconstruction Distillation along Rectified Flow.
RecTok addresses the trade-off between latent-space dimensionality and generation quality in visual tokenizers for diffusion models. It introduces two key innovations: flow semantic distillation and reconstruction-alignment distillation. Rather than focusing solely on the latent space, RecTok enriches the semantic information of the forward flow trajectories, which serve as the training space for diffusion transformers. As a result, RecTok achieves strong image reconstruction, generation quality, and discriminative performance, sets state-of-the-art results on gFID-50K, and improves consistently as latent dimensionality increases.
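To make the trajectory-level idea concrete, here is a minimal sketch of a forward rectified-flow interpolation and a feature-matching distillation loss evaluated along it. This is an illustrative assumption, not the paper's implementation: the function names (`student_features`, `teacher_features`, `flow_distillation_loss`) and the squared-error loss form are placeholders standing in for the tokenizer encoder, a frozen semantic teacher, and the actual distillation objective.

```python
import numpy as np

def forward_flow(x0, noise, t):
    # Rectified-flow interpolation: a straight line from data x0 (t = 0)
    # to Gaussian noise (t = 1).
    return (1.0 - t) * x0 + t * noise

# Hypothetical feature extractors: stand-ins for the tokenizer encoder
# (student) and a frozen semantic teacher. The real models are deep networks.
def student_features(x):
    return x.mean(axis=-1)

def teacher_features(x):
    return x.mean(axis=-1)

def flow_distillation_loss(x0, noise, ts):
    # Match student and teacher features at sampled points along the forward
    # flow, so the trajectory itself (not only the latent at t = 0) is
    # encouraged to carry semantic information.
    losses = []
    for t in ts:
        xt = forward_flow(x0, noise, t)
        diff = student_features(xt) - teacher_features(xt)
        losses.append(np.mean(diff ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 16))     # a toy batch of "latents"
noise = rng.standard_normal((4, 16))
loss = flow_distillation_loss(x0, noise, ts=[0.25, 0.5, 0.75])
```

Because the toy student and teacher are identical here, the loss is zero; in training, only the student would be updated to close the gap.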
- Paper: RecTok: Reconstruction Distillation along Rectified Flow
- Project Page: https://shi-qingyu.github.io/rectok.github.io/
- Code: https://github.com/Shi-qingyu/RecTok
## Usage
For detailed instructions on setting up the environment, downloading models, and performing evaluation or training, please refer to the official GitHub repository.
### Installation
Set up the environment and install dependencies:
```shell
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```
### Download Models
Download pretrained models and necessary data assets:
```shell
# Download from HuggingFace
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models

# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```
### Generative Model Evaluation
Evaluate generation quality (FID, Inception Score, etc.). The evaluation results are written to `./work_dirs/gen_model_training/RecTok_eval`:
```shell
# Arguments, in order: RecTok checkpoint, DiT^DH-XL checkpoint,
# autoguidance model checkpoint
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \
    pretrained_models/ditdhxl_epoch_0599.pth \
    pretrained_models/ditdhs_epoch_0029.pth
```
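The third checkpoint is a smaller model used for autoguidance. As a hedged illustration of how the two predictions are typically combined (assuming the standard autoguidance formulation; the script's actual combination rule may differ), the guided prediction extrapolates from the guiding model's output toward the main model's:

```python
import numpy as np

def autoguided_prediction(v_main, v_guide, w):
    # Guidance weight w extrapolates from the weaker guiding model's
    # prediction toward the main model's. With w = 1.0 this reduces to the
    # unguided main model, matching the cfg = 1.0 rows in the results table.
    return v_guide + w * (v_main - v_guide)

v_main = np.array([1.0, 2.0])    # toy velocity prediction of the main model
v_guide = np.array([0.5, 1.5])   # toy prediction of the smaller guiding model
guided = autoguided_prediction(v_main, v_guide, w=1.29)
```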
Selected examples of class-conditional generation results on ImageNet-1K 256×256:
FID-50K and Inception Score, without CFG (cfg = 1.0) and with CFG:

| CFG | Model | Epochs | FID-50K | Inception Score | #Params |
|---|---|---|---|---|---|
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 2.09 | 198.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80 | 1.48 | 223.8 | 839M |
| 1.0 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.34 | 254.6 | 839M |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600 | 1.13 | 289.2 | 839M |
## Citation
If you find this work useful for your research, please consider citing:
```bibtex
@article{shi2025rectok,
  title={RecTok: Reconstruction Distillation along Rectified Flow},
  author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
  journal={arXiv preprint arXiv:2512.13421},
  year={2025}
}
```
## Acknowledgements
We thank the authors of lDeTok, RAE, MAE, DiT, and LightningDiT for their foundational work.
Our codebase builds upon several excellent open-source projects, including lDeTok, RAE, and torch_fidelity. We are grateful to the communities behind them.
We sincerely thank Jiawei Yang and Boyang Zheng for providing insightful feedback.