---
license: apache-2.0
pipeline_tag: text-to-image
---
|
|
|
|
|
# RecTok: Reconstruction Distillation along Rectified Flow
|
|
|
|
|
<div align="center">

[arXiv](https://arxiv.org/abs/2512.13421) | [Project Page](https://shi-qingyu.github.io/rectok.github.io/) | [License](https://github.com/Shi-qingyu/RecTok/blob/main/LICENSE)

</div>
|
|
|
|
|
This repository contains the official PyTorch implementation for the paper [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421).
|
|
|
|
|
RecTok addresses the fundamental trade-off between latent-space dimensionality and generation quality in visual tokenizers for diffusion models. It introduces two key techniques: flow semantic distillation and reconstruction-alignment distillation. Rather than enriching the latent space alone, RecTok injects semantic information into the forward flow trajectories, which serve as the training space for diffusion transformers. As a result, RecTok achieves strong image reconstruction, generation quality, and discriminative performance, setting state-of-the-art gFID-50K results and improving consistently as latent dimensionality increases.
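To make the "forward flow trajectory" concrete: in rectified flow, a noisy sample at time `t` lies on the straight line between data and noise, `x_t = (1 - t) * x0 + t * eps`. The following is a minimal, library-free sketch of that trajectory (illustrative only, not the repository's implementation):

```python
# Minimal sketch of a rectified-flow forward trajectory (illustrative only).
# A point at time t is the straight-line interpolation between a data sample
# x0 (t = 0) and a noise sample eps (t = 1).
def forward_flow_point(x0, eps, t):
    """Element-wise interpolation x_t = (1 - t) * x0 + t * eps, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]

def forward_trajectory(x0, eps, num_steps):
    """Sample the trajectory at evenly spaced timesteps from data to noise."""
    return [forward_flow_point(x0, eps, k / (num_steps - 1)) for k in range(num_steps)]

x0 = [1.0, 2.0]   # toy stand-in for a tokenizer latent
eps = [0.0, 0.0]  # toy noise sample (zeros for readability)
traj = forward_trajectory(x0, eps, 3)  # endpoints are x0 and eps
```

RecTok's distillation operates along such trajectories rather than on the `t = 0` latents alone.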
|
|
|
|
|
- **Paper**: [RecTok: Reconstruction Distillation along Rectified Flow](https://huggingface.co/papers/2512.13421)
- **Project Page**: [https://shi-qingyu.github.io/rectok.github.io/](https://shi-qingyu.github.io/rectok.github.io/)
- **Code**: [https://github.com/Shi-qingyu/RecTok](https://github.com/Shi-qingyu/RecTok)
|
|
|
|
|
<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/pipeline.png" width="720" alt="RecTok Pipeline">
</p>
|
|
|
|
|
## Usage
|
|
|
|
|
For detailed instructions on setting up the environment, downloading models, and running evaluation or training, please refer to the [official GitHub repository](https://github.com/Shi-qingyu/RecTok).
|
|
|
|
|
### Installation
|
|
|
|
|
Set up the environment and install dependencies:
|
|
|
|
|
```bash
# Clone the repository
git clone https://github.com/Shi-qingyu/RecTok.git
cd RecTok

# Create and activate the conda environment
conda create -n rectok python=3.10 -y
conda activate rectok

# Install requirements
pip install -r requirements.txt
```
|
|
|
|
|
### Download Models
|
|
|
|
|
Download the pretrained models and required data assets:

```bash
# Download from Hugging Face
huggingface-cli download QingyuShi/RecTok --local-dir ./pretrained_models

# Organize data assets and offline models
mv ./pretrained_models/data ./data
mv ./pretrained_models/offline_models.zip ./offline_models.zip
unzip offline_models.zip && rm offline_models.zip
```
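After running the commands above, the expected layout can be sanity-checked with a short script. The paths below simply mirror the commands in this README; the checkpoint file names are assumptions taken from the evaluation command further down and may differ from the actual release:

```python
from pathlib import Path

# Paths mirroring the download/organize commands above. The .pth file names
# are assumptions based on this README's evaluation command.
EXPECTED = [
    "pretrained_models/RecTok_decft.pth",
    "pretrained_models/ditdhxl_epoch_0599.pth",
    "pretrained_models/ditdhs_epoch_0029.pth",
    "data",
    "offline_models",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]
```

Running `missing_paths()` from the repository root should return an empty list once the downloads are in place.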
|
|
|
|
|
### Generative Model Evaluation
|
|
|
|
|
Evaluate generation quality (FID, Inception Score, etc.). The evaluation results are written to `./work_dirs/gen_model_training/RecTok_eval`:
|
|
|
|
|
```bash
# Arguments: (1) RecTok checkpoint, (2) DiT^DH-XL checkpoint,
# (3) autoguidance model checkpoint
bash run_eval_diffusion.sh \
    pretrained_models/RecTok_decft.pth \
    pretrained_models/ditdhxl_epoch_0599.pth \
    pretrained_models/ditdhs_epoch_0029.pth
```
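For intuition about the FID metric reported below: FID is the Fréchet distance between Gaussian fits of Inception features for real and generated images. In one dimension it reduces to a closed form; the toy sketch below shows only that scalar special case (the actual evaluation uses multivariate feature statistics, e.g. via `torch_fidelity`):

```python
import math

# Toy 1-D special case of the Frechet (FID) distance between two Gaussians:
#   d^2 = (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)
# Real FID uses multivariate Inception-feature statistics; this is for intuition only.
def frechet_distance_1d(mu1, var1, mu2, var2):
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)
```

Identical distributions give a distance of zero; lower is better, which is why the 600-epoch rows in the table below improve on the 80-epoch rows.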
|
|
|
|
|
Selected examples of class-conditional generation results on ImageNet-1K 256×256:

<p align="center">
  <img src="https://github.com/Shi-qingyu/RecTok/raw/main/assets/qualitative.png" width="1080" alt="RecTok Qualitative Results">
</p>
|
|
|
|
|
FID-50K and Inception Score with and without CFG:

| CFG  | Model                                       | Epochs | FID-50K | Inception Score | #Params |
|------|---------------------------------------------|--------|---------|-----------------|---------|
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80     | 2.09    | 198.6           | 839M    |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 80     | 1.48    | 223.8           | 839M    |
| 1.0  | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600    | 1.34    | 254.6           | 839M    |
| 1.29 | $\text{DiT}^{\text{DH}}\text{-XL}$ + RecTok | 600    | 1.13    | 289.2           | 839M    |
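For reference on the CFG column: at sampling time, guidance extrapolates from a weaker prediction toward the conditional one, `v = v_weak + w * (v_cond - v_weak)`, so `w = 1.0` reduces to the plain conditional prediction (here the weaker prediction comes from the smaller autoguidance model checkpoint). A minimal sketch, with names that are illustrative rather than the repository's API:

```python
# Guidance blending at sampling time (illustrative; not the repo's sampler).
# v_guided = v_weak + w * (v_cond - v_weak); w = 1.0 yields v_cond unchanged,
# while w > 1.0 pushes the prediction further from the weak model's output.
def guided_velocity(v_cond, v_weak, w):
    """Extrapolate from the weak prediction toward the conditional one."""
    return [vw + w * (vc - vw) for vc, vw in zip(v_cond, v_weak)]
```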
|
|
|
|
|
## Citation
|
|
|
|
|
If you find this work useful for your research, please consider citing:
|
|
|
|
|
```bibtex
@article{shi2025rectok,
  title={RecTok: Reconstruction Distillation along Rectified Flow},
  author={Shi, Qingyu and Wu, Size and Bai, Jinbin and Yu, Kaidong and Wang, Yujing and Tong, Yunhai and Li, Xiangtai and Li, Xuelong},
  journal={arXiv preprint arXiv:2512.13421},
  year={2025}
}
```
|
|
|
|
|
## Acknowledgements
|
|
|
|
|
We thank the authors of [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), [MAE](https://github.com/facebookresearch/mae), [DiT](https://github.com/facebookresearch/DiT), and [LightningDiT](https://github.com/hustvl/LightningDiT) for their foundational work.
|
|
|
|
|
Our codebase builds upon several excellent open-source projects, including [lDeTok](https://github.com/Jiawei-Yang/DeTok), [RAE](https://github.com/bytetriper/RAE), and [torch_fidelity](https://github.com/toshas/torch-fidelity). We are grateful to the communities behind them.
|
|
|
|
|
We sincerely thank [Jiawei Yang](https://jiawei-yang.github.io/) and [Boyang Zheng](https://bytetriper.github.io/) for providing insightful feedback. |