pipeline_tag: image-feature-extraction

Image Tokenizer Needs Post-Training

This repository contains the official implementation and checkpoints for the paper "Image Tokenizer Needs Post-Training" (https://arxiv.org/abs/2509.12474).

Project page: https://qiuk2.github.io/works/RobusTok/index.html

Code: https://github.com/qiuk2/RobusTok

TL;DR

We present RobusTok, a new image tokenizer with a two-stage training scheme:

Main training → constructs a robust latent space.

Post-training → aligns the generator’s latent distribution with its image space.

Key highlights of Post-Training

🚀 Better generative quality: gFID 1.60 → 1.36.
🔑 Generalizability: applicable to both autoregressive & diffusion models.
⚡ Efficiency: strong results with generative models of only ~400M parameters.

Model Zoo

| Generator \ Tokenizer | RobusTok without post-training (weights) | RobusTok with post-training (weights) |
| --- | --- | --- |
| Base (weights) | gFID = 1.83 | gFID = 1.60 |
| Large (weights) | gFID = 1.60 | gFID = 1.36 |
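
To put the table in perspective, the short sketch below computes the relative gFID reduction that post-training gives for each generator size. The gFID values are copied from the table; the names `GFID`, `relative_reduction`, and `summarize` are illustrative only and are not part of the RobusTok codebase.

```python
# gFID values copied from the Model Zoo table (lower is better).
GFID = {
    "Base": {"without_pt": 1.83, "with_pt": 1.60},
    "Large": {"without_pt": 1.60, "with_pt": 1.36},
}


def relative_reduction(before: float, after: float) -> float:
    """Return the gFID reduction as a fraction of the starting value."""
    return (before - after) / before


def summarize(table: dict) -> dict:
    """Map each generator name to its relative gFID reduction."""
    return {
        name: relative_reduction(row["without_pt"], row["with_pt"])
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, reduction in summarize(GFID).items():
        print(f"{name}: {reduction:.1%} lower gFID after post-training")
```

Run as a script, this prints a reduction of 12.6% for the Base generator and 15.0% for the Large generator.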

Usage

For detailed installation, training, and inference instructions, please refer to the GitHub repository.
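
As a rough starting point before following the GitHub instructions, the checkpoints could be fetched from the Hugging Face Hub with `huggingface_hub.snapshot_download`. This is a minimal sketch under assumptions: the Hub repository id is not stated in this card, so `PLACEHOLDER_REPO_ID` is a hypothetical placeholder that must be replaced, and the helper name `fetch_checkpoints` and the `checkpoints` directory are illustrative choices, not part of the official code.

```python
from pathlib import Path

# Hypothetical placeholder: the owner/name of this Hub repository is not
# stated in the card, so the reader has to fill it in.
PLACEHOLDER_REPO_ID = "<owner>/RobusTok"


def fetch_checkpoints(repo_id: str, local_dir: str = "checkpoints") -> Path:
    """Download all files of a Hub repository into local_dir and return its path."""
    owner, _, name = repo_id.partition("/")
    if not owner or not name or "<" in repo_id:
        raise ValueError(f"replace the placeholder with a real repo id, got {repo_id!r}")
    # Imported lazily so the validation above works without the package installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))
```

Calling `fetch_checkpoints` with the real repository id downloads every file in the repository; which file corresponds to which tokenizer or generator is described in the GitHub repository, not here.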

Visualization

Visualization of 256×256 image generation before (top) and after (bottom) post-training. Three improvements are observed: (a) out-of-distribution (OOD) mitigation, (b) color fidelity, and (c) detail refinement.

Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

@misc{qiu2025imagetokenizerneedsposttraining,
      title={Image Tokenizer Needs Post-Training}, 
      author={Kai Qiu and Xiang Li and Hao Chen and Jason Kuen and Xiaohao Xu and Jiuxiang Gu and Yinyi Luo and Bhiksha Raj and Zhe Lin and Marios Savvides},
      year={2025},
      eprint={2509.12474},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.12474}, 
}