---
pipeline_tag: image-feature-extraction
---

# Image Tokenizer Needs Post-Training

This repository contains the official implementation and checkpoints for the paper *Image Tokenizer Needs Post-Training*.

- **Project page**: https://qiuk2.github.io/works/RobusTok/index.html
- **Code**: https://github.com/qiuk2/RobusTok

Teaser

## TL;DR

We present **RobusTok**, a new image tokenizer with a two-stage training scheme:

1. **Main training** → constructs a robust latent space.
2. **Post-training** → aligns the generator's latent distribution with its image space.

### Key highlights of post-training

- 🚀 **Better generative quality**: gFID 1.60 → 1.36.
- 🔑 **Generalizability**: applicable to both autoregressive and diffusion models.
- **Efficiency**: strong results with generative models of only ~400M parameters.

## Model Zoo

| Generator \ Tokenizer | RobusTok w/o P.T. (weights) | RobusTok w/ P.T. (weights) |
|---|---|---|
| Base (weights) | gFID = 1.83 | gFID = 1.60 |
| Large (weights) | gFID = 1.60 | gFID = 1.36 |

## Usage

For detailed installation, training, and inference instructions, please refer to the [GitHub repository](https://github.com/qiuk2/RobusTok).
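Checkpoints can also be fetched programmatically from the Hugging Face Hub. The snippet below is a minimal sketch using `huggingface_hub`; the repository id and checkpoint filename are placeholders rather than the actual names, so check this repository's file listing before running. Model construction and weight loading follow the official code.

```python
# Minimal sketch (not the official loading code): fetch a RobusTok checkpoint
# from the Hugging Face Hub. The repo_id and filename below are placeholders --
# replace them with the actual repository id and checkpoint name listed in the
# "Files and versions" tab of this repository.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/RobusTok",           # assumption: fill in the real repo id
    filename="robustok_tokenizer.pth",  # assumption: fill in the real checkpoint name
)
print(f"Checkpoint downloaded to: {ckpt_path}")

# Model construction and weight loading follow the code in the GitHub repository:
# https://github.com/qiuk2/RobusTok
```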


## Visualization

vis

Visualization of 256×256 image generation before (top) and after (bottom) post-training. Three improvements are observed: (a) OOD mitigation, (b) color fidelity, (c) detail refinement.


## Citation

If our work assists your research, feel free to give us a star ⭐ or cite us using:

```bibtex
@misc{qiu2025imagetokenizerneedsposttraining,
      title={Image Tokenizer Needs Post-Training},
      author={Kai Qiu and Xiang Li and Hao Chen and Jason Kuen and Xiaohao Xu and Jiuxiang Gu and Yinyi Luo and Bhiksha Raj and Zhe Lin and Marios Savvides},
      year={2025},
      eprint={2509.12474},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.12474},
}
```