
GEAR — GEAR-IBQ


🏠 Homepage  |  💻 GitHub  |  🤗 Models  |  📄 Paper

This repository

BinLin203/GEAR-IBQ ships gear-ibq.pt, an IBQ-16 tokenizer with a 16384-entry codebook. This is the GEAR tokenizer, i.e. the warm-up tokenizer after it has been fine-tuned end-to-end jointly with the autoregressive generator.

Download it with:

huggingface-cli download BinLin203/GEAR-IBQ --local-dir ckpts/GEAR-IBQ

About GEAR

GEAR (Guided End-to-end AutoRegression) trains a vector-quantized (VQ) tokenizer and an autoregressive (AR) generator jointly, end-to-end, guided by representation alignment. The VQ index comes from a non-differentiable argmax, and a straight-through estimator collapses in this setting, so GEAR reads out the codebook assignment in two ways: a hard one-hot branch trains the AR, while a differentiable soft branch carries a REPA loss whose gradient updates only the tokenizer. The result is a tokenizer whose tokens are much easier for an AR model to predict.
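The dual read-out can be illustrated on a single encoder vector. This is a minimal sketch, not the official GEAR implementation: it assumes the logits are negative squared distances to each code and introduces a hypothetical temperature `tau`. Plain Python lists stand in for tensors, so no gradients flow here, and how the soft branch feeds the REPA loss is left out.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dual_readout(z, codebook, tau=1.0):
    """Return (hard one-hot, soft assignment) for one encoder vector z.

    Assumption: logits are negative squared distances to each code and
    tau is a hypothetical temperature (not taken from the GEAR code).
    """
    logits = [-sq_dist(z, c) / tau for c in codebook]
    # Hard branch: argmax index as a one-hot vector (non-differentiable; AR target).
    k = max(range(len(logits)), key=lambda i: logits[i])
    hard = [1.0 if i == k else 0.0 for i in range(len(logits))]
    # Soft branch: numerically stable softmax over the same logits
    # (differentiable when written with a tensor library).
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    soft = [e / total for e in exps]
    return hard, soft

def soft_embedding(soft, codebook):
    """Convex combination of code vectors weighted by the soft assignment."""
    dim = len(codebook[0])
    return [sum(p * c[j] for p, c in zip(soft, codebook)) for j in range(dim)]

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
hard, soft = dual_readout([0.9, 0.1], codebook)
print(hard)  # [0.0, 1.0, 0.0]
```

In a real framework only `soft` (and anything computed from it, such as `soft_embedding`) carries gradients, which is why a loss on that branch can reach the tokenizer while the AR trains on the `hard` indices.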

GEAR's contribution is the tokenizer, which is what these repositories release. The AR model is a standard LlamaGen backbone; to use GEAR, train your own AR on a frozen GEAR tokenizer with the released code.
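Training an AR on a frozen tokenizer reduces to next-token prediction over the token indices. The sketch below makes two assumptions: a raster-scan order over the token grid, and a single class-label token prepended as the condition. `make_ar_example` and its class-token convention are hypothetical and are not taken from the GEAR or LlamaGen code.

```python
def raster_flatten(grid):
    """Flatten a 2D grid of token indices row by row (raster-scan order)."""
    return [idx for row in grid for idx in row]

def make_ar_example(grid, class_id, codebook_size):
    """Build (inputs, targets) for next-token prediction.

    Hypothetical convention: the class label is encoded as one extra token
    id placed after the image vocabulary (codebook_size + class_id).
    """
    tokens = raster_flatten(grid)
    for t in tokens:
        if not 0 <= t < codebook_size:
            raise ValueError(f"token {t} outside codebook of size {codebook_size}")
    seq = [codebook_size + class_id] + tokens
    # The model sees seq[:-1] and must predict seq[1:], i.e. every image token.
    return seq[:-1], seq[1:]

inputs, targets = make_ar_example([[5, 7], [2, 9]], class_id=3, codebook_size=16384)
print(inputs, targets)  # [16387, 5, 7, 2] [5, 7, 2, 9]
```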

Released tokenizers

| Quantizer | Warm-up (baseline) | GEAR (end-to-end) |
|---|---|---|
| VQ-16 | Warmup-VQ · vq-with-gan.pt | GEAR-VQ · gear-vq.pt |
| LFQ-16 | Warmup-LFQ · lfq-with-gan.pt | GEAR-LFQ · gear-lfq.pt |
| IBQ-16 | Warmup-IBQ · ibq-with-gan.pt | GEAR-IBQ · gear-ibq.pt |

Reconstruction quality (ImageNet val)

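Both the warm-up and the end-to-end (GEAR) tokenizers keep reconstruction quality on par with the original pretrained weights: rFID improves in every case, while PSNR and SSIM change by at most about 1 dB and 0.03, respectively.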

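| Quantizer | Setting | rFID↓ | PSNR↑ | SSIM↑ |
|---|---|---|---|---|
| VQ-16 | Original | 2.19 | 20.79 | 0.55 |
| VQ-16 | Warm-up | 1.72 | 21.06 | 0.57 |
| VQ-16 | GEAR | 1.64 | 20.78 | 0.56 |
| LFQ-16 | Original | 2.82 | 21.47 | 0.58 |
| LFQ-16 | Warm-up | 2.42 | 20.97 | 0.56 |
| LFQ-16 | GEAR | 2.13 | 20.48 | 0.55 |
| IBQ-16 | Original | 2.23 | 21.23 | 0.58 |
| IBQ-16 | Warm-up | 1.97 | 21.18 | 0.58 |
| IBQ-16 | GEAR | 1.72 | 20.92 | 0.57 |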

All rows use bicubic resizing for an apples-to-apples comparison. The interpolation method matters: the official LFQ / IBQ numbers use bilinear resizing and therefore differ. See the paper appendix for the full bilinear vs. bicubic table.
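For reference, PSNR is the standard peak signal-to-noise ratio, 10·log10(MAX²/MSE). The sketch below is a minimal pure-Python version and assumes pixel values already scaled to [0, 1], so MAX = 1. The exact evaluation harness may differ from it in color space, uint8 rounding, or how scores are averaged across images.

```python
import math

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

print(round(psnr([0.0] * 4, [0.1] * 4), 6))  # 20.0
```

A uniform error of 0.1 on a [0, 1] scale gives 20 dB. By the same formula, the roughly 21 dB values above correspond to a root-mean-square error of about 0.09 per pixel.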

Quickstart

Clone the code (the tokenizer class lives in models/), then encode → decode to reconstruct an image:

import torch, torchvision.transforms as T
from PIL import Image
from models import Tokenizers
from src.utils import load_pretrained_tokenizer_state_dict

vq = Tokenizers["IBQ-16"](codebook_size=16384, codebook_embed_dim=256)
vq.load_state_dict(load_pretrained_tokenizer_state_dict("ckpts/GEAR-IBQ/gear-ibq.pt"), strict=False)
vq = vq.eval().cuda()

x = T.ToTensor()(Image.open("input.jpg").convert("RGB").resize((256, 256), Image.BICUBIC))  # bicubic, as in the table
x = (x * 2 - 1).unsqueeze(0).cuda()             # to [-1, 1]
with torch.no_grad():
    recon, _ = vq(x)                             # encode -> quantize -> decode
out = (recon[0].clamp(-1, 1) + 1) / 2
T.ToPILImage()(out.cpu()).save("recon.png")
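The "-16" in IBQ-16 is read here as a 16× spatial downsampling factor. That follows the usual LlamaGen VQ-16 convention and is an assumption, not something this card confirms. Under that assumption, a 256×256 input becomes a 16×16 grid of indices into the 16384-entry codebook, i.e. 256 tokens of log2(16384) = 14 bits each. `token_budget` is a hypothetical helper:

```python
import math

def token_budget(image_size, downsample, codebook_size):
    """Token grid side, token count and bits per image for a VQ tokenizer.

    Assumes a square image and an integer downsampling factor (hypothetical
    helper; not part of the GEAR code).
    """
    if image_size % downsample:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample
    num_tokens = side * side
    bits_per_token = math.log2(codebook_size)
    return side, num_tokens, num_tokens * bits_per_token

side, n, bits = token_budget(256, 16, 16384)
print(side, n, bits)  # 16 256 3584.0
```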

License

Released by Tencent under the Apache-2.0 License (Copyright © 2026 Tencent; see the LICENSE). "GEAR" refers to the code, parameters, and weights made publicly available under Apache-2.0. GEAR builds on LlamaGen, REPA / REPA-E, Open-MAGVIT2 and IBQ; please also respect those upstream licenses.

Citation

@misc{lin2026gearguidedendtoendautoregression,
  title         = {GEAR: Guided End-to-End AutoRegression for Image Synthesis},
  author        = {Bin Lin and Zheyuan Liu and Chenguo Lin and Sixiang Chen and Yunyang Ge and Yunlong Lin and Jianwei Zhang and Miles Yang and Zhao Zhong and Liefeng Bo and Li Yuan},
  year          = {2026},
  eprint        = {2606.32039},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2606.32039}
}

@article{ifsq_llamagenrepa,
  title   = {iFSQ: Improving FSQ for Image Generation with 1 Line of Code},
  author  = {Lin, Bin and Li, Zongjian and Niu, Yuwei and Gong, Kaixiong and
             Ge, Yunyang and Lin, Yunlong and Zheng, Mingzhe and Zhang, JianWei and
             Yang, Miles and Zhong, Zhao and others},
  journal = {arXiv preprint arXiv:2601.17124},
  year    = {2026}
}

Acknowledgements

Built on LlamaGen, REPA / REPA-E, Open-MAGVIT2, IBQ, iFSQ / LlamaGen-REPA; evaluation harness adapted from UniWorld.
