GEAR — GEAR-LFQ

This repository
BinLin203/GEAR-LFQ ships gear-lfq.pt, an LFQ-16 tokenizer with a 16384-entry codebook. This is the GEAR tokenizer, i.e. the warm-up tokenizer after it has been fine-tuned end-to-end, jointly with the autoregressive generator.
Download it with:
huggingface-cli download BinLin203/GEAR-LFQ --local-dir ckpts/GEAR-LFQ
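The 16384-entry codebook pairs with the `codebook_embed_dim=14` used in the Quickstart, since 2^14 = 16384. The sketch below illustrates why, assuming LFQ here means lookup-free quantization as in Open-MAGVIT2: each latent channel is binarized by its sign and the bits are packed into an integer index. The function names and the bit ordering are hypothetical and may not match the released checkpoint.

```python
from typing import List, Sequence

EMBED_DIM = 14                   # codebook_embed_dim from the Quickstart
CODEBOOK_SIZE = 2 ** EMBED_DIM   # 16384 implicit codes, no lookup table needed


def lfq_quantize(latent: Sequence[float]) -> List[int]:
    """Binarize each channel to +1 or -1 by its sign (0 maps to +1 here)."""
    return [1 if v >= 0 else -1 for v in latent]


def lfq_index(latent: Sequence[float]) -> int:
    """Pack the sign bits into one integer token index.

    Assumption: bit i is set when channel i is non-negative. The real
    tokenizer may order or weight the bits differently.
    """
    return sum(1 << i for i, v in enumerate(latent) if v >= 0)


# A made-up 14-channel latent: only channels 0 and 2 are positive.
latent = [0.3, -1.2, 0.7, -0.1] + [-0.5] * 10

print(CODEBOOK_SIZE)              # 16384
print(lfq_quantize(latent)[:4])   # [1, -1, 1, -1]
print(lfq_index(latent))          # 5  (bits 0 and 2 set: 1 + 4)
```

Every index from 0 to 16383 corresponds to exactly one sign pattern, so the "codebook" is implicit in the latent dimensionality.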
About GEAR
GEAR (Guided End-to-end AutoRegression) trains a vector-quantized (VQ) tokenizer and an autoregressive (AR) generator jointly, end-to-end, guided by representation alignment. The VQ index comes from a non-differentiable argmax, and a straight-through estimator collapses in this setting, so GEAR uses a dual read-out of the codebook assignment: a hard one-hot branch trains the AR model, while a differentiable soft branch carries a REPA loss whose gradient flows back to update only the tokenizer. The result is a tokenizer whose tokens are far easier for an AR model to predict.
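To make the hard versus soft read-out concrete, here is a minimal standard-library sketch. It is not the GEAR implementation: the distance-based scores, the temperature `tau`, the toy codebook, and all function names are assumptions chosen for illustration, and a finite-difference nudge stands in for autograd. The REPA loss and the AR model are not shown.

```python
import math
from typing import List, Sequence


def scores(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> List[float]:
    """Negative squared distance from latent z to every codebook entry."""
    return [-sum((a - b) ** 2 for a, b in zip(z, code)) for code in codebook]


def hard_readout(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> List[float]:
    """One-hot vector at the argmax entry: piecewise constant in z."""
    s = scores(z, codebook)
    best = max(range(len(s)), key=s.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(s))]


def soft_readout(z: Sequence[float], codebook: Sequence[Sequence[float]],
                 tau: float = 1.0) -> List[float]:
    """Softmax over the same scores: smooth in z, so gradients can flow."""
    s = [v / tau for v in scores(z, codebook)]
    m = max(s)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in s]
    total = sum(exps)
    return [e / total for e in exps]


def max_change(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute elementwise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy 2-D codebook and a latent that is nudged slightly along one axis.
codebook = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
z = [0.4, 0.9]
z_nudged = [0.41, 0.9]

hard_delta = max_change(hard_readout(z, codebook), hard_readout(z_nudged, codebook))
soft_delta = max_change(soft_readout(z, codebook), soft_readout(z_nudged, codebook))

print(hard_delta)       # 0.0: the one-hot assignment does not react to the nudge
print(soft_delta > 0)   # True: the soft assignment does
```

The hard read-out gives the AR model discrete targets but provides no signal about how the latent should move, whereas the soft read-out responds to small changes in the latent. That is the property a loss on the soft branch relies on.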
GEAR's contribution is the tokenizer, which is what these repos release. The AR model is a standard LlamaGen backbone; to train your own on a frozen GEAR tokenizer, use the released code.
- 🏠 Project page: https://linb203.github.io/gear/
- 💻 Code & full docs: https://github.com/Tencent-Hunyuan/GEAR
- 🤗 All tokenizers: https://huggingface.co/collections/BinLin203
- 📄 Paper: https://arxiv.org/abs/2606.32039
Released tokenizers
| Quantizer | Warm-up (baseline) | GEAR (end-to-end) |
|---|---|---|
| VQ-16 | Warmup-VQ · vq-with-gan.pt | GEAR-VQ · gear-vq.pt |
| LFQ-16 | Warmup-LFQ · lfq-with-gan.pt | GEAR-LFQ · gear-lfq.pt |
| IBQ-16 | Warmup-IBQ · ibq-with-gan.pt | GEAR-IBQ · gear-ibq.pt |
Reconstruction quality (ImageNet val)
The warm-up and end-to-end (GEAR) tokenizers both keep reconstruction performance on par with the original pretrained weights.
| Quantizer | Setting | rFID↓ | PSNR↑ | SSIM↑ |
|---|---|---|---|---|
| VQ-16 | Original | 2.19 | 20.79 | 0.55 |
| VQ-16 | Warm-up | 1.72 | 21.06 | 0.57 |
| VQ-16 | GEAR | 1.64 | 20.78 | 0.56 |
| LFQ-16 | Original | 2.82 | 21.47 | 0.58 |
| LFQ-16 | Warm-up | 2.42 | 20.97 | 0.56 |
| LFQ-16 | GEAR | 2.13 | 20.48 | 0.55 |
| IBQ-16 | Original | 2.23 | 21.23 | 0.58 |
| IBQ-16 | Warm-up | 1.97 | 21.18 | 0.58 |
| IBQ-16 | GEAR | 1.72 | 20.92 | 0.57 |
All rows use bicubic resize for an apples-to-apples comparison. The choice of interpolation matters (the official LFQ / IBQ numbers use bilinear, which differs); see the paper appendix for the full bilinear vs. bicubic table.
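As a reminder of what the PSNR column measures, the sketch below computes it from its textbook definition, 10 · log10(MAX² / MSE). It assumes pixel values in [0, 1] flattened into plain lists. The paper's evaluation harness may use a different value range, averaging scheme, or library, so this is not expected to reproduce the table exactly.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized signals."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up pixel values: every reconstructed pixel is off by 0.1.
reference = [0.0, 0.5, 1.0, 0.25]
reconstruction = [0.1, 0.4, 0.9, 0.35]

print(round(psnr(reference, reconstruction), 2))  # 20.0
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore 20 dB, which is roughly the range reported in the table above.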
Quickstart
Clone the code (the tokenizer class lives in models/), then encode → decode
to reconstruct an image:
import torch
import torchvision.transforms as T
from PIL import Image

from models import Tokenizers
from src.utils import load_pretrained_tokenizer_state_dict

# Build the LFQ-16 tokenizer and load the released GEAR weights.
vq = Tokenizers["LFQ-16"](codebook_size=16384, codebook_embed_dim=14)
vq.load_state_dict(load_pretrained_tokenizer_state_dict("ckpts/GEAR-LFQ/gear-lfq.pt"), strict=False)
vq = vq.eval().cuda()

# Load an image, resize it to 256x256 (explicit bicubic, Pillow's default since 7.0), and scale to [-1, 1].
x = T.ToTensor()(Image.open("input.jpg").convert("RGB").resize((256, 256), Image.BICUBIC))
x = (x * 2 - 1).unsqueeze(0).cuda()

with torch.no_grad():
    recon, _ = vq(x)  # encode -> quantize -> decode

# Map the reconstruction back to [0, 1] and save it.
out = (recon[0].clamp(-1, 1) + 1) / 2
T.ToPILImage()(out.cpu()).save("recon.png")
License
Released by Tencent under the Apache-2.0 License (Copyright © 2026 Tencent; see the LICENSE). "GEAR" refers to the code, parameters, and weights made publicly available under Apache-2.0. GEAR builds on LlamaGen, REPA / REPA-E, Open-MAGVIT2 and IBQ; please also respect those upstream licenses.
Citation
@misc{lin2026gearguidedendtoendautoregression,
title = {GEAR: Guided End-to-End AutoRegression for Image Synthesis},
author = {Bin Lin and Zheyuan Liu and Chenguo Lin and Sixiang Chen and Yunyang Ge and Yunlong Lin and Jianwei Zhang and Miles Yang and Zhao Zhong and Liefeng Bo and Li Yuan},
year = {2026},
eprint = {2606.32039},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2606.32039}
}
@article{ifsq_llamagenrepa,
title = {iFSQ: Improving FSQ for Image Generation with 1 Line of Code},
author = {Lin, Bin and Li, Zongjian and Niu, Yuwei and Gong, Kaixiong and
Ge, Yunyang and Lin, Yunlong and Zheng, Mingzhe and Zhang, JianWei and
Yang, Miles and Zhong, Zhao and others},
journal = {arXiv preprint arXiv:2601.17124},
year = {2026}
}
Acknowledgements
Built on LlamaGen, REPA / REPA-E, Open-MAGVIT2, IBQ, iFSQ / LlamaGen-REPA; evaluation harness adapted from UniWorld.