# ✨ ViQ Weights ✨
### Text-Aligned Visual Quantized Representations at Any Resolution
Xumin Yu<sup>1,*</sup>, Zuyan Liu<sup>1,2,*</sup>, Zhenyu Yang<sup>1,2,*</sup>, Yuhao Dong<sup>3</sup>, Shengsheng Qian<sup>4</sup>, Jiwen Lu<sup>2</sup>, Han Hu<sup>1</sup>, Yongming Rao<sup>1,†</sup>
[Hugging Face: XuminYu/ViQ_weights](https://huggingface.co/XuminYu/ViQ_weights)
[GitHub: yuxumin/ViQ](https://github.com/yuxumin/ViQ)
---
This repository hosts the **pretrained model weights** for **ViQ**. For the inference, training, and weight-conversion **code**, see the main repo: **https://github.com/yuxumin/ViQ**.
ViQ is trained in two stages, and this repository provides weights for **both stages**:
| Folder | Stage | What it is |
| --- | --- | --- |
| [`anyres_vit/`](anyres_vit) | **Stage 1** | Text-aligned, any-resolution **continuous** SigLIP2 ViT encoders |
| [`ViQ/`](ViQ) | **Stage 2** | **Discrete** ViQ tokenizers (multiple FSQ codebook sizes) |
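
If only one stage is needed, the folder layout above makes it possible to fetch just that folder. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repository id is `XuminYu/ViQ_weights` (taken from the Hugging Face link above). The helper names `stage_pattern` and `download_stage` are hypothetical and are not part of the ViQ codebase.

```python
from typing import Optional

# Repository id taken from the Hugging Face link at the top of this README.
REPO_ID = "XuminYu/ViQ_weights"

# Folder names come from the table above.
STAGE_FOLDERS = {1: "anyres_vit", 2: "ViQ"}


def stage_pattern(stage: int) -> str:
    """Return a glob pattern selecting every file under one stage's folder."""
    return f"{STAGE_FOLDERS[stage]}/*"


def download_stage(stage: int, local_dir: Optional[str] = None) -> str:
    """Download only the files of one stage and return the local snapshot path."""
    # Imported lazily so stage_pattern() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[stage_pattern(stage)],
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Fetches the Stage 1 folder only; this needs network access.
    print(download_stage(1))
```

Filtering by folder avoids pulling every codebook size when only the Stage 1 encoders are needed.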
## 📦 `anyres_vit/` — Stage 1 (Any-Resolution ViT)
The text-aligned, any-resolution ViT encoders produced after **Stage 1** pre-training. Two backbone sizes are released:
| Size | Backbone | File |
| --- | --- | --- |
| **400M** | SigLIP2-SO400M | `anyres_vit/so400m/siglip2_so400m_anyres_s4.pth` |
| **1B** | SigLIP2-g | `anyres_vit/giant1b/siglip2_g_anyres_s4.pth` |
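
To check that a Stage 1 file downloaded correctly, it can help to open it and list a few entries. The sketch below assumes `huggingface_hub` and `torch` are installed, and it assumes the `.pth` file unpickles to a dictionary (for example a state dict). This README does not document the internal layout, so treat that as a guess; the main repo's loading code is the authoritative path. `load_stage1_checkpoint` and `summarize` are hypothetical helper names.

```python
from typing import Any, Dict, List, Tuple

REPO_ID = "XuminYu/ViQ_weights"  # from the Hugging Face link in this README

# File paths copied from the table above.
STAGE1_CHECKPOINTS: Dict[str, str] = {
    "400M": "anyres_vit/so400m/siglip2_so400m_anyres_s4.pth",
    "1B": "anyres_vit/giant1b/siglip2_g_anyres_s4.pth",
}


def load_stage1_checkpoint(size: str) -> Any:
    """Download one Stage 1 checkpoint and load it on the CPU."""
    # Lazy imports keep summarize() usable without these packages.
    import torch
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=REPO_ID, filename=STAGE1_CHECKPOINTS[size])
    # weights_only=True refuses arbitrary pickled objects; if the file stores
    # more than tensors and plain containers, this call may raise an error.
    return torch.load(path, map_location="cpu", weights_only=True)


def summarize(checkpoint: Dict[str, Any], limit: int = 5) -> List[Tuple[str, Any]]:
    """Return up to `limit` (key, shape-or-type-name) pairs from a dict."""
    rows = []
    for key, value in list(checkpoint.items())[:limit]:
        shape = getattr(value, "shape", None)
        rows.append((key, tuple(shape) if shape is not None else type(value).__name__))
    return rows


if __name__ == "__main__":
    # Needs network access and enough disk space for the checkpoint.
    for row in summarize(load_stage1_checkpoint("400M")):
        print(row)
```

Loading with `map_location="cpu"` keeps the inspection independent of any GPU; building the actual encoder from these weights is left to the code in the main repo.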
## 🔢 `ViQ/` — Stage 2 (Discrete Tokenizers)
The discretized ViQ tokenizers produced after **Stage 2**, released in several FSQ **codebook sizes**. Each `converted_