---
license: mit
tags:
- discrete tokenization
- autoregressive generation
---
# InsightTok

InsightTok is a discrete visual tokenizer designed to improve the fidelity of **text** and **faces**, two of the most challenging yet perceptually important structures in autoregressive image generation.

It was introduced in the paper *InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation*.

- **Paper:** [InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation](https://huggingface.co/papers)
- **Code:** [https://github.com/LeapLabTHU/InsightTok](https://github.com/LeapLabTHU/InsightTok)

## Model Details

| Property | Value |
|---|---:|
| Downsampling rate | 16× |
| Codebook size | 16,384 |
| Latent dimension | 256 |
| Number of parameters | 426M |
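
To make the numbers in the table concrete, the following minimal sketch works out how many tokens an image would map to and how many bits each token carries. It is not part of the InsightTok codebase. The 256×256 input resolution and the helper names `token_grid` and `bits_per_token` are illustrative assumptions, as is the premise that the 16× downsampling applies to both spatial dimensions with one codebook index per latent position.

```python
import math

DOWNSAMPLE = 16          # downsampling rate, from the table above
CODEBOOK_SIZE = 16_384   # codebook size, from the table above


def token_grid(height: int, width: int, downsample: int = DOWNSAMPLE) -> tuple[int, int]:
    """Return the (rows, cols) of the token grid for an image.

    Hypothetical helper: assumes both spatial dimensions shrink by `downsample`.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsampling rate")
    return height // downsample, width // downsample


def bits_per_token(codebook_size: int = CODEBOOK_SIZE) -> float:
    """Bits needed to index one entry of the codebook."""
    return math.log2(codebook_size)


# Assumed example resolution; the model card does not state one.
rows, cols = token_grid(256, 256)
num_tokens = rows * cols
total_bits = num_tokens * bits_per_token()

print(f"token grid: {rows}x{cols} ({num_tokens} tokens)")
print(f"bits per token: {bits_per_token():.0f}")
print(f"bits per image: {total_bits:.0f}")
```

Under these assumptions, a 256×256 image becomes a 16×16 grid of 256 tokens, each selecting one of 16,384 codebook entries (14 bits).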

## Performance

InsightTok achieves strong text and face reconstruction quality while maintaining a compact discrete representation.


<p align="center">
  <img src="assets/Recon_Plot.png" width="100%">
</p>

<p align="center">
  <img src="assets/Compare_Recon.png" width="100%">
</p>

## Usage

Please refer to our [GitHub repository](https://github.com/LeapLabTHU/InsightTok).

## Citation

```bibtex
@article{yue2026insighttok,
  title={InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation},
  author={Yue, Yang and Wei, Fangyun and He, Tianyu and Zhao, Jinjing and Ni, Zanlin and Liu, Zeyu and Guo, Jiayi and Shi, Lei and Dong, Yue and Chen, Li and Li, Ji and Huang, Gao and Chen, Dong},
  journal={arXiv preprint arXiv:TODO},
  year={2026}
}
```