Upload folder using huggingface_hub
- .gitattributes +1 -0
- README.md +39 -0
- assets/Compare_Recon.png +3 -0
- insight_tok.pt +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/Compare_Recon.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,42 @@
 ---
 license: mit
+tags:
+- discrete tokenization
+- autoregressive generation
 ---
+# InsightTok
+
+InsightTok is a discrete visual tokenizer designed to improve the fidelity of **text** and **faces**, two of the most challenging yet perceptually important structures in autoregressive image generation.
+
+It was introduced in the paper *InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation*.
+
+- **Paper:** [InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation](https://huggingface.co/papers)
+- **Code:** [https://github.com/LeapLabTHU/JustGRPO](https://github.com/LeapLabTHU/JustGRPO)
+
+## Hyperparameters
+
+- Downsampling Rate: 16x
+- Codebook Size: 16384
+- Latent Dimension: 256
+- Number of parameters: 426M
+
+## Performance
+
+<p align="center">
+<img src="assets/Compare_Recon.png" width="95%">
+</p>
+
+## Usage
+
+Please refer to our [GitHub repository](https://github.com/LeapLabTHU/InsightTok).
+
+## Citation
+
+```bibtex
+@article{yue2026insighttok,
+title={InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation},
+author={Yue, Yang and Wei, Fangyun and He, Tianyu and Zhao, Jinjing and Ni, Zanlin and Liu, Zeyu and Guo, Jiayi and Shi, Lei and Dong, Yue and Chen, Li and Li, Ji and Huang, Gao and Chen, Dong},
+journal={arXiv preprint arXiv:TODO},
+year={2026}
+}
+```
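As a rough illustration of what the hyperparameters in the README diff above imply, the sketch below computes the token grid and the nominal bits per token for one image. The helper name `token_budget`, the 256x256 example resolution, and the divisibility check are assumptions made for illustration; they are not taken from the InsightTok code.

```python
import math


def token_budget(height, width, downsample=16, codebook_size=16384):
    """Return (grid_h, grid_w, num_tokens, bits_per_token) for one image.

    Defaults mirror the README: 16x downsampling and a 16384-entry codebook.
    Requiring divisible sides is an assumption of this sketch.
    """
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling rate")
    grid_h, grid_w = height // downsample, width // downsample
    # Each token is an index into the codebook, so it carries log2(size) bits.
    bits_per_token = math.log2(codebook_size)
    return grid_h, grid_w, grid_h * grid_w, bits_per_token


print(token_budget(256, 256))  # -> (16, 16, 256, 14.0)
```

Under these assumptions a 256x256 image maps to a 16x16 grid of 256 indices, each selecting one of 16384 code vectors of dimension 256.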
assets/Compare_Recon.png
ADDED
insight_tok.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bde6a0907097298ac541ff2f9da28b926f68483ea138757064e74abb51887483
+size 1721213019
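The three lines committed for `insight_tok.pt` are a Git LFS pointer, not the checkpoint itself: they record the SHA-256 digest and byte size of the real file. A minimal sketch of checking a downloaded copy against that pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage comment is a placeholder.

```python
import hashlib
import os

# Pointer text exactly as committed in this diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bde6a0907097298ac541ff2f9da28b926f68483ea138757064e74abb51887483
size 1721213019
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so a multi-gigabyte checkpoint never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
# Example (placeholder path): matches_pointer("insight_tok.pt", pointer)
```

Comparing the size first is a cheap early exit, so a truncated download is rejected without hashing about 1.7 GB of data.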