Improve model card and add metadata
#1
by nielsr HF Staff - opened
README.md CHANGED

@@ -1,16 +1,18 @@
 ---
 license: mit
+pipeline_tag: image-to-image
 tags:
 - discrete tokenization
 - autoregressive generation
 ---
+
 # InsightTok

 InsightTok is a discrete visual tokenizer designed to improve the fidelity of **text** and **faces**, two of the most challenging yet perceptually important structures in autoregressive image generation.

-It was introduced in the paper
+It was introduced in the paper [InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation](https://huggingface.co/papers/2605.14333).

-- **Paper:**
+- **Paper:** [InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation](https://huggingface.co/papers/2605.14333)
 - **Code:** [https://github.com/LeapLabTHU/InsightTok](https://github.com/LeapLabTHU/InsightTok)

 ## Model Details
@@ -24,8 +26,7 @@ It was introduced in the paper *InsightTok: Improving Text and Face Fidelity in

 ## Performance

-InsightTok achieves strong text and face reconstruction quality while maintaining a compact discrete representation.
-
+InsightTok achieves strong text and face reconstruction quality while maintaining a compact discrete representation through localized, content-aware perceptual losses.

 <p align="center">
 <img src="assets/Recon_Plot.png" width="100%">
@@ -37,15 +38,22 @@ InsightTok achieves strong text and face reconstruction quality while maintainin

 ## Usage

-
+InsightTok follows the standard VQGAN-style autoencoding interface. For setup and implementation details, please refer to the [GitHub repository](https://github.com/LeapLabTHU/InsightTok).
+
+```python
+# image encoding
+latents, _, [_, _, indices] = vq_model.encode(input_image_tensor)
+# image decoding
+recon_image_tensor = vq_model.decode(latents)
+```

 ## Citation

 ```bibtex
 @article{yue2026insighttok,
 title={InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation},
-author={Yue, Yang and Wei, Fangyun and He, Tianyu and Zhao, Jinjing and Ni, Zanlin and Liu, Zeyu and Guo, Jiayi and Shi, Lei and Dong, Yue and Chen, Li and Li, Ji and Huang, Gao and Chen, Dong},
-journal={arXiv preprint arXiv:
+author={Yue, Yang and Wei, Fangyun and He, Tianyu and Zhao, Jinjing and Ni, Zanlin and Liu, Zeyu and Guo, Jiayi and Shi, Lei and Dong, Yue and Chen, Li and Li, Ji and Huang, Gao and Chen, Dong},
+journal={arXiv preprint arXiv:2605.14333},
 year={2026}
 }
 ```
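For readers unfamiliar with the VQGAN-style interface that the Usage section refers to, the sketch below illustrates the general idea of discrete tokenization: each feature vector is mapped to the index of its nearest codebook entry (encoding), and indices are mapped back to codebook vectors (decoding). This is a toy illustration in pure Python, not InsightTok's implementation. `ToyQuantizer`, `squared_distance`, and the example codebook are hypothetical names and values introduced here, and the real `vq_model` additionally runs learned encoder and decoder networks around this lookup step.

```python
# Toy nearest-neighbour vector quantizer. Hypothetical illustration only:
# it is NOT the InsightTok model and does not reproduce its API.
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyQuantizer:
    """Maps feature vectors to codebook indices and back."""

    def __init__(self, codebook: List[Vector]) -> None:
        if not codebook:
            raise ValueError("codebook must not be empty")
        self.codebook = codebook

    def encode(self, features: List[Vector]) -> Tuple[List[Vector], List[int]]:
        """Return (quantized latents, token indices) for each feature vector."""
        indices = [
            min(
                range(len(self.codebook)),
                key=lambda k: squared_distance(feature, self.codebook[k]),
            )
            for feature in features
        ]
        latents = [self.codebook[i] for i in indices]
        return latents, indices

    def decode_indices(self, indices: List[int]) -> List[Vector]:
        """Look up the codebook vector for each token index."""
        return [self.codebook[i] for i in indices]


# Example: three 2-D "features" quantized against a three-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

quantizer = ToyQuantizer(codebook)
latents, indices = quantizer.encode(features)
print(indices)
print(latents)
```

In this toy setting the `indices` list plays the role of the discrete tokens an autoregressive generator would predict, while `latents` are the quantized vectors a decoder network would turn back into pixels.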