Tags: Image Feature Extraction · Transformers · Safetensors · English · Chinese · mingtok · visual-tokenizer · feature-extraction · image-reconstruction · autoregressive
How to use inclusionAI/MingTok-Vision with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-feature-extraction", model="inclusionAI/MingTok-Vision")

# Load the model directly
from transformers import MingTok

model = MingTok.from_pretrained("inclusionAI/MingTok-Vision", dtype="auto")
```
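An `image-feature-extraction` pipeline typically returns nested Python lists of per-token features. As a purely illustrative sketch (assuming the common `[batch, num_tokens, hidden_dim]` output layout; `pool_features` is a generic helper, not part of the MingTok API), one might mean-pool them into a single embedding per image:

```python
import numpy as np

def pool_features(pipe_output):
    """Mean-pool token features from an image-feature-extraction pipeline.

    Assumes the common nested-list layout [batch, num_tokens, hidden_dim].
    Generic post-processing helper, not part of the MingTok API.
    """
    arr = np.asarray(pipe_output, dtype=np.float32)
    return arr.mean(axis=1)  # -> [batch, hidden_dim]

# Synthetic stand-in for pipe(image): 1 image, 4 tokens, 8-dim features.
fake_output = np.arange(32, dtype=np.float32).reshape(1, 4, 8).tolist()
pooled = pool_features(fake_output)
print(pooled.shape)  # (1, 8)
```

With the real pipeline, `fake_output` would be replaced by `pipe(image)`; the pooled vector can then be used for retrieval or clustering.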
## MingTok: A Unified Tokenizer for Visual Understanding and Generation without Vector Quantization
<p align="center">📑 <a href="https://arxiv.org/pdf/2510.06590">Technical Report</a> | 📖 <a href="https://inclusionai.github.io/blog/mingtok/">Project Page</a> | 🤗 <a href="https://huggingface.co/inclusionAI/MingTok-Vision">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/models/inclusionAI/MingTok-Vision">ModelScope</a> | 💾 <a href="https://github.com/inclusionAI/Ming-UniVision">GitHub</a></p>
## Key Features
- 🖼️ **First Continuous Unified Vision Tokenizer:** MingTok enables unified vision understanding and generation via a continuous latent space, eliminating quantization while preserving semantic and perceptual fidelity.
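To make the contrast with vector quantization concrete, here is a minimal, purely illustrative NumPy sketch (toy shapes and a random codebook, not MingTok's actual code): a VQ tokenizer snaps each patch latent to its nearest codebook entry and so loses information, while a continuous tokenizer keeps the real-valued latent unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy patch features: 4 patches, 8-dim latents (illustrative only).
features = rng.normal(size=(4, 8))

# --- Vector quantization: snap each latent to its nearest codebook entry ---
codebook = rng.normal(size=(16, 8))                       # 16 discrete codes
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)                              # discrete token ids
vq_latents = codebook[codes]                              # quantized latents

# --- Continuous tokenization: keep the latent as-is ---
cont_latents = features

# VQ introduces a reconstruction gap; the continuous latent is exact.
vq_err = float(np.abs(vq_latents - features).max())
cont_err = float(np.abs(cont_latents - features).max())
print(cont_err == 0.0, vq_err > 0.0)  # prints: True True
```

The point of the sketch is only the fidelity gap: the quantized latents differ from the originals, whereas the continuous path preserves them exactly, which is what lets a single latent space serve both understanding and generation.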
</div>
## Reference
```
@article{huang2025mingunivision,
  title={Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer},
  author={Huang, Ziyuan and Zheng, DanDan and Zou, Cheng and Liu, Rui and Wang, Xiaolong and Ji, Kaixiang and Chai, Weilong and Sun, Jianxin and Wang, Libin and Lv, Yongjie and Huang, Taozhi and Liu, Jiajia and Guo, Qingpei and Yang, Ming and Chen, Jingdong and Zhou, Jun},
  journal={arXiv preprint arXiv:2510.06590},
  year={2025}
}
```