Upload folder using huggingface_hub
README.md
CHANGED
@@ -11,9 +11,14 @@
 **ZwZ-4B** is a fine-grained multimodal perception model built upon [Qwen3-VL-4B](https://huggingface.co/Qwen/Qwen3-VL-4B). It is trained using **Region-to-Image Distillation (R2I)** combined with reinforcement learning, enabling superior fine-grained visual understanding in a single forward pass — no inference-time zooming or tool calling required. ZwZ-4B achieves state-of-the-art performance on fine-grained perception benchmarks among open-source models of comparable size.
 
 <div align=center>
-<img src="gp_avg_comparison.png" width="90%" alt="avg_comparison"/>
+<img src="assets/gp_avg_comparison.png" width="90%" alt="avg_comparison"/>
 </div>
 
+## Key Features
+
+- **⚡ Single-Pass Efficiency**: Achieves fine-grained perception in one forward pass, eliminating inference-time tool-calling overhead
+- **🎯 Superior Accuracy**: State-of-the-art on perception benchmarks among open-source models
+- **📈 Broad Improvements**: Enhances not only perception benchmarks but also out-of-distribution generalization on visual reasoning, GUI agent, and AIGC detection
 
 ## How It Works
 
@@ -96,4 +101,4 @@ ZwZ-4B is trained on [inclusionAI/ZwZ-RL-VQA](https://huggingface.co/datasets/in
 
 ## License
 
-This model follows the license of
+This model follows the license of [Qwen3-VL-4B](https://huggingface.co/Qwen/Qwen3-VL-4B). Please refer to the base model's license for usage terms.