inclusionAI/ZwZ-4B

Image-Text-to-Text


WaltonFuture committed on Feb 13

Commit 90899df · verified · 1 Parent(s): d1d170b

Update README.md

Files changed (1)

README.md +1 -1


@@ -11,7 +11,7 @@
 **ZwZ-4B** is a fine-grained multimodal perception model built upon [Qwen3-VL-4B](https://huggingface.co/Qwen/Qwen3-VL-4B). It is trained using **Region-to-Image Distillation (R2I)** combined with reinforcement learning, enabling superior fine-grained visual understanding in a single forward pass — no inference-time zooming or tool calling required. ZwZ-4B achieves state-of-the-art performance on fine-grained perception benchmarks among open-source models of comparable size.
 <div align=center>
-<img src="assets/gp_avg_comparison.png" width="90%" alt="avg_comparison"/>
+<img src="gp_avg_comparison.png" width="90%" alt="avg_comparison"/>
 </div>
 ## Key Features