langdaohlb committed 3a60dd9 (verified) · 1 Parent(s): 5e22c5c

add benchmark table

Files changed (1)
  1. README.md +298 -0
README.md CHANGED
@@ -23,6 +23,304 @@ pipeline_tag: image-text-to-text
23
 
24
  **ZwZ-4B** is a fine-grained multimodal perception model built upon [Qwen3-VL-4B](https://huggingface.co/Qwen/Qwen3-VL-4B). It is trained using **Region-to-Image Distillation (R2I)** combined with reinforcement learning, enabling superior fine-grained visual understanding in a single forward pass — no inference-time zooming or tool calling required. ZwZ-4B achieves state-of-the-art performance on fine-grained perception benchmarks among open-source models of comparable size.
25
 
26
+ <div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:780px;margin:0 auto;padding:12px 0">
27
+
28
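+ <p style="font-size:11px;color:#555;margin-bottom:10px">Among open-source models (including ours), the best score in each column is in <strong>bold</strong> and the second best is <u>underlined</u>.</p>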
29
+ <table style="width:100%;border-collapse:collapse;font-size:10px">
30
+ <thead>
31
+ <tr>
32
+ <th style="padding:6px 4px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:10px" rowspan="2">Models</th>
33
+ <th style="padding:6px 4px;text-align:center;font-weight:600;border-bottom:1px solid #7c3aed;color:#7c3aed;font-size:10px" colspan="8">General Perception</th>
34
+ <th style="padding:6px 4px;text-align:center;font-weight:600;border-bottom:1px solid #7c3aed;color:#7c3aed;font-size:10px" colspan="2">Specific Perception</th>
35
+ <th style="padding:6px 4px;text-align:center;font-weight:600;border-bottom:1px solid #7c3aed;color:#7c3aed;font-size:10px" colspan="2">OOD Generalization</th>
36
+ <th style="padding:6px 4px;text-align:center;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:10px" rowspan="2">Avg</th>
37
+ </tr>
38
+ <tr>
39
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">ZoomBench</th>
40
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">HR-4K</th>
41
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">HR-8K</th>
42
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">VStar</th>
43
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">CV-B.</th>
44
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">MME-RW-en</th>
45
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">MME-RW-cn</th>
46
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px;background:rgba(0,180,0,0.1)">GP-Avg</th>
47
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">CountQA</th>
48
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">ColorB.</th>
49
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">MMStar</th>
50
+ <th style="padding:5px 4px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:9px">BabyVision</th>
51
+ </tr>
52
+ </thead>
53
+ <tbody>
54
+
55
+ <!-- Closed-Source Models -->
56
+ <tr><td colspan="14" style="padding:5px 8px;font-weight:600;font-size:9px;color:#7c3aed;border-bottom:1px solid rgba(124,58,237,0.2);background:rgba(124,58,237,0.1);font-style:italic">Closed-Source Models</td></tr>
57
+
58
+ <tr style="background:rgba(255,240,150,0.15)">
59
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">GPT-5.1</td>
60
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">47.22</td>
61
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">67.00</td>
62
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">65.25</td>
63
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">70.16</td>
64
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">84.22</td>
65
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">64.04</td>
66
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">55.57</td>
67
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">64.78</td>
68
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">31.41</td>
69
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">83.43</td>
70
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">71.60</td>
71
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">13.92</td>
72
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">59.44</td>
73
+ </tr>
74
+
75
+ <tr style="background:rgba(255,240,150,0.15)">
76
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">Gemini-3-Flash</td>
77
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">59.29</td>
78
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">87.88</td>
79
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">85.00</td>
80
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">86.39</td>
81
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">89.57</td>
82
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">74.86</td>
83
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">72.62</td>
84
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">79.37</td>
85
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">66.88</td>
86
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">85.47</td>
87
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">83.60</td>
88
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">34.51</td>
89
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">75.10</td>
90
+ </tr>
91
+
92
+ <!-- Open-Source Models -->
93
+ <tr><td colspan="14" style="padding:5px 8px;font-weight:600;font-size:9px;color:#7c3aed;border-bottom:1px solid rgba(124,58,237,0.2);background:rgba(124,58,237,0.1);font-style:italic">Open-Source Models</td></tr>
94
+
95
+ <tr style="background:rgba(180,180,180,0.08)">
96
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">Qwen3-VL-2B</td>
97
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">41.30</td>
98
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">71.75</td>
99
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">70.12</td>
100
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">72.77</td>
101
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">78.94</td>
102
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">59.52</td>
103
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">60.77</td>
104
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">65.02</td>
105
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">22.19</td>
106
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">76.86</td>
107
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">60.40</td>
108
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">12.11</td>
109
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">56.98</td>
110
+ </tr>
111
+
112
+ <tr style="background:rgba(180,180,180,0.08)">
113
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">Qwen3-VL-4B</td>
114
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">40.24</td>
115
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">78.25</td>
116
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">72.88</td>
117
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">80.10</td>
118
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">84.95</td>
119
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">63.47</td>
120
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">63.63</td>
121
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">69.07</td>
122
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">28.14</td>
123
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">81.63</td>
124
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">69.73</td>
125
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">13.66</td>
126
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">61.52</td>
127
+ </tr>
128
+
129
+ <tr style="background:rgba(180,180,180,0.08)">
130
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">Qwen2.5-VL-7B</td>
131
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">42.49</td>
132
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">71.62</td>
133
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">67.88</td>
134
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">78.53</td>
135
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">75.34</td>
136
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">60.80</td>
137
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">58.30</td>
138
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">64.99</td>
139
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">18.91</td>
140
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">76.36</td>
141
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">61.93</td>
142
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">12.89</td>
143
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">56.82</td>
144
+ </tr>
145
+
146
+ <tr style="background:rgba(180,180,180,0.08)">
147
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">Qwen3-VL-8B</td>
148
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">37.87</td>
149
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">78.88</td>
150
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">74.63</td>
151
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">86.39</td>
152
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">85.44</td>
153
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">65.96</td>
154
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">66.67</td>
155
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">70.83</td>
156
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">28.99</td>
157
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">82.77</td>
158
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">70.93</td>
159
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">12.89</td>
160
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">62.86</td>
161
+ </tr>
162
+
163
+ <tr>
164
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">MiMo-VL-7B-RL</td>
165
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">45.09</td>
166
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">74.38</td>
167
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">72.88</td>
168
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">81.15</td>
169
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">84.31</td>
170
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">63.40</td>
171
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">59.78</td>
172
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">68.71</td>
173
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">28.27</td>
174
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">82.80</td>
175
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">73.53</td>
176
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">16.24</td>
177
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">61.98</td>
178
+ </tr>
179
+
180
+ <tr>
181
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">MiniCPM-V-4.5 (9B)</td>
182
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">42.60</td>
183
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">69.88</td>
184
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">63.62</td>
185
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">70.16</td>
186
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">80.25</td>
187
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">58.16</td>
188
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">56.23</td>
189
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">62.99</td>
190
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">23.43</td>
191
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">79.75</td>
192
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">67.87</td>
193
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">14.95</td>
194
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">56.99</td>
195
+ </tr>
196
+
197
+ <tr>
198
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">GLM-4.5V (108B)</td>
199
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">49.23</td>
200
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">81.63</td>
201
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">74.88</td>
202
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">83.25</td>
203
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">87.59</td>
204
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">66.04</td>
205
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">60.71</td>
206
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">71.90</td>
207
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">35.93</td>
208
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">84.59</td>
209
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">75.87</td>
210
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">15.72</td>
211
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">65.04</td>
212
+ </tr>
213
+
214
+ <tr>
215
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">Qwen3-VL-235B-A22B</td>
216
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">49.11</td>
217
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>84.50</strong></td>
218
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>81.62</u></td>
219
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">87.96</td>
220
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">86.72</td>
221
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">67.07</td>
222
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">65.29</td>
223
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">74.61</td>
224
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>40.58</u></td>
225
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>85.62</u></td>
226
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>76.33</u></td>
227
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>18.30</u></td>
228
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">67.55</td>
229
+ </tr>
230
+
231
+ <tr>
232
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">Kimi-K2.5 (1T)</td>
233
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>56.33</u></td>
234
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">81.87</td>
235
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">75.38</td>
236
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">85.86</td>
237
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>89.18</strong></td>
238
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>71.51</strong></td>
239
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>68.40</u></td>
240
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">75.50</td>
241
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>52.81</strong></td>
242
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>86.61</strong></td>
243
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>81.80</strong></td>
244
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>33.25</strong></td>
245
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>71.18</strong></td>
246
+ </tr>
247
+
248
+ <!-- Our Models -->
249
+ <tr><td colspan="14" style="padding:5px 8px;font-weight:600;font-size:9px;color:#7c3aed;border-bottom:1px solid rgba(124,58,237,0.2);background:rgba(124,58,237,0.1);font-style:italic">Our Models</td></tr>
250
+
251
+ <tr style="background:rgba(100,130,255,0.06)">
252
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>ZwZ-2B (Ours)</strong></td>
253
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">53.49</td>
254
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">77.00</td>
255
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">75.38</td>
256
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">82.72</td>
257
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">83.36</td>
258
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">65.61</td>
259
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">65.39</td>
260
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">71.85</td>
261
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">21.60</td>
262
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">79.37</td>
263
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">63.40</td>
264
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">17.78</td>
265
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">62.28</td>
266
+ </tr>
267
+
268
+ <tr style="background:rgba(100,130,255,0.06)">
269
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>ZwZ-4B (Ours)</strong></td>
270
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">55.74</td>
271
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">81.75</td>
272
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">79.50</td>
273
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>92.67</strong></td>
274
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><u>87.90</u></td>
275
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">68.52</td>
276
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">68.09</td>
277
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)"><u>76.31</u></td>
278
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">30.82</td>
279
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">83.08</td>
280
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">71.13</td>
281
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">16.24</td>
282
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">66.86</td>
283
+ </tr>
284
+
285
+ <tr style="background:rgba(100,130,255,0.06)">
286
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px"><strong>ZwZ-7B (Ours)</strong></td>
287
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">55.62</td>
288
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">75.38</td>
289
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">73.25</td>
290
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">88.48</td>
291
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">79.83</td>
292
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">66.21</td>
293
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">66.96</td>
294
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px;background:rgba(0,180,0,0.08)">72.25</td>
295
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">20.72</td>
296
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">80.82</td>
297
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">63.40</td>
298
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">15.98</td>
299
+ <td style="padding:4px 4px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);font-size:10px">62.42</td>
300
+ </tr>
301
+
302
+ <tr style="background:rgba(100,130,255,0.06)">
303
+ <td style="padding:4px 4px;padding-left:14px;border-bottom:2px solid #7c3aed;font-size:10px"><strong>ZwZ-8B (Ours)</strong></td>
304
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px"><strong>58.11</strong></td>
305
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px"><u>84.38</u></td>
306
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px"><strong>82.00</strong></td>
307
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px"><u>91.10</u></td>
308
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px">87.40</td>
309
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px"><u>69.87</u></td>
310
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px"><strong>70.59</strong></td>
311
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px;background:rgba(0,180,0,0.08)"><strong>77.64</strong></td>
312
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px">32.40</td>
313
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px">83.59</td>
314
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px">73.13</td>
315
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px">16.75</td>
316
+ <td style="padding:4px 4px;text-align:center;border-bottom:2px solid #7c3aed;font-size:10px"><u>68.12</u></td>
317
+ </tr>
318
+
319
+ </tbody>
320
+ </table>
321
+ </div>
322
+
323
+
324
  <div align=center>
325
  <img src="gp_avg_comparison.png" width="90%" alt="avg_comparison"/>
326
  </div>
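
The table does not say how its two summary columns are derived. The sketch below assumes that GP-Avg is the unweighted mean of the seven General Perception benchmarks and that Avg is the unweighted mean of all eleven benchmarks (GP-Avg itself excluded). Under that assumption it reproduces the values reported in the ZwZ-4B row; the variable and function names are illustrative only and are not part of the repository.

```python
# Scores copied from the ZwZ-4B row of the benchmark table.
general_perception = {
    "ZoomBench": 55.74,
    "HR-4K": 81.75,
    "HR-8K": 79.50,
    "VStar": 92.67,
    "CV-B.": 87.90,
    "MME-RW-en": 68.52,
    "MME-RW-cn": 68.09,
}
# Specific Perception and OOD Generalization columns.
other_benchmarks = {
    "CountQA": 30.82,
    "ColorB.": 83.08,
    "MMStar": 71.13,
    "BabyVision": 16.24,
}


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation, not confirmed by the source)."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: GP-Avg averages only the seven General Perception benchmarks.
gp_avg = mean(general_perception.values())

# Assumption: Avg averages all eleven benchmarks and leaves GP-Avg out.
overall_avg = mean(list(general_perception.values()) + list(other_benchmarks.values()))

print(f"GP-Avg: {gp_avg:.2f}")   # table reports 76.31
print(f"Avg:    {overall_avg:.2f}")  # table reports 66.86
```

The same check can be run against any other row by swapping in its scores, which is a quick way to catch transcription errors when the table is updated.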