Image-Text-to-Text
Transformers
Safetensors
English
qwen3_vl
conversational
langdaohlb committed on
Commit 821943b · verified · 1 Parent(s): 6159ee5

Update Table

Files changed (1)
  1. README.md +295 -4
README.md CHANGED
@@ -23,11 +23,302 @@ pipeline_tag: image-text-to-text
23
 
24
  **ZwZ-2B** is a fine-grained multimodal perception model built upon [Qwen3-VL-2B](https://huggingface.co/Qwen/Qwen3-VL-2B). It is trained using **Region-to-Image Distillation (R2I)** combined with reinforcement learning, enabling superior fine-grained visual understanding in a single forward pass — no inference-time zooming or tool calling required. ZwZ-2B achieves state-of-the-art performance on fine-grained perception benchmarks among open-source models of comparable size.
25
 
26
 
27
- | Model | mmstar | hrbench-4k | hrbench-8k | vstar | cvbench-2d | cvbench-3d | countqa | colorbench | babyvision | mme-realworld-en | mme-realworld-cn | ZoomBench |
28
- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
29
- | **[Qwen3-VL-2B](https://huggingface.co/Qwen/Qwen3-VL-2B)** | 60.4 | 71.75 | 70.12 | 72.77 | 75.45 | 82.42 | 22.19 | 76.86 | 12.11 | 59.52 | 60.77 | 41.30 |
30
- | **ZwZ-2B** | 63.40 | 77.00 | 75.38 | 82.72 | 80.88 | 85.83 | 21.60 | 79.37 | 17.78 | 65.61 | 65.39 | 53.49 |
26
+ <div style="font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;max-width:1200px;margin:0 auto;padding:16px 0">
27
+
28
+ <table style="width:100%;border-collapse:collapse;font-size:13px">
29
+ <thead><tr>
30
+ <th style="padding:10px 7px;text-align:left;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed" rowspan="2">Models</th>
31
+ <th style="padding:10px 7px;text-align:center;font-weight:600;border-bottom:1px solid #7c3aed;color:#7c3aed;font-size:14px" colspan="8">General Perception</th>
32
+ <th style="padding:10px 7px;text-align:center;font-weight:600;border-bottom:1px solid #7c3aed;color:#7c3aed;font-size:14px" colspan="2">Specific Perception</th>
33
+ <th style="padding:10px 7px;text-align:center;font-weight:600;border-bottom:1px solid #7c3aed;color:#7c3aed;font-size:14px" colspan="2">OOD Generalization</th>
34
+ <th style="padding:10px 7px;text-align:center;font-weight:600;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:14px" rowspan="2">Avg</th>
35
+ </tr>
36
+ <tr>
37
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">ZoomBench</th>
38
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">HR-4K</th>
39
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">HR-8K</th>
40
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">VStar</th>
41
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">CV-B.</th>
42
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">MME-RW-en</th>
43
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">MME-RW-cn</th>
44
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px;background:rgba(0,180,0,0.1)">GP-Avg</th>
45
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">CountQA</th>
46
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">ColorB.</th>
47
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">MMStar</th>
48
+ <th style="padding:8px 7px;text-align:center;font-weight:500;border-bottom:2px solid #7c3aed;color:#7c3aed;font-size:13px">BabyVision</th>
49
+ </tr>
50
+ </thead>
51
+ <tbody>
52
+
53
+ <!-- Closed-Source Models -->
54
+ <tr><td colspan="14" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124,58,237,0.2);background:rgba(124,58,237,0.1);font-style:italic">Closed-Source Models</td></tr>
55
+
56
+ <tr style="background:rgba(255,240,150,0.15)">
57
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">GPT-5.1</td>
58
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">47.22</td>
59
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">67.00</td>
60
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">65.25</td>
61
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">70.16</td>
62
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">84.22</td>
63
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">64.04</td>
64
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">55.57</td>
65
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">64.78</td>
66
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">31.41</td>
67
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">83.43</td>
68
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">71.60</td>
69
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">13.92</td>
70
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">59.44</td>
71
+ </tr>
72
+
73
+ <tr style="background:rgba(255,240,150,0.15)">
74
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">Gemini-3-Flash</td>
75
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">59.29</td>
76
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">87.88</td>
77
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">85.00</td>
78
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">86.39</td>
79
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">89.57</td>
80
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">74.86</td>
81
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">72.62</td>
82
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">79.37</td>
83
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">66.88</td>
84
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">85.47</td>
85
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">83.60</td>
86
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">34.51</td>
87
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">75.10</td>
88
+ </tr>
89
+
90
+ <!-- Open-Source Models -->
91
+ <tr><td colspan="14" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124,58,237,0.2);background:rgba(124,58,237,0.1);font-style:italic">Open-Source Models</td></tr>
92
+
93
+ <tr style="background:rgba(180,180,180,0.08)">
94
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">Qwen3-VL-2B</td>
95
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">41.30</td>
96
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">71.75</td>
97
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">70.12</td>
98
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">72.77</td>
99
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">78.94</td>
100
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">59.52</td>
101
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">60.77</td>
102
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">65.02</td>
103
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">22.19</td>
104
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">76.86</td>
105
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">60.40</td>
106
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">12.11</td>
107
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">56.98</td>
108
+ </tr>
109
+
110
+ <tr style="background:rgba(180,180,180,0.08)">
111
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">Qwen3-VL-4B</td>
112
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">40.24</td>
113
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">78.25</td>
114
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">72.88</td>
115
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">80.10</td>
116
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">84.95</td>
117
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">63.47</td>
118
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">63.63</td>
119
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">69.07</td>
120
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">28.14</td>
121
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">81.63</td>
122
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">69.73</td>
123
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">13.66</td>
124
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">61.52</td>
125
+ </tr>
126
+
127
+ <tr style="background:rgba(180,180,180,0.08)">
128
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">Qwen2.5-VL-7B</td>
129
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">42.49</td>
130
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">71.62</td>
131
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">67.88</td>
132
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">78.53</td>
133
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">75.34</td>
134
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">60.80</td>
135
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">58.30</td>
136
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">64.99</td>
137
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">18.91</td>
138
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">76.36</td>
139
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">61.93</td>
140
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">12.89</td>
141
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">56.82</td>
142
+ </tr>
143
+
144
+ <tr style="background:rgba(180,180,180,0.08)">
145
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">Qwen3-VL-8B</td>
146
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">37.87</td>
147
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">78.88</td>
148
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">74.63</td>
149
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">86.39</td>
150
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">85.44</td>
151
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">65.96</td>
152
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">66.67</td>
153
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">70.83</td>
154
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">28.99</td>
155
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">82.77</td>
156
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">70.93</td>
157
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">12.89</td>
158
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">62.86</td>
159
+ </tr>
160
+
161
+ <tr>
162
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">MiMo-VL-7B-RL</td>
163
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">45.09</td>
164
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">74.38</td>
165
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">72.88</td>
166
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">81.15</td>
167
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">84.31</td>
168
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">63.40</td>
169
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">59.78</td>
170
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">68.71</td>
171
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">28.27</td>
172
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">82.80</td>
173
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">73.53</td>
174
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">16.24</td>
175
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">61.98</td>
176
+ </tr>
177
+
178
+ <tr>
179
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">MiniCPM-V-4.5 (9B)</td>
180
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">42.60</td>
181
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">69.88</td>
182
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">63.62</td>
183
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">70.16</td>
184
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">80.25</td>
185
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">58.16</td>
186
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">56.23</td>
187
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">62.99</td>
188
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">23.43</td>
189
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">79.75</td>
190
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">67.87</td>
191
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">14.95</td>
192
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">56.99</td>
193
+ </tr>
194
+
195
+ <tr>
196
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">GLM-4.5V (108B)</td>
197
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">49.23</td>
198
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">81.63</td>
199
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">74.88</td>
200
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">83.25</td>
201
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">87.59</td>
202
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">66.04</td>
203
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">60.71</td>
204
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">71.90</td>
205
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">35.93</td>
206
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">84.59</td>
207
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">75.87</td>
208
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">15.72</td>
209
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">65.04</td>
210
+ </tr>
211
+
212
+ <tr>
213
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">Qwen3-VL-235B-A22B</td>
214
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">49.11</td>
215
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>84.50</strong></td>
216
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>81.62</u></td>
217
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">87.96</td>
218
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">86.72</td>
219
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">67.07</td>
220
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">65.29</td>
221
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">74.61</td>
222
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>40.58</u></td>
223
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>85.62</u></td>
224
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>76.33</u></td>
225
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>18.30</u></td>
226
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">67.55</td>
228
+ </tr>
229
+
230
+ <tr>
231
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)">Kimi-K2.5 (1T)</td>
232
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>56.33</u></td>
233
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">81.87</td>
234
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">75.38</td>
235
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">85.86</td>
236
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>89.18</strong></td>
237
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>71.51</strong></td>
238
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>68.40</u></td>
239
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">75.50</td>
240
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>52.81</strong></td>
241
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>86.61</strong></td>
242
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>81.80</strong></td>
243
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>33.25</strong></td>
244
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>71.18</strong></td>
245
+ </tr>
246
+
247
+ <!-- Our Models -->
248
+ <tr><td colspan="14" style="padding:8px 12px;font-weight:600;color:#7c3aed;border-bottom:1px solid rgba(124,58,237,0.2);background:rgba(124,58,237,0.1);font-style:italic">Our Models</td></tr>
249
+
250
+ <tr style="background:rgba(100,130,255,0.06)">
251
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>ZwZ-2B (Ours)</strong></td>
252
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">53.49</td>
253
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">77.00</td>
254
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">75.38</td>
255
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">82.72</td>
256
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">83.36</td>
257
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">65.61</td>
258
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">65.39</td>
259
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">71.85</td>
260
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">21.60</td>
261
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">79.37</td>
262
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">63.40</td>
263
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">17.78</td>
264
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">62.28</td>
265
+ </tr>
266
+
267
+ <tr style="background:rgba(100,130,255,0.06)">
268
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>ZwZ-4B (Ours)</strong></td>
269
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">55.74</td>
270
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">81.75</td>
271
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">79.50</td>
272
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>92.67</strong></td>
273
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)"><u>87.90</u></td>
274
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">68.52</td>
275
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">68.09</td>
276
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)"><u>76.31</u></td>
277
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">30.82</td>
278
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">83.08</td>
279
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">71.13</td>
280
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">16.24</td>
281
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">66.86</td>
282
+ </tr>
283
+
284
+ <tr style="background:rgba(100,130,255,0.06)">
285
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:1px solid rgba(128,128,128,0.15)"><strong>ZwZ-7B (Ours)</strong></td>
286
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">55.62</td>
287
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">75.38</td>
288
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">73.25</td>
289
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">88.48</td>
290
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">79.83</td>
291
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">66.21</td>
292
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">66.96</td>
293
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15);background:rgba(0,180,0,0.08)">72.25</td>
294
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">20.72</td>
295
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">80.82</td>
296
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">63.40</td>
297
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">15.98</td>
298
+ <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128,128,128,0.15)">62.42</td>
299
+ </tr>
300
+
301
+ <tr style="background:rgba(100,130,255,0.06)">
302
+ <td style="padding:7px 7px;padding-left:20px;border-bottom:2px solid #7c3aed"><strong>ZwZ-8B (Ours)</strong></td>
303
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed"><strong>58.11</strong></td>
304
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed"><u>84.38</u></td>
305
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed"><strong>82.00</strong></td>
306
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed"><u>91.10</u></td>
307
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed">87.40</td>
308
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed"><u>69.87</u></td>
309
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed"><strong>70.59</strong></td>
310
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed;background:rgba(0,180,0,0.08)"><strong>77.64</strong></td>
311
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed">32.40</td>
312
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed">83.59</td>
313
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed">73.13</td>
314
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed">16.75</td>
315
+ <td style="padding:7px 7px;text-align:center;border-bottom:2px solid #7c3aed"><u>68.12</u></td>
316
+ </tr>
317
+
318
+ </tbody>
319
+ </table>
320
+ </div>
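The aggregate columns in the table above are not defined in the README. The numbers appear consistent with simple unweighted means: CV-B. matches the mean of the `cvbench-2d` and `cvbench-3d` scores from the removed Markdown table, GP-Avg matches the mean of the seven General Perception benchmarks, and Avg matches the mean of all eleven benchmarks (GP-Avg itself excluded). The sketch below checks this reading on the ZwZ-2B row. The helper name `mean_2dp` and the half-up rounding rule are assumptions made for illustration, not something the authors state.

```python
from decimal import Decimal, ROUND_HALF_UP


def mean_2dp(scores):
    """Unweighted mean of decimal score strings, rounded half-up to 2 places.

    Hypothetical helper: the rounding rule is an assumption, chosen because
    it reproduces the values shown in the table.
    """
    values = [Decimal(s) for s in scores]
    return (sum(values) / len(values)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )


# ZwZ-2B scores copied from the table; cvbench-2d / cvbench-3d come from
# the removed Markdown table.
cv_b = mean_2dp(["80.88", "85.83"])

general_perception = [
    "53.49",    # ZoomBench
    "77.00",    # HR-4K
    "75.38",    # HR-8K
    "82.72",    # VStar
    str(cv_b),  # CV-B.
    "65.61",    # MME-RW-en
    "65.39",    # MME-RW-cn
]
other_benchmarks = [
    "21.60",  # CountQA
    "79.37",  # ColorB.
    "63.40",  # MMStar
    "17.78",  # BabyVision
]

gp_avg = mean_2dp(general_perception)
overall_avg = mean_2dp(general_perception + other_benchmarks)

print("CV-B.:", cv_b)
print("GP-Avg:", gp_avg)
print("Avg:", overall_avg)
```

The same check can be repeated for any other row by swapping in its scores. Using `Decimal` instead of `float` keeps ties such as 83.355 from rounding unpredictably.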
321
 
322
 
323
  <div align=center>
324
  <img src="gp_avg_comparison.png" width="90%" alt="avg_comparison"/>