Update README.md
README.md
CHANGED
@@ -17,19 +17,16 @@ This is a multimodal language model fine-tuned by **Tencent PCG Basic Algorithm
 training using 40k sft data filtered from OpenR1-Math-220k. TBAC-VLR1-3B then employs GRPO (Group Relative Policy Optimization) and adapts Clip-Higher from DAPO,
 achieving strong performance on several multimodal reasoning benchmarks among models of the same size.
 
-
-
-
-
-
-
-
-
-
-
-| TBAC-VLR1-3B-preview | 36.3 | 64.8 | 25.0 | 33.2 | 17.7 | 40.8 |
-| TBAC-VLR1-3B-SFT | 35.3 | 57.0 | 27.4 | 41.1 | 15.0 | 36.1 |
-| TBAC-VLR1-3B | **36.7** | 57.5 | 28.7 | 41.1 | 16.1 | 40.0 | -->
+
+## Performance
+| Model | **Average** | **MathVista** | **MathVision** | **MathVerse** | **DynaMath** | **LogicVista** |
+| :--------------------------------: | :---------: | :-----------: | :------------: | :-----------: | :----------: | :------------: |
+| Qwen2.5-VL-7B | 40.5 | 68.0 | 25.7 | 45.5 | 21.8 | 41.2 |
+| VLAA-Thinker-Qwen2.5-7B | 42.7 | 68.0 | 26.4 | 48.2 | 22.4 | 48.5 |
+| VL-Rethinker-7B | 41.8 | 73.7 | 28.4 | 46.4 | 17.8 | 42.7 |
+| TBAC-VLR1-7B-RL | 41.3 | 70.1 | 25.4 | 43.4 | 19.0 | 48.4 |
+| TBAC-VLR1-7B-SFT | 41.8 | 65.1 | 28.5 | 49.1 | 20.6 | 45.5 |
+| TBAC-VLR1-7B | **43.4** | 66.7 | **31.4** | **50.1** | **22.6** | 46.4 |
 
 
 <!--  -->
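
As a sanity check on the Performance table added in this change, here is a minimal sketch that recomputes each row's Average. It assumes the Average column is the unweighted mean of the five benchmark columns; the README does not state how the average is computed, so treat that as an assumption. The names `ROWS` and `mean_gap` are illustrative only.

```python
# Scores copied from the added Performance table, in column order:
# (MathVista, MathVision, MathVerse, DynaMath, LogicVista), then the reported Average.
ROWS = {
    "Qwen2.5-VL-7B":           ((68.0, 25.7, 45.5, 21.8, 41.2), 40.5),
    "VLAA-Thinker-Qwen2.5-7B": ((68.0, 26.4, 48.2, 22.4, 48.5), 42.7),
    "VL-Rethinker-7B":         ((73.7, 28.4, 46.4, 17.8, 42.7), 41.8),
    "TBAC-VLR1-7B-RL":         ((70.1, 25.4, 43.4, 19.0, 48.4), 41.3),
    "TBAC-VLR1-7B-SFT":        ((65.1, 28.5, 49.1, 20.6, 45.5), 41.8),
    "TBAC-VLR1-7B":            ((66.7, 31.4, 50.1, 22.6, 46.4), 43.4),
}


def mean_gap(scores, reported):
    """Return (unweighted mean, absolute gap to the reported average)."""
    mean = sum(scores) / len(scores)
    return mean, abs(mean - reported)


# ASSUMPTION: Average = unweighted mean of the five benchmark scores.
gaps = {name: mean_gap(scores, reported) for name, (scores, reported) in ROWS.items()}

for name, (mean, gap) in gaps.items():
    print(f"{name:26s} recomputed={mean:.2f} gap={gap:.2f}")
```

Under that assumption every row agrees with its reported Average to within 0.1. The Qwen2.5-VL-7B row shows the largest gap (a recomputed 40.44 against a reported 40.5), which may simply reflect averaging over unrounded per-benchmark scores.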