TencentBAC/TBAC-VLR1-7B-SFT

Safetensors

qwen2_5_vl

mm math reasoning

oulinyu committed on Aug 12, 2025

Commit c33dc79 (verified) · 1 parent: 490bfee

Update README.md

README.md +10 -13

@@ -17,19 +17,16 @@ This is a multimodal language model fine-tuned by **Tencent PCG Basic Algorithm
 training using 40k sft data filtered from OpenR1-Math-220k. TBAC-VLR1-3B then employs GRPO (Group Relative Policy Optimization) and adapts Clip-Higher from DAPO,
 achieving strong performance on several multimodal reasoning benchmarks among models of the same size.
-<!-- ## Performance
-| Model                     | **Average** | **MathVista**| **MathVision** | **MathVerse** | **DynaMath**  | **LogicVista** |
-| :-------------------:     | :---------: | :-----------:| :------------: | :-----------: | :-----------: | :----------:   |
-| Qwen2-VL-2B               |     22.4    |      48.0    |      16.1      |      17.5     |      3.8      |     26.6       |
-| InternVL2.5-2B            |     23.8    |      51.1    |      14.0      |      22.3     |      4.4      |     27.3       |
-| InternVL3-2B              |     31.5    |      57.6    |      20.2      |      24.5     |      14.8     |     40.3       |
-| Qwen2.5-VL-3B             |     33.6    |      61.2    |      21.9      |      31.2     |      13.2     |     40.3       |
-| VLM-R1-3B-Math-0305       |     34.1    |      62.7    |      21.9      |      32.2     |      13.0     |     40.5       |
-| Taichu-VLR-3B             |     34.3    |      64.9    |      23.1      |      32.1     |      12.6     |     38.7       |
-| VLAA-Thinker-Qwen2.5VL-3B |     35.7    |      61.0    |      24.4      |      36.4     |      18.2     |     38.5       |
-| TBAC-VLR1-3B-preview      |     36.3    |      64.8    |      25.0      |      33.2     |      17.7     |     40.8       |
-| TBAC-VLR1-3B-SFT          |     35.3    |      57.0    |      27.4      |      41.1     |      15.0     |     36.1       |
-| TBAC-VLR1-3B              |   **36.7**  |      57.5    |      28.7      |      41.1     |      16.1     |     40.0       |  -->
+## Performance
+| Model                              | **Average** | **MathVista** | **MathVision** | **MathVerse** | **DynaMath** | **LogicVista** |
+| :--------------------------------: | :---------: | :-----------: | :------------: | :-----------: | :----------: | :------------: |
+| Qwen2.5-VL-7B                      | 40.5        | 68.0          | 25.7           | 45.5          | 21.8         | 41.2           |
+| VLAA-Thinker-Qwen2.5-7B            | 42.7        | 68.0          | 26.4           | 48.2          | 22.4         | 48.5           |
+| VL-Rethinker-7B                    | 41.8        | 73.7          | 28.4           | 46.4          | 17.8         | 42.7           |
+| TBAC-VLR1-7B-RL                    | 41.3        | 70.1          | 25.4           | 43.4          | 19.0         | 48.4           |
+| TBAC-VLR1-7B-SFT                   | 41.8        | 65.1          | 28.5           | 49.1          | 20.6         | 45.5           |
+| TBAC-VLR1-7B                       | **43.4**    | 66.7          | **31.4**       | **50.1**      | **22.6**     | 46.4           |
 <!-- ![Performance](./assets/performance.png) -->
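
As a quick sanity check on the added Performance table, the sketch below recomputes the **Average** column from the five benchmark columns. It assumes, without confirmation from the README, that Average is the unweighted mean of MathVista, MathVision, MathVerse, DynaMath, and LogicVista. The names `SCORES`, `REPORTED_AVERAGE`, `recompute_averages`, and `max_abs_gap` are hypothetical helpers introduced only for this illustration; the numbers are copied from the added rows of the diff.

```python
# Scores copied from the added (+) rows of the Performance table, in column order:
# (MathVista, MathVision, MathVerse, DynaMath, LogicVista)
SCORES = {
    "Qwen2.5-VL-7B":           (68.0, 25.7, 45.5, 21.8, 41.2),
    "VLAA-Thinker-Qwen2.5-7B": (68.0, 26.4, 48.2, 22.4, 48.5),
    "VL-Rethinker-7B":         (73.7, 28.4, 46.4, 17.8, 42.7),
    "TBAC-VLR1-7B-RL":         (70.1, 25.4, 43.4, 19.0, 48.4),
    "TBAC-VLR1-7B-SFT":        (65.1, 28.5, 49.1, 20.6, 45.5),
    "TBAC-VLR1-7B":            (66.7, 31.4, 50.1, 22.6, 46.4),
}

# The Average column exactly as reported in the table.
REPORTED_AVERAGE = {
    "Qwen2.5-VL-7B": 40.5,
    "VLAA-Thinker-Qwen2.5-7B": 42.7,
    "VL-Rethinker-7B": 41.8,
    "TBAC-VLR1-7B-RL": 41.3,
    "TBAC-VLR1-7B-SFT": 41.8,
    "TBAC-VLR1-7B": 43.4,
}


def recompute_averages(scores):
    """Assumption: Average is the unweighted mean of the five benchmark scores."""
    return {model: sum(vals) / len(vals) for model, vals in scores.items()}


def max_abs_gap(scores, reported):
    """Largest absolute difference between reported and recomputed averages."""
    recomputed = recompute_averages(scores)
    return max(abs(recomputed[model] - reported[model]) for model in scores)


if __name__ == "__main__":
    for model, avg in recompute_averages(SCORES).items():
        print(f"{model:<26} reported={REPORTED_AVERAGE[model]:.1f}  recomputed={avg:.2f}")
    print(f"largest gap: {max_abs_gap(SCORES, REPORTED_AVERAGE):.2f}")
```

Under that assumption, every recomputed mean agrees with the reported Average to within 0.1 point. Small gaps are expected if the reported averages were computed from unrounded per-benchmark scores.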