oulinyu committed on
Commit c474bed · verified · 1 Parent(s): 2caf019

Update README.md

Files changed (1)
  1. README.md +107 -3
README.md CHANGED
@@ -1,3 +1,107 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - Qwen/Qwen2.5-VL-7B-Instruct
+ tags:
+ - mm math reasoning
+ datasets:
+ - open-r1/OpenR1-Math-220k
+ metrics:
+ - accuracy
+ ---
+
+ # TBAC-VLR1-7B-SFT
+
+ ## Overview
+ This is a multimodal language model fine-tuned by **Tencent PCG Basic Algorithm Center**. Starting from Qwen2.5-VL-7B-Instruct, TBAC-VLR1-7B-SFT is trained with
+ supervised fine-tuning (SFT) on 40k samples filtered from OpenR1-Math-220k. TBAC-VLR1-7B then applies GRPO (Group Relative Policy Optimization) with the Clip-Higher technique from DAPO,
+ achieving strong performance on several multimodal reasoning benchmarks among models of the same size.
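
As a rough illustration of the two ideas named in the overview, the sketch below computes group-relative advantages (the core of GRPO) and a clipped surrogate with an asymmetric clip range (Clip-Higher). It is not the training code behind this model. The function names, the use of the population standard deviation, and the `eps_low=0.2` / `eps_high=0.28` defaults are assumptions chosen for illustration; the README does not state the values used.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


def clip_higher_objective(logp_new, logp_old, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate for one token, with decoupled clip bounds.

    The probability ratio is clipped to [1 - eps_low, 1 + eps_high].
    Clip-Higher sets eps_high > eps_low (illustrative values here).
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


# Four sampled answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

With a symmetric range the upper bound would be `1 + eps_low`; raising only the upper bound lets low-probability tokens with positive advantage grow more per update, which is the motivation usually given for Clip-Higher.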
+
+
+ ## Performance
+ | Model | **Average** | **MathVista** | **MathVision** | **MathVerse** | **DynaMath** | **LogicVista** |
+ | :--------------------------------: | :---------: | :-----------: | :------------: | :-----------: | :----------: | :------------: |
+ | Qwen2.5-VL-7B | 40.5 | 68.0 | 25.7 | 45.5 | 21.8 | 41.2 |
+ | VLAA-Thinker-Qwen2.5-7B | 42.7 | 68.0 | 26.4 | 48.2 | 22.4 | 48.5 |
+ | VL-Rethinker-7B | 41.8 | 73.7 | 28.4 | 46.4 | 17.8 | 42.7 |
+ | TBAC-VLR1-7B-RL | 41.3 | 70.1 | 25.4 | 43.4 | 19.0 | 48.4 |
+ | TBAC-VLR1-7B-SFT | 41.8 | 65.1 | 28.5 | 49.1 | 20.6 | 45.5 |
+ | TBAC-VLR1-7B | **43.4** | 66.7 | **31.4** | **50.1** | **22.6** | 46.4 |
+
+
+ <!-- ![Performance](./assets/performance.png) -->
+
+ <!-- ![Performance](https://cdn-uploads.huggingface.co/production/uploads/669f83bf353227efaefe83d9/ZXZShbuxRBWIzEeV9-WMt.png) -->
+
+ <!-- The compared results are sourced from https://opencompass.org.cn. -->
+
+ The results of our model are self-reported, obtained by running evaluations offline on each benchmark.
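
The Average column appears to be the unweighted mean of the five benchmark scores, although the README does not say so. The sketch below recomputes it from the numbers in the table under that assumption; the `scores` dictionary and `mean_score` helper are names introduced here for illustration.

```python
# Scores copied from the table, in column order:
# MathVista, MathVision, MathVerse, DynaMath, LogicVista.
scores = {
    "Qwen2.5-VL-7B": (40.5, [68.0, 25.7, 45.5, 21.8, 41.2]),
    "VLAA-Thinker-Qwen2.5-7B": (42.7, [68.0, 26.4, 48.2, 22.4, 48.5]),
    "VL-Rethinker-7B": (41.8, [73.7, 28.4, 46.4, 17.8, 42.7]),
    "TBAC-VLR1-7B-RL": (41.3, [70.1, 25.4, 43.4, 19.0, 48.4]),
    "TBAC-VLR1-7B-SFT": (41.8, [65.1, 28.5, 49.1, 20.6, 45.5]),
    "TBAC-VLR1-7B": (43.4, [66.7, 31.4, 50.1, 22.6, 46.4]),
}


def mean_score(values):
    """Unweighted mean of the per-benchmark scores (assumed averaging rule)."""
    return sum(values) / len(values)


for name, (reported, values) in scores.items():
    print(f"{name}: reported {reported}, recomputed {mean_score(values):.2f}")
```

Under this assumption the recomputed means agree with the reported column to within 0.1 for every row. A small gap of that size could come from averaging unrounded per-benchmark scores.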
+
+ ## Usage
+ ```python
+ from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
+ from qwen_vl_utils import process_vision_info
+
+ model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
+     "TencentBAC/TBAC-VLR1-7B", torch_dtype="auto", device_map="auto"
+ )
+
+ processor = AutoProcessor.from_pretrained("TencentBAC/TBAC-VLR1-7B")
+
+ # Placeholders: replace with your own image and question.
+ image_path = "path/to/image.png"
+ query = "Your question about the image."
+
+ messages = [
+     {
+         "role": "system",
+         "content": "You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. The final answer MUST BE put in \\boxed{}."
+     },
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "image",
+                 "image": image_path,
+             },
+             {"type": "text", "text": query},
+         ],
+     }
+ ]
+
+ # Preparation for inference
+ text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ # Move inputs to wherever device_map="auto" placed the model.
+ inputs = inputs.to(model.device)
+
+ # Inference: generate the output. Reasoning traces can be long, so raise
+ # max_new_tokens if the answer is cut off.
+ generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
+ # Drop the prompt tokens so only the newly generated tokens are decoded.
+ generated_ids_trimmed = [
+     out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )
+ print(output_text)
+ ```
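
The system prompt in the usage example asks for reasoning inside `<think> </think>` tags and a final answer inside `\boxed{}`. A minimal sketch for splitting a decoded response into those two parts follows. `split_response` is a hypothetical helper, not part of the model's tooling, and it assumes the model followed the requested format.

```python
import re


def split_response(response):
    """Return (reasoning, answer); either part is None if it is missing."""
    think = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else None

    # Take the last \boxed{...} and match braces by depth, so that nested
    # LaTeX such as \frac{4}{2} is kept whole.
    answer = None
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start != -1:
        begin = start + len(marker)
        depth = 0
        for i in range(begin - 1, len(response)):
            if response[i] == "{":
                depth += 1
            elif response[i] == "}":
                depth -= 1
                if depth == 0:
                    answer = response[begin:i].strip()
                    break
    return reasoning, answer


sample = "<think>2 + 2 = 4, so half of it is 2.</think> The answer is \\boxed{\\frac{4}{2}}."
reasoning, answer = split_response(sample)
```

In practice this would be applied to each string in `output_text`. A response truncated by `max_new_tokens` may lack a closing tag or brace, in which case the corresponding part comes back as `None`.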
+ ## Citation
+ If you find our model useful in your research, please consider giving it a ❤️ and citing it. Thanks!
+ ```
+ @misc{Ou2025TBACVLR1,
+   title = {TBAC-VLR1-7B},
+   author = {Ou, Linyu and Xu, Junzhe and Yin, Yuyang},
+   year = {2025},
+   url = {https://huggingface.co/TencentBAC/TBAC-VLR1-7B},
+ }
+ ```
+
+ ---
+
+ **About**
+
+ Created by the Tencent PCG Basic Algorithm Center. All rights reserved.