Improve language tag

#2
by lbourdois
Files changed (1)
  1. README.md +102 -90
README.md CHANGED
@@ -1,91 +1,103 @@
- ---
- base_model: Qwen/Qwen2.5-32B-Instruct
- language:
- - en
- library_name: transformers
- license: apache-2.0
- tags:
- - multi-agent systems
- - multiagent-collaboration
- - reasoning
- - mathematics
- - code
- model-index:
- - name: m1-32b
-   results: []
- pipeline_tag: text-generation
- ---
-
- [Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning](https://arxiv.org/pdf/2504.09772)
-
- **M1-32B** is a 32B-parameter large language model fine-tuned from [Qwen2.5-32B-Instruct](https://arxiv.org/pdf/2412.15115) on the **M500** dataset—an interdisciplinary multi-agent collaborative reasoning dataset. M1-32B is optimized for improved reasoning, discussion, and decision-making in multi-agent systems (MAS), including frameworks such as [AgentVerse](https://github.com/OpenBMB/AgentVerse).
-
- Code: [https://github.com/jincan333/MAS-TTS](https://github.com/jincan333/MAS-TTS)
-
- ---
-
- ## 🚀 Key Features
-
- - 🧠 **Enhanced Collaborative Reasoning**
-   Trained on real multi-agent traces involving diverse roles like Expert Recruiter, Problem Solvers, and Evaluator.
-
- - 🗣️ **Role-Aware Dialogue Generation**
-   Learns to reason and respond from different expert perspectives based on structured prompts.
-
- - ⚙️ **Optimized for Multi-Agent Systems**
-   Performs well as a MAS agent with adaptive collaboration and token budgeting.
-
- ---
-
- ## 🏗️ Model Training
-
- - **Base Model:** Qwen2.5-32B-Instruct
- - **Dataset:** [M500](https://huggingface.co/datasets/Can111/M500) (500 curated multi-agent reasoning traces)
- - **Objective:** Supervised Fine-Tuning (SFT) on role-conditioned prompts
- - **Training Setup:**
-   - 8 × A100 GPUs
-   - 5 epochs
-   - Learning rate: 1e-5
-   - Frameworks: DeepSpeed, FlashAttention, LLaMA-Factory
-
- ---
-
- ## 📊 Performance
-
- | **Model** | **General Understanding** | | **Mathematical Reasoning** | | **Coding** | |
- |--------------------------|---------------------------|----------------|-----------------------------|------------|----------------|-----------|
- | | **GPQA** | **Commongen** | **AIME2024** | **MATH-500** | **HumanEval** | **MBPP-S** |
- | **Non-Reasoning Models** | | | | | | |
- | Qwen2.5 | 50.2 | 96.7 | 21.1 | 84.4 | 89.0 | 80.2 |
- | DeepSeek-V3 | **58.6** | **98.6** | **33.3** | **88.6** | 89.6 | 83.9 |
- | GPT-4o | 49.2 | 97.8 | 7.8 | 81.3 | **90.9** | **85.4** |
- | **Reasoning Models** | | | | | | |
- | s1.1-32B | 58.3 | 94.1 | 53.3 | 90.6 | 82.3 | 77.4 |
- | DeepSeek-R1 | **75.5** | 97.2 | 78.9 | **96.2** | **98.2** | 91.7 |
- | o3-mini | 71.3 | **99.1** | **84.4** | 95.3 | 97.0 | **93.6** |
- | M1-32B (Ours) | 61.1 | 96.9 | 60.0 | 95.1 | 92.8 | 89.1 |
- | M1-32B w. CEO (Ours) | 62.1 | 97.4 | 62.2 | 95.8 | 93.9 | 90.5 |
-
- **Table Caption:**
- Performance comparison on general understanding, mathematical reasoning, and coding tasks using strong reasoning and non-reasoning models within the AgentVerse framework. Our method achieves substantial improvements over Qwen2.5 and s1.1-32B on all tasks, and attains performance comparable to o3-mini and DeepSeek-R1 on MATH-500 and MBPP-S, demonstrating its effectiveness in enhancing collaborative reasoning in MAS. Note that the results of s1.1-32B are obtained without using budget forcing.
-
- ---
-
- ## 💬 Intended Use
-
- M1-32B is intended for research on multi-agent reasoning and collaboration in MAS.
-
- ---
-
- ## Citation
-
- If you use this model, please cite the relevant papers:
-
- ```bibtex
- @article{jin2025two,
-   title={Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning},
-   author={Jin, Can and Peng, Hongwu and Zhang, Qixin and Tang, Yujin and Metaxas, Dimitris N and Che, Tong},
-   journal={arXiv preprint arXiv:2504.09772},
-   year={2025}
- }
+ ---
+ base_model: Qwen/Qwen2.5-32B-Instruct
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ library_name: transformers
+ license: apache-2.0
+ tags:
+ - multi-agent systems
+ - multiagent-collaboration
+ - reasoning
+ - mathematics
+ - code
+ pipeline_tag: text-generation
+ model-index:
+ - name: m1-32b
+   results: []
+ ---
+
+ [Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning](https://arxiv.org/pdf/2504.09772)
+
+ **M1-32B** is a 32B-parameter large language model fine-tuned from [Qwen2.5-32B-Instruct](https://arxiv.org/pdf/2412.15115) on the **M500** dataset—an interdisciplinary multi-agent collaborative reasoning dataset. M1-32B is optimized for improved reasoning, discussion, and decision-making in multi-agent systems (MAS), including frameworks such as [AgentVerse](https://github.com/OpenBMB/AgentVerse).
+
+ Code: [https://github.com/jincan333/MAS-TTS](https://github.com/jincan333/MAS-TTS)
+
+ ---
+
+ ## 🚀 Key Features
+
+ - 🧠 **Enhanced Collaborative Reasoning**
+   Trained on real multi-agent traces involving diverse roles like Expert Recruiter, Problem Solvers, and Evaluator.
+
+ - 🗣️ **Role-Aware Dialogue Generation**
+   Learns to reason and respond from different expert perspectives based on structured prompts.
+
+ - ⚙️ **Optimized for Multi-Agent Systems**
+   Performs well as a MAS agent with adaptive collaboration and token budgeting.
+
+ ---
+
+ ## 🏗️ Model Training
+
+ - **Base Model:** Qwen2.5-32B-Instruct
+ - **Dataset:** [M500](https://huggingface.co/datasets/Can111/M500) (500 curated multi-agent reasoning traces)
+ - **Objective:** Supervised Fine-Tuning (SFT) on role-conditioned prompts
+ - **Training Setup:**
+   - 8 × A100 GPUs
+   - 5 epochs
+   - Learning rate: 1e-5
+   - Frameworks: DeepSpeed, FlashAttention, LLaMA-Factory
+
+ ---
+
+ ## 📊 Performance
+
+ | **Model** | **General Understanding** | | **Mathematical Reasoning** | | **Coding** | |
+ |--------------------------|---------------------------|----------------|-----------------------------|------------|----------------|-----------|
+ | | **GPQA** | **Commongen** | **AIME2024** | **MATH-500** | **HumanEval** | **MBPP-S** |
+ | **Non-Reasoning Models** | | | | | | |
+ | Qwen2.5 | 50.2 | 96.7 | 21.1 | 84.4 | 89.0 | 80.2 |
+ | DeepSeek-V3 | **58.6** | **98.6** | **33.3** | **88.6** | 89.6 | 83.9 |
+ | GPT-4o | 49.2 | 97.8 | 7.8 | 81.3 | **90.9** | **85.4** |
+ | **Reasoning Models** | | | | | | |
+ | s1.1-32B | 58.3 | 94.1 | 53.3 | 90.6 | 82.3 | 77.4 |
+ | DeepSeek-R1 | **75.5** | 97.2 | 78.9 | **96.2** | **98.2** | 91.7 |
+ | o3-mini | 71.3 | **99.1** | **84.4** | 95.3 | 97.0 | **93.6** |
+ | M1-32B (Ours) | 61.1 | 96.9 | 60.0 | 95.1 | 92.8 | 89.1 |
+ | M1-32B w. CEO (Ours) | 62.1 | 97.4 | 62.2 | 95.8 | 93.9 | 90.5 |
+
+ **Table Caption:**
+ Performance comparison on general understanding, mathematical reasoning, and coding tasks using strong reasoning and non-reasoning models within the AgentVerse framework. Our method achieves substantial improvements over Qwen2.5 and s1.1-32B on all tasks, and attains performance comparable to o3-mini and DeepSeek-R1 on MATH-500 and MBPP-S, demonstrating its effectiveness in enhancing collaborative reasoning in MAS. Note that the results of s1.1-32B are obtained without using budget forcing.
+
+ ---
+
+ ## 💬 Intended Use
+
+ M1-32B is intended for research on multi-agent reasoning and collaboration in MAS.
+
+ ---
+
+ ## Citation
+
+ If you use this model, please cite the relevant papers:
+
+ ```bibtex
+ @article{jin2025two,
+   title={Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning},
+   author={Jin, Can and Peng, Hongwu and Zhang, Qixin and Tang, Yujin and Metaxas, Dimitris N and Che, Tong},
+   journal={arXiv preprint arXiv:2504.09772},
+   year={2025}
+ }
  ```
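
The card declares `library_name: transformers` but includes no loading snippet; a minimal inference sketch follows. The repo id `Can111/m1-32b` is an assumption inferred from the card's `model-index` name and the M500 dataset owner, and the role-conditioned system prompt merely imitates the MAS roles the card describes; neither is stated in the diff above.

```python
# Minimal inference sketch for M1-32B with Hugging Face transformers.
# Assumptions (not stated in the card): the checkpoint is hosted at
# "Can111/m1-32b" and uses the Qwen2.5 chat template inherited from its base.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Can111/m1-32b"  # hypothetical repo id; adjust to the actual one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
    device_map="auto",   # requires `accelerate`; shards the 32B weights across GPUs
)

# Role-conditioned prompt in the spirit of the card's MAS roles
# (Expert Recruiter / Problem Solver / Evaluator).
messages = [
    {"role": "system", "content": "You are a Problem Solver agent in a multi-agent discussion."},
    {"role": "user", "content": "Compute 17 * 24 and briefly explain your reasoning."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```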