Trilogix1 committed on
Commit
cbdee84
·
verified ·
1 Parent(s): 90bf937

Update README.md

Files changed (1)
  1. README.md +44 -42
README.md CHANGED
@@ -1,9 +1,8 @@
1
- ---
2
- license: mit
3
  base_model:
4
  - Qwen/Qwen3-30B-A3B-Thinking-2507
5
  ---
6
- <div align="center">
7
 
8
  # MarsRL
9
  <div>
@@ -12,53 +11,56 @@ base_model:
12
  <a href="https://arxiv.org/pdf/2511.11373" target="_blank">Paper</a> | <a href="https://github.com/liushulinle/MarsRL" target="_blank">GitHub</a>
13
  </div>
14
 
15
- ## Overview
16
- <hr />
17
- Recent progress in large language models (LLMs) has been propelled by reinforcement learning with verifiable rewards (RLVR) and test-time scaling. However, the limited output length of LLMs constrains the depth of reasoning attainable in a single inference process. Multi-agent reasoning systems offer a promising alternative by employing multiple agents including Solver, Verifier, and Corrector, to iteratively refine solutions. While effective in closed-source models like Gemini 2.5 Pro, they struggle to generalize to open-source models due to insufficient critic and correction capabilities. To address this, we propose MarsRL, a novel reinforcement learning framework with agentic pipeline parallelism, designed to jointly optimize all agents in the system. MarsRL introduces agent-specific reward mechanisms to mitigate reward noise and employs pipeline-inspired training to enhance efficiency in handling long trajectories. Applied to Qwen3-30B-A3B-Thinking-2507, MarsRL improves AIME2025 accuracy from 86.5% to 93.3% and BeyondAIME from 64.9% to 73.8%, even surpassing Qwen3-235B-A22B-Thinking-2507. These findings highlight the potential of MarsRL to advance multi-agent reasoning systems and broaden their applicability across diverse reasoning tasks.
18
- <div align="center">
19
- <img src="home.jpg" width="80%" />
20
- </div>
21
 
22
- ## V-C Reasoning System Evaluation Instructions
23
- <hr />
24
 
25
- ### step1: Download our released model or other open source models
26
- Supported models: Qwen3/DeepSeekV3.1/DeepSeek R1. You can modify the llm_client.py to use other models.
27
 
28
- ### step2: Deploy service via vLLM
29
 
30
- ### step3: Run the V-C reasoning system by the following commands:
31
- ```
32
- python3 vc_reasoning_system.py solver_ip_port_1,solver_ip_port_2,... vc_ip_port_1,vc_ip_port_2,... test_file output_dir
33
- for example: python3 vc_reasoning_system.py 8.8.8.8:8021,12.34.56.78:8021 8.8.8.8:8021,12.34.56.78:8021 ./outputs/debug ./test_corpus/aime2025.jsonl
34
- ```
35
- This step runs the reasoning system on each problem in the given test_file; the predicted results can be found in output_dir.
36
 
37
- ### step4: Extract final solutions by the following commands:
38
- ```
39
- python3 extract_solution.py result_dir test_file
40
- for example: python3 extract_solution.py ./outputs/debug ./test_corpus/aime_2025.jsonl
41
- ```
42
- This step will generate a file named "eval_overalljsonl" in the input_dir. You can evaluate the metrics based on this file.
43
 
44
- ## Acknowledgements
45
- <hr />
46
 
47
- - Our implementation is heavily built on [verl](https://github.com/volcengine/verl).
48
 
49
- - Our models are trained on top of [Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507).
50
 
51
- - Our V-C Reasoning system is built on [IMO25 pipeline](https://github.com/lyang36/IMO25).
52
-
53
- Thanks for their wonderful work.
54
 
55
- ## Citation
56
- <hr />
57
 
58
- ```bibtex
59
- @article{Marsrl2025,
60
- title = {MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism},
61
- author = {Shulin Liu and Dong Du and Tao Yang and Yang Li and Boyu Qiu},
62
- year = {2025}
63
- }
64
- ```
 
1
+
 
2
  base_model:
3
  - Qwen/Qwen3-30B-A3B-Thinking-2507
4
  ---
5
+
6
 
7
  # MarsRL
8
  <div>
 
11
  <a href="https://arxiv.org/pdf/2511.11373" target="_blank">Paper</a> | <a href="https://github.com/liushulinle/MarsRL" target="_blank">GitHub</a>
12
  </div>
13
 
14
+ Trilogix1/Hugston-forestliutcMarsRL-f16 pipeline_tag: text-generation tags:
15
+ # Thinking
16
+ # Coder
17
+ # Hugston
18
+ # Trilogix1/Hugston-forestliutcMarsRL-f16
19
+
20
+ ---
21
+
22
+ # Original weights at: https://huggingface.co/forestliutc/MarsRL
23
+
24
+ This is a converted and quantized version by the Hugston Team, created with Quanta (see GitHub to learn more about it).
25
+ This is a crude, proof-of-concept implementation that converts and quantizes a .safetensors LLM model to GGUF.
26
+
27
+
28
+ ![Screenshot 2025-11-21 114116](https://cdn-uploads.huggingface.co/production/uploads/6818be9259cb758d06603579/UtgEP2NnXMLEV7rYEYoDw.png)
29
+
30
+
31
+ Quantization was performed with an automated method, which shortens the time needed to produce results.
32
 
33
+ This model was made possible by: https://Hugston.com
 
34
 
35
+ You can use the model with HugstonOne Enterprise Edition
 
36
 
 
37
 
38
 
39
 
40
+ In testing we encountered small precision errors in coding tasks, but the model is quite impressive for its size.
41
+ We consider the model a fit for non-precision tasks (such as game vibe coding and general tasks).
42
 
43
+ ![Screenshot 2025-11-22 133409](https://cdn-uploads.huggingface.co/production/uploads/6818be9259cb758d06603579/_sEqG886fGo6qv1y0dY7B.png)
44
 
45
+ ---
46
+
47
+ Watch HugstonOne coding and preview in action:
48
+ ---
49
+ https://vimeo.com/1121493834?share=copy&fl=sv&fe=ci
50
+ Usage
51
+ ---
52
+ -Download the HugstonOne app at Hugston.com or at https://github.com/Mainframework
53
+ ---
54
+ -Download the model from https://hugston.com/explore?folder=llm_models or Hugging Face
55
+ ---
56
+ -If you already have the LLM model downloaded, choose it by clicking pick model in HugstonOne. -Then click Load model in CLI or Server.
57
+
58
+ ---
59
+
60
+ -For multimodal use, you need a VL/multimodal LLM model with the mmproj file in the same folder. -Select the model and select the mmproj.
61
+
62
+ ---
63
 
64
+ -Note: if the mmproj file is in the same folder as non-multimodal models, those models will not load unless the mmproj file is moved out of the folder.
 
 
65
 
 
 
66
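
The note about mmproj files in the new README can be checked before loading a model. The following is a minimal sketch, not part of HugstonOne: the function name `check_model_folder` is hypothetical, and it assumes that projector files carry "mmproj" in their file name, which the README does not itself confirm. It only lists what is in a folder; it cannot tell which model is multimodal.

```python
from pathlib import Path
import tempfile


def check_model_folder(folder):
    """Split the .gguf files in a folder into mmproj files and regular models.

    Assumption: projector files have "mmproj" in their file name. This is a
    naming habit, not something the README guarantees.
    """
    ggufs = sorted(p.name for p in Path(folder).glob("*.gguf"))
    mmproj = [n for n in ggufs if "mmproj" in n.lower()]
    models = [n for n in ggufs if "mmproj" not in n.lower()]
    # The README warns about an mmproj sharing a folder with other models,
    # so a folder with an mmproj and more than one model is flagged for review.
    needs_review = bool(mmproj) and len(models) > 1
    return {"mmproj": mmproj, "models": models, "needs_review": needs_review}


if __name__ == "__main__":
    # Demo on a throwaway folder with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("vl-model-f16.gguf", "mmproj-vl-model-f16.gguf", "text-model-f16.gguf"):
            (Path(tmp) / name).touch()
        print(check_model_folder(tmp))
```

If `needs_review` is true, the simplest remedy following the note is to keep each multimodal model and its mmproj file in a folder of their own.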