Update README.md
README.md
@@ -22,7 +22,7 @@ model-index:
       name: AIME 2024
       type: text
     metrics:
-    - type:
+    - type: avg@32
       value: 53.65
   - task:
       name: Mathematical Reasoning
@@ -31,7 +31,7 @@ model-index:
       name: AIME 2025
       type: text
     metrics:
-    - type:
+    - type: avg@32
       value: 35.42
   - task:
       name: Mathematical Reasoning
@@ -40,7 +40,7 @@ model-index:
       name: AMC 2023
       type: text
     metrics:
-    - type:
+    - type: avg@32
       value: 90.39
   - task:
       name: Mathematical Reasoning
@@ -49,7 +49,7 @@ model-index:
       name: MATH500
       type: text
     metrics:
-    - type:
+    - type: avg@32
       value: 92.53
   - task:
       name: Mathematical Reasoning
@@ -58,7 +58,7 @@ model-index:
       name: Minerva
       type: text
     metrics:
-    - type:
+    - type: avg@32
       value: 40.00
   - task:
       name: Mathematical Reasoning
@@ -67,7 +67,7 @@ model-index:
       name: Olympiad
       type: text
     metrics:
-    - type:
+    - type: avg@32
       value: 65.72
 ---
 <div align="center">
@@ -77,7 +77,7 @@ model-index:
 **DeepSearch-1.5B🌟** is a 1.5B parameter reasoning model trained with **Reinforcement Learning with Verifiable Rewards (RLVR)**, enhanced by **Monte Carlo Tree Search (MCTS)**.
 Unlike prior approaches that restrict structured search to inference, DeepSearch integrates MCTS *into training*, enabling systematic exploration, fine-grained credit assignment, and efficient replay buffering.
 
-This model achieves **state-of-the-art accuracy among 1.5B reasoning models** while being **
+This model achieves **state-of-the-art accuracy among 1.5B reasoning models** while being **5.7× more compute-efficient** than extended RL training baselines.
 
 
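With the diff applied, each benchmark entry in the README's YAML front matter carries a named metric. A sketch of one resulting entry, using the AIME 2024 hunk from this commit (the `task`/`dataset` nesting and the top-level `name` field are assumptions based on the usual Hugging Face `model-index` layout, not shown in the diff):

```yaml
model-index:
- name: DeepSearch-1.5B   # assumed; the model name field is outside this diff
  results:
  - task:
      name: Mathematical Reasoning
    dataset:              # nesting assumed from the standard model-index schema
      name: AIME 2024
      type: text
    metrics:
    - type: avg@32        # metric id added by this commit
      value: 53.65
```

The `avg@32` identifier indicates the score is averaged over 32 samples per problem, which is why it was worth recording alongside each `value`.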