Add pipeline tag and hyperlink paper in model card
#1 opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,7 +1,11 @@
 ---
+datasets:
+- DeepMath-103K
 language:
 - en
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - reasoning
 - reinforcement-learning
@@ -9,67 +13,30 @@ tags:
 - mcts
 - math
 - iclr-2026
-license: apache-2.0
-datasets:
-- DeepMath-103K
 model-index:
 - name: DeepSearch-1.5B
   results:
   - task:
-      name: Mathematical Reasoning
       type: text-generation
+      name: Mathematical Reasoning
     dataset:
       name: AIME 2024
       type: text
     metrics:
     - type: avg@32
       value: 53.65
-  - task:
-      name: Mathematical Reasoning
-      type: text-generation
-    dataset:
-      name: AIME 2025
-      type: text
-    metrics:
     - type: avg@32
       value: 35.42
-  - task:
-      name: Mathematical Reasoning
-      type: text-generation
-    dataset:
-      name: AMC 2023
-      type: text
-    metrics:
     - type: avg@32
       value: 90.39
-  - task:
-      name: Mathematical Reasoning
-      type: text-generation
-    dataset:
-      name: MATH500
-      type: text
-    metrics:
     - type: avg@32
       value: 92.53
-  - task:
-      name: Mathematical Reasoning
-      type: text-generation
-    dataset:
-      name: Minerva
-      type: text
-    metrics:
     - type: avg@32
-      value: 40.
-  - task:
-      name: Mathematical Reasoning
-      type: text-generation
-    dataset:
-      name: Olympiad
-      type: text
-    metrics:
+      value: 40.0
     - type: avg@32
       value: 65.72
 ---
+
 <div align="center">
 <span style="font-family: default; font-size: 1.5em;">🚀 DeepSearch-1.5B</span>
 </div>
@@ -88,7 +55,7 @@ This model achieves **state-of-the-art accuracy among 1.5B reasoning models** wh
 
 - **Developed by**: Fang Wu\*, Weihao Xuan\*, Heli Qi\*, Ximing Lu, Aaron Tu, Li Erran Li, Yejin Choi
 - **Institutional affiliations**: Stanford University, University of Tokyo, RIKEN AIP, University of Washington, UC Berkeley, Amazon AWS, Columbia University
-- **Paper**: DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
+- **Paper**: [DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search](https://huggingface.co/papers/2509.25454)
 - **Base Model**: Nemotron-Research-Reasoning-Qwen-1.5B v2
 - **Parameters**: 1.5B
 - **Framework**: veRL
@@ -114,7 +81,8 @@ from transformers import AutoTokenizer
 def convert_question_to_messages(question: str):
     messages = [
         {"role": "user",
-        "content": question + " Let's think step by step and output the final answer within \\boxed{}.
+        "content": question + " Let's think step by step and output the final answer within \\boxed{}. \
+"}
     ]
     return messages
 
@@ -155,7 +123,7 @@ print(response)
 | Olympiad | 64.69 | **65.72** |
 | **Average** | 61.70 | **62.95** |
 
-DeepSearch improves average accuracy by **+1.25 points** over the best prior 1.5B model, while using **5.7×
+DeepSearch improves average accuracy by **+1.25 points** over the best prior 1.5B model, while using **5.7× fewer GPU hours**.
 
 
 ## Training
@@ -191,3 +159,4 @@
 primaryClass = {cs.AI},
 doi = {10.48550/arXiv.2509.25454},
 }
+```
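A note on the `pipeline_tag: text-generation` addition in the first hunk: the tag is what lists the model under the text-generation task filter on the Hub and lets `transformers.pipeline` resolve the task automatically. A minimal sketch of loading the model through the pipeline API; the repo id below is a placeholder, not a value taken from this diff:

```python
from transformers import pipeline

model_id = "DeepSearch-1.5B"  # placeholder: substitute the actual Hub repo id

# With the pipeline tag in the card's front matter, the task can also be
# inferred from the Hub; passing it explicitly works either way.
generator = pipeline("text-generation", model=model_id)

out = generator(
    "What is 15 * 17? Let's think step by step and output the final answer within \\boxed{}.",
    max_new_tokens=512,
)
print(out[0]["generated_text"])
```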
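And for the message-building snippet corrected in the fourth hunk, a self-contained sketch of how `convert_question_to_messages` typically feeds `apply_chat_template` and `generate`. The repo id, dtype, and generation settings here are illustrative assumptions, not values from the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DeepSearch-1.5B"  # placeholder: substitute the actual Hub repo id

def convert_question_to_messages(question: str):
    # Matches the corrected snippet: a single user turn ending with the
    # boxed-answer cue the model was trained to follow.
    return [
        {"role": "user",
         "content": question + " Let's think step by step and output the final answer within \\boxed{}. "}
    ]

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Render the chat messages with the model's chat template, then generate.
prompt = tokenizer.apply_chat_template(
    convert_question_to_messages("What is 15 * 17?"),
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
```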