Update README.md
README.md
CHANGED
@@ -56,6 +56,7 @@ This model achieves **state-of-the-art accuracy among 1.5B reasoning models** wh
 - **Developed by**: Fang Wu\*, Weihao Xuan\*, Heli Qi\*, Ximing Lu, Aaron Tu, Li Erran Li, Yejin Choi
 - **Institutional affiliations**: Stanford University, University of Tokyo, RIKEN AIP, University of Washington, UC Berkeley, Amazon AWS, Columbia University
 - **Paper**: [DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search](https://huggingface.co/papers/2509.25454)
+- **Code**: [Github](https://github.com/smiles724/DeepSearch)
 - **Base Model**: Nemotron-Research-Reasoning-Qwen-1.5B v2
 - **Parameters**: 1.5B
 - **Framework**: veRL