---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-Math-7B
---

**Satori-RM-7B** is the outcome reward model used to train our RL model [Satori-7B-Round2](https://huggingface.co/Satori-reasoning/Satori-7B-Round2). Usage of **Satori-RM-7B** can be found in our released [RL training code](https://github.com/satori-reasoning/Satori).
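The released RL training code is the authoritative reference for scoring with this model. As a rough illustration only, here is a minimal, hypothetical sketch that assumes the checkpoint loads with `AutoModelForCausalLM` (per the `text-generation` pipeline tag); the prompt layout in `build_rm_input` and the reduction of logits to a scalar reward are placeholders I invented for the example, not the official interface.

```python
# Hypothetical usage sketch for Satori-RM-7B; the real scoring interface is
# defined in the released RL training code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Satori-reasoning/Satori-RM-7B"


def build_rm_input(question: str, response: str) -> str:
    # Placeholder prompt layout; the actual template used during RL training
    # is specified in the training repo.
    return f"Question: {question}\nResponse: {response}"


@torch.no_grad()
def score_response(model, tokenizer, question: str, response: str) -> float:
    """Return a scalar outcome reward for a (question, response) pair."""
    inputs = tokenizer(
        build_rm_input(question, response), return_tensors="pt"
    ).to(model.device)
    out = model(**inputs)
    # Assumption: take the last position's maximum logit as a stand-in scalar;
    # the real reward head/aggregation may differ.
    return out.logits[0, -1].max().item()


# Example (downloads ~15 GB of weights, so not run here):
# tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
# )
# reward = score_response(model, tokenizer, "What is 2 + 2?", "4")
```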

# **Resources**
We provide our training datasets:
- [Full format tuning dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_FT_data) with 300K unique questions.
- [RL dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_RL_data) with 550K unique questions.

Please refer to our blog and research paper for more technical details on Satori:
- [Blog](https://satori-reasoning.github.io/blog/satori/)
- [Paper](https://arxiv.org/pdf/2502.02508)

For code, see https://github.com/Satori-reasoning/Satori.

# **Citation**
If you find our model and data helpful, please cite our paper:
```bibtex
@misc{shen2025satorireinforcementlearningchainofactionthought,
      title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search},
      author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
      year={2025},
      eprint={2502.02508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02508},
}
```