---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-Math-7B
---

**Satori-RM-7B** is the outcome reward model used to train our RL model [Satori-7B-Round2](https://huggingface.co/Satori-reasoning/Satori-7B-Round2). Usage of **Satori-RM-7B** can be found in our released [RL training code](https://github.com/satori-reasoning/Satori).
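The released RL training code is the authoritative reference for scoring with this model. As a rough illustration only, here is a minimal, hypothetical sketch that assumes the checkpoint loads with `AutoModelForCausalLM` (per the `text-generation` pipeline tag); the prompt layout in `build_rm_input` and the reduction of logits to a scalar reward are placeholders I invented for the example, not the official interface.

```python
# Hypothetical usage sketch for Satori-RM-7B; the real scoring interface is
# defined in the released RL training code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Satori-reasoning/Satori-RM-7B"


def build_rm_input(question: str, response: str) -> str:
    # Placeholder prompt layout; the actual template used during RL training
    # is specified in the training repo.
    return f"Question: {question}\nResponse: {response}"


@torch.no_grad()
def score_response(model, tokenizer, question: str, response: str) -> float:
    """Return a scalar outcome reward for a (question, response) pair."""
    inputs = tokenizer(
        build_rm_input(question, response), return_tensors="pt"
    ).to(model.device)
    out = model(**inputs)
    # Assumption: take the last position's maximum logit as a stand-in scalar;
    # the real reward head/aggregation may differ.
    return out.logits[0, -1].max().item()


# Example (downloads ~15 GB of weights, so not run here):
# tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
# )
# reward = score_response(model, tokenizer, "What is 2 + 2?", "4")
```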

# **Resources**
We provide our training datasets:
- [Full format tuning dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_FT_data) with 300K unique questions.
- [RL dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_RL_data) with 550K unique questions.

Please refer to our blog and research paper for more technical details on Satori:
- [Blog](https://satori-reasoning.github.io/blog/satori/)
- [Paper](https://arxiv.org/pdf/2502.02508)

For code, see https://github.com/Satori-reasoning/Satori.

# **Citation**
If you find our model and data helpful, please cite our paper:
```bibtex
@misc{shen2025satorireinforcementlearningchainofactionthought,
      title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search},
      author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
      year={2025},
      eprint={2502.02508},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.02508},
}
```