Update model card
README.md
CHANGED
@@ -14,7 +14,7 @@ tags:

 Binary-Think-RM is a generative reward model with long-horizon reasoning capabilities, introduced in the paper [Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models](https://arxiv.org/abs/2505.16265).

-This model is fine-tuned from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) using a two-stage training process: (1) reasoning-oriented supervised fine-tuning (SFT) and (2) reinforcement learning with verifiable rewards (RLVR).
+This model is fine-tuned from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) using a two-stage training process: (1) reasoning-oriented supervised fine-tuning (SFT) using [ilgee/hs2-naive-reasoning-binary-max](https://huggingface.co/datasets/ilgee/hs2-naive-reasoning-binary-max) and (2) reinforcement learning with verifiable rewards (RLVR) using a prompt part of [ilgee/hs2-naive-reasoning-binary-max](https://huggingface.co/datasets/ilgee/hs2-naive-reasoning-binary-max).

 ## Model Description
