Update README.md (#2)
Opened by Ximing

README.md CHANGED
@@ -25,7 +25,7 @@ tags:
 [](https://creativecommons.org/licenses/by-nc/4.0/)
 </div>

-**GooseReason‑4B‑Instruct** is a state-of-the-art 4B reasoning model trained via Reinforcement Learning with Verifiable Rewards (RLVR) on [GooseReason-0.7M](https://huggingface.co/datasets/nvidia/Nemotron-Research-GooseReason-0.
+**GooseReason‑4B‑Instruct** is a state-of-the-art 4B reasoning model trained via Reinforcement Learning with Verifiable Rewards (RLVR) on [GooseReason-0.7M](https://huggingface.co/datasets/nvidia/Nemotron-Research-GooseReason-0.7M), a large-scale dataset synthesized by the **Golden Goose** pipeline. Starting from [Qwen3‑4B‑Instruct](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) and applying the ProRLv2 RL recipe augmented with GooseReason-0.7M data, **GooseReason-4B-Instruct achieves new state-of-the-art results among 4B-Instruct models across 15 diverse benchmarks**, spanning mathematics, programming, STEM reasoning, instruction following, and logical puzzles.

 This model is for research and development only.