Ximing committed on
Commit 850e411 · verified · 1 Parent(s): 17c4912

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -23,11 +23,12 @@ tags:
 
  [![Paper](https://img.shields.io/badge/arXiv-2601.22975-b31b1b.svg)](https://arxiv.org/abs/2601.22975)
  [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
- <br>
+ </div>
+
  **GooseReason-4B-Instruct** is a state-of-the-art 4B reasoning model trained via Reinforcement Learning with Verifiable Rewards (RLVR) on [GooseReason-0.7M](https://huggingface.co/datasets/nvidia/Nemotron-Research-GooseReason-0.6M), a large-scale dataset synthesized by the **Golden Goose** pipeline. Starting from [Qwen3-4B-Instruct](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) and applying the ProRLv2 RL recipe augmented with GooseReason-0.7M data, **GooseReason-4B-Instruct achieves new state-of-the-art results among 4B-Instruct models across 15 diverse benchmarks**, spanning mathematics, programming, STEM reasoning, instruction following, and logical puzzles.
 
  This model is for research and development only.
- </div>
+
 
  ## Golden Goose
 