---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
- en
tags:
- nvidia
- Nemotron-Cascade
- reasoning
- general-purpose
- SFT
- RL
- pytorch
---
# Nemotron-Cascade-8B Intermediate Checkpoints
<p align="center">
[![Technical Report](https://img.shields.io/badge/2512.13607-Technical_Report-blue)](https://arxiv.org/abs/2512.13607)
[![SFT Dataset](https://img.shields.io/badge/🤗-SFT_Dataset-blue)](https://huggingface.co/collections/nvidia/nemotron-cascade)
[![RL Dataset](https://img.shields.io/badge/🤗-RL_Dataset-blue)](https://huggingface.co/collections/nvidia/nemotron-cascade)
[![Models](https://img.shields.io/badge/🤗-Models-blue)](https://huggingface.co/collections/nvidia/nemotron-cascade)
</p>
## Introduction
This repository releases the intermediate checkpoints produced during the development of [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B). Nemotron-Cascade-8B is a general-purpose model trained using a sequential, domain-wise reinforcement learning pipeline, illustrated in the figure below.
<img src="fig/pipeline.png" alt="train_pipeline_fig" style="width: 1000px; max-width: 100%;" />
We release checkpoints corresponding to each major stage of training:
- **Nemotron-Cascade-8B-SFT** (completed multi-stage SFT)
- **Nemotron-Cascade-8B-RLHF** (completed RLHF)
- **Nemotron-Cascade-8B-IFRL** (completed instruction following RL)
- **Nemotron-Cascade-8B-MathRL** (completed Math RL)
- **Nemotron-Cascade-8B-CodeRL** (completed Code RL)
The final model, [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), is obtained after the concluding SWE RL stage.
## Usage Recommendations
We recommend using RoPE scaling with the [YaRN](https://arxiv.org/abs/2309.00071) method to better support contexts longer than 32K. This can be enabled by updating the model’s `config.json` as shown below:
```json
{
...,
"rope_scaling": {
"rope_type": "yarn",
"factor": 2.0,
"original_max_position_embeddings": 32768
}
}
```
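The same entry can also be added programmatically before loading the model. A minimal sketch (the `enable_yarn` helper is illustrative, not part of `transformers`):

```python
import json

# Add the YaRN rope_scaling entry to a config dict, mirroring the
# config.json snippet above.
def enable_yarn(config: dict, factor: float = 2.0,
                original_max: int = 32768) -> dict:
    config = dict(config)  # copy; do not mutate the caller's dict
    config["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": original_max,
    }
    return config

cfg = enable_yarn({"max_position_embeddings": 65536})
print(json.dumps(cfg["rope_scaling"], indent=2))
```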
## Results
As with [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), we use a maximum output length of 64K tokens for evaluation, with the temperature set to 0.6 and top-p to 0.95. We also apply RoPE scaling using the YaRN method with a scaling factor of 2.0.
| **Benchmark<br>Metric: Pass@1** | **Nemotron-<br>Cascade-8B-SFT** | **Nemotron-<br>Cascade-8B-RLHF** | **Nemotron-<br>Cascade-8B-IFRL** | **Nemotron-<br>Cascade-8B-MathRL** | **Nemotron-<br>Cascade-8B-CodeRL** | **Nemotron-<br>Cascade-8B** |
| :---- | :---: | :---: | :---: | :---: | :---: | :---: |
| ***Knowledge Reasoning*** |
| MMLU | 83.0 | 83.1 | 83.4 | 83.4 | 83.7 | 83.7 |
| MMLU Pro | 74.4 | 77.8 | 74.5 | 75.0 | 75.3 | 75.7 |
| GPQA-Diamond | 63.5 | 66.8 | 66.1 | 65.7 | 67.4 | 66.5 |
| ***Alignment*** |
| ArenaHard | 70.0 | 90.1 | 88.0 | 87.0 | 87.8 | 87.9 |
| IFEval (Strict Prompt) | 70.8 | 50.1 | 90.4 | 92.1 | 90.7 | 90.2 |
| IFBench | 21.2 | 24.5 | 40.5 | 40.4 | 38.1 | 40.8 |
| ***Math*** |
| AIME 2024 | 83.6 | 86.1 | 86.2 | 90.2 | 89.1 | 89.5 |
| AIME 2025 | 72.8 | 75.0 | 75.2 | 81.9 | 80.5 | 80.1 |
| ***Code*** |
| LCB v5 (08/24-02/25) | 59.2 | 70.2 | 70.2 | 70.6 | 75.3 | 74.3 |
| LCB v6 (08/24-05/25) | 56.7 | 67.2 | 66.7 | 67.4 | 71.5 | 71.1 |
| SWE Verified (Agentless) | 26.1 | 28.2 | 28.3 | 30.6 | 31.6 | 37.2 |
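The decoding settings stated above map onto standard sampling arguments for `model.generate()` in `transformers`; a minimal sketch, with the 64K output budget expressed as `max_new_tokens`:

```python
# Sampling settings used for the evaluations above, as keyword
# arguments for transformers' model.generate().
gen_kwargs = dict(
    do_sample=True,            # enable sampling rather than greedy decoding
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=64 * 1024,  # 64K-token output budget
)
print(gen_kwargs["max_new_tokens"])  # → 65536
```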
## Chat Template
All intermediate checkpoints use the same chat template as [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B). Each is a unified model supporting both ***thinking*** and ***instruct*** (non-reasoning) modes. To switch between the two modes, append `" /think"` (for ***thinking***) or `" /no_think"` (for ***instruct***) to the end of the user input. See [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B) for additional details.
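The mode switch is plain string manipulation on the final user turn before the chat template is applied (e.g. via `tokenizer.apply_chat_template`). A minimal sketch (the `with_mode` helper is ours, not part of the tokenizer API):

```python
# Append the mode-control tag to the last user message before applying
# the chat template.
def with_mode(messages: list[dict], thinking: bool) -> list[dict]:
    tag = " /think" if thinking else " /no_think"
    out = [dict(m) for m in messages]  # copy; leave the input untouched
    assert out and out[-1]["role"] == "user", "last turn must be the user"
    out[-1]["content"] += tag
    return out

msgs = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
print(with_mode(msgs, thinking=True)[-1]["content"])
# → "Prove that sqrt(2) is irrational. /think"
```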
## Release Date
Dec 19, 2025
## License
Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
## Citation
```bibtex
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2512.13607},
  year={2025}
}
```