---
library_name: transformers
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
pipeline_tag: text-generation
language:
- en
tags:
- nvidia
- Nemotron-Cascade
- reasoning
- general-purpose
- SFT
- RL
- pytorch
---

# Nemotron-Cascade-8B Intermediate Checkpoints

<p align="center">

[Paper](https://arxiv.org/abs/2512.13607) | [Nemotron-Cascade Collection](https://huggingface.co/collections/nvidia/nemotron-cascade)

</p>

## Introduction

This repository releases the intermediate checkpoints produced during the development of [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B). Nemotron-Cascade-8B is a general-purpose model trained using a sequential, domain-wise reinforcement learning pipeline, illustrated in the figure below.

<img src="fig/pipeline.png" alt="train_pipeline_fig" style="width: 1000px; max-width: 100%;" />

We release checkpoints corresponding to each major stage of training:

- **Nemotron-Cascade-8B-SFT** (after completing multi-stage SFT)
- **Nemotron-Cascade-8B-RLHF** (after completing RLHF)
- **Nemotron-Cascade-8B-IFRL** (after completing instruction-following RL)
- **Nemotron-Cascade-8B-MathRL** (after completing Math RL)
- **Nemotron-Cascade-8B-CodeRL** (after completing Code RL)

The final model, [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), is obtained after the concluding SWE RL stage.
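
As a convenience, the sketch below shows one way to load any of these checkpoints with the `transformers` library. The checkpoint id and dtype here are illustrative assumptions, not a prescribed setup; substitute the actual path of the checkpoint you want to study.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id -- swap in the SFT, RLHF, IFRL, MathRL,
# or CodeRL variant released in this repository.
ckpt = "nvidia/Nemotron-Cascade-8B-SFT"

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; adjust to your hardware
    device_map="auto",           # place layers across available devices
)
```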
## Usage Recommendations

We recommend using RoPE scaling with the [YaRN](https://arxiv.org/abs/2309.00071) method to better support contexts longer than 32K tokens. This can be enabled by updating the model's `config.json` as shown below:
| ```json |
| { |
| ..., |
| "rope_scaling": { |
| "rope_type": "yarn", |
| "factor": 2.0, |
| "original_max_position_embeddings": 32768 |
| } |
| } |
| ``` |
## Results

As with [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B), we use a maximum output length of 64K tokens for evaluation, with temperature set to 0.6 and top-p set to 0.95. We also apply RoPE scaling using the YaRN method with a scaling factor of 2.0.
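
For reference, these decoding settings map onto a standard `generate` call roughly as follows. This is a sketch of the sampling configuration only, not the exact evaluation harness, and it assumes the `model` and `tokenizer` from the loading sketch above:

```python
inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,        # stochastic sampling rather than greedy decoding
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=65536,  # 64K-token output budget used for evaluation
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```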
| | **Benchmark<br>Metric: Pass@1** | **Nemotron-<br>Cascade-8B-SFT** | **Nemotron-<br>Cascade-8B-RLHF** | **Nemotron-<br>Cascade-8B-IFRL** | **Nemotron-<br>Cascade-8B-MathRL** | **Nemotron-<br>Cascade-8B-CodeRL** | **Nemotron-<br>Cascade-8B** | |
| | :---- | :---: | :---: | :---: | :---: | :---: | :---: | |
| | ***Knowledge Reasoning*** | |
| | MMLU | 83.0 | 83.1 | 83.4 | 83.4 | 83.7 | 83.7 | |
| | MMLU Pro | 74.4 | 77.8 | 74.5 | 75.0 | 75.3 | 75.7 | |
| | GPQA-Diamond | 63.5 | 66.8 | 66.1 | 65.7 | 67.4 | 66.5 | |
| | ***Alignment*** | |
| | ArenaHard | 70.0 | 90.1 | 88.0 | 87.0 | 87.8 | 87.9 | |
| | IFEval (Strict Prompt) | 70.8 | 50.1 | 90.4 | 92.1 | 90.7 | 90.2 | |
| | IFBench | 21.2 | 24.5 | 40.5 | 40.4 | 38.1 | 40.8 | |
| | ***Math*** | |
| | AIME 2024 | 83.6 | 86.1 | 86.2 | 90.2 | 89.1 | 89.5 | |
| | AIME 2025 | 72.8 | 75.0 | 75.2 | 81.9 | 80.5 | 80.1 | |
| | ***Code*** | |
| | LCB v5 (08/24-02/25) | 59.2 | 70.2 | 70.2 | 70.6 | 75.3 | 74.3 | |
| | LCB v6 (08/24-05/25) | 56.7 | 67.2 | 66.7 | 67.4 | 71.5 | 71.1 | |
| | SWE Verified (Agentless) | 26.1 | 28.2 | 28.3 | 30.6 | 31.6 | 37.2 | |
## Chat Template

All intermediate checkpoints use the same chat template as [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B). Each is a unified model supporting both ***thinking*** and ***instruct*** (non-reasoning) modes. To switch between the two, simply append the `" /think"` tag (for ***thinking***) or the `" /no_think"` tag (for ***instruct***) to the end of the user input, as in the sketch below. See [Nemotron-Cascade-8B](https://huggingface.co/nvidia/Nemotron-Cascade-8B) for additional details.
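
A minimal sketch of mode switching with `apply_chat_template` (the checkpoint id is again a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-Cascade-8B-SFT")  # hypothetical id

# Append " /think" for thinking mode, or " /no_think" for instruct mode.
messages = [{"role": "user", "content": "Solve x^2 - 5x + 6 = 0. /think"}]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # add the assistant header so the model responds next
)
print(prompt)
```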
## Release Date
Dec 19, 2025
## License
Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
## Citation
| ``` |
| @article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning, |
| title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models}, |
| author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei}, |
  journal={arXiv preprint arXiv:2512.13607},
  year={2025}
| } |
| ``` |