Update README.md
README.md
CHANGED
@@ -77,6 +77,25 @@ Notably, RLHF for alignment, when used as a pre-step, boosts the model’s compl
| ***Tool Calling*** |
| BFCL V3 | 70.4 | 67.9 | 68.6 | 67.5 |

## Usage Recommendations

We recommend using RoPE scaling with the [YaRN](https://arxiv.org/abs/2309.00071) method to better support long-context inputs. This can be enabled by updating the model’s `config.json` as shown below:

```json
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }
}
```

- **Nemotron-Cascade-14B-Thinking**: use `factor: 3.0` to extend the context length to 90K tokens for SWE Verified (Agentless), and `factor: 2.0` to extend the context length to 64K tokens for other benchmarks.
- **Nemotron-Cascade-8B** and **Nemotron-Cascade-8B-Thinking**: use `factor: 2.0` across all benchmarks.
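
Beyond editing `config.json` on disk, the same override can be applied at load time through the standard Hugging Face Transformers loading path. The snippet below is a minimal sketch under that assumption (it is not taken from the model card); adjust `factor` per the per-model guidance above.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Cascade-14B-Thinking"

# Enable YaRN RoPE scaling on the loaded config:
# factor 2.0 extends the 32K native context to 64K tokens;
# use 3.0 to reach the 90K-token setting used for SWE Verified (Agentless).
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```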

## Evaluation Toolkit

To reproduce our results, please see the evaluation code, scripts, and cached prediction files at https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking/blob/main/evaluation/README.md