Update README.md
README.md
CHANGED
@@ -77,6 +77,25 @@ Notably, RLHF for alignment, when used as a pre-step, boosts the model’s compl
| ***Tool Calling*** |
| BFCL V3 | 70.4 | 67.9 | 68.6 | 67.5 |

## Usage Recommendations

We recommend using RoPE scaling with the [YaRN](https://arxiv.org/abs/2309.00071) method to better support long-context inputs. This can be enabled by updating the model’s `config.json` as shown below:

```json
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }
}
```

- **Nemotron-Cascade-14B-Thinking**: use `factor: 3.0` to extend the context length to 90K tokens for SWE Verified (Agentless), and `factor: 2.0` to extend the context length to 64K tokens for other benchmarks.
- **Nemotron-Cascade-8B** and **Nemotron-Cascade-8B-Thinking**: use `factor: 2.0` across all benchmarks.
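
Beyond editing `config.json` on disk, the same override can be applied at load time through the standard Hugging Face Transformers loading path. The snippet below is a minimal sketch under that assumption (it is not taken from the model card); adjust `factor` per the per-model guidance above.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Cascade-14B-Thinking"

# Enable YaRN RoPE scaling on the loaded config:
# factor 2.0 extends the 32K native context to 64K tokens;
# use 3.0 to reach the 90K-token setting used for SWE Verified (Agentless).
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768,
}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```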

## Evaluation Toolkit

To reproduce our results, please see the evaluation code, scripts, and cached prediction files at https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking/blob/main/evaluation/README.md