Update README.md
## Usage Recommendations

We recommend using RoPE scaling with the [YaRN](https://arxiv.org/abs/2309.00071) method to better support long-context inputs. This can be enabled by updating the model’s `config.json` as shown below:

```json
{
  ...,
}
```

- **Nemotron-Cascade-14B-Thinking**: use `factor: 3.0` to extend the context length to 90K tokens for SWE Verified (Agentless), and `factor: 2.0` to extend the context length to 64K tokens for other benchmarks.
- **Nemotron-Cascade-8B** and **Nemotron-Cascade-8B-Thinking**: use `factor: 2.0` across all benchmarks.
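For orientation, a YaRN entry in `config.json` typically takes the shape sketched below, assuming the Hugging Face Transformers `rope_scaling` schema (`rope_type`, `factor`, `original_max_position_embeddings`). The base-context value here is an illustrative placeholder, not taken from this model's actual config — consult the model card for the exact fields and values:

```json
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 2.0,
    "original_max_position_embeddings": 32768
  }
}
```

Set `factor` per the per-model recommendations above; the extended context length is roughly `factor` × the original maximum position embeddings.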
## Evaluation Toolkit