mansaripo committed on
Commit 389961d (verified) · Parent: 812abc2

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -78,7 +78,7 @@ OPT-175B baselines from the [BigScience evaluation repository](https://github.co
  |---|---|---:|---:|
  | Wikitext | bits per byte ↓ | 0.723 | — |
  | LAMBADA (OpenAI) | acc ↑ | 61.1 | **76.2** |
- | NQ-Open | exact match ↑ | 7.8 | **14.6** |
+ | NQ | exact match ↑ | 7.8 | **14.6** |
 
  ### MMLU (590k checkpoint)
 
@@ -180,7 +180,7 @@ The model uses 264 weight tensors totaling ~4.14 B parameters.
  - **English only**: The TokenMonster vocabulary and ClimbMix training data are English-centric.
  - **No instruction tuning**: This is a base pretrained model, not fine-tuned for instruction following or chat.
  - **Contamination risk**: ClimbMix optimizes mixture weights against benchmark scores, and the upstream datasets (Nemotron-CC, SmolLM-Corpus) do not investigate benchmark contamination. Strong results should be interpreted with caution.
- - **Generative benchmarks**: The model is notably weaker on open-ended generation tasks (LAMBADA, NQ-Open) compared to the 175B baselines, reflecting the scale gap on tasks that require deeper knowledge recall.
+ - **Generative benchmarks**: The model is notably weaker on open-ended generation tasks (LAMBADA, NQ) compared to the 175B baselines, reflecting the scale gap on tasks that require deeper knowledge recall.
 
  ## Citation
 