Commit 4785466 (verified) by codelion · 1 Parent(s): c413b55

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -192,18 +192,18 @@ All benchmarks evaluated using [lm-evaluation-harness](https://github.com/Eleuth

  ### Comparison with 1B Token Baselines (SmolLM2-70M)

- These are results from training the same SmolLM2-70M model on various 1B-token datasets for 1 epoch, showing that Sutra-10B at 3 epochs achieves the highest performance for this model size.
+ These are results from training the same SmolLM2-70M model on various 1B-token datasets from the [Pre-training Dataset Samples](https://huggingface.co/collections/codelion/pre-training-dataset-samples-686bd760abf1a43b0ce32829) collection for 1 epoch, showing that Sutra-10B at 3 epochs achieves the highest performance for this model size.

  | Dataset (1B tokens) | HellaSwag | PIQA | WinoGrande | ARC-C | MMLU | TruthfulQA | GSM8K | Avg |
  |---------------------|-----------|------|------------|-------|------|------------|-------|-----|
  | **Sutra-10B (3 epochs)** | **26.14** | **54.84** | **50.04** | **22.35** | 22.96 | **48.02** | 0.53 | **34.27** |
- | Sutra-1B | 25.43 | 53.86 | 49.41 | 23.04 | 22.91 | 49.09 | 1.14 | 32.13 |
- | FineWiki-1B | 25.56 | 51.69 | 48.86 | 24.15 | **23.34** | 51.16 | 0.91 | 32.24 |
- | FinePDFs-1B | 25.58 | 52.56 | 50.51 | 22.44 | 22.95 | 51.41 | 1.21 | 32.38 |
- | DCLM-Baseline-1B | 25.85 | 55.17 | 50.20 | 21.08 | 22.97 | 49.21 | 0.68 | 32.16 |
- | FineWeb-Edu-1B | 25.72 | 55.11 | 50.36 | 21.25 | 22.96 | 48.11 | 1.21 | 32.10 |
- | Essential-Web-1B | 26.02 | 55.44 | 48.30 | 20.99 | 22.95 | 49.59 | 1.29 | 32.08 |
- | Synth-1B | 26.63 | 50.98 | 48.78 | 21.93 | 23.24 | 47.10 | 1.29 | 31.42 |
+ | [Sutra-1B](https://huggingface.co/datasets/codelion/sutra-1B) | 25.43 | 53.86 | 49.41 | 23.04 | 22.91 | 49.09 | 1.14 | 32.13 |
+ | [FineWiki-1B](https://huggingface.co/datasets/HuggingFaceFW/finewiki) | 25.56 | 51.69 | 48.86 | 24.15 | **23.34** | 51.16 | 0.91 | 32.24 |
+ | [FinePDFs-1B](https://huggingface.co/datasets/HuggingFaceFW/FinePDFs) | 25.58 | 52.56 | 50.51 | 22.44 | 22.95 | 51.41 | 1.21 | 32.38 |
+ | [DCLM-Baseline-1B](https://huggingface.co/datasets/codelion/dclm-baseline-1B) | 25.85 | 55.17 | 50.20 | 21.08 | 22.97 | 49.21 | 0.68 | 32.16 |
+ | [FineWeb-Edu-1B](https://huggingface.co/datasets/codelion/fineweb-edu-1B) | 25.72 | 55.11 | 50.36 | 21.25 | 22.96 | 48.11 | 1.21 | 32.10 |
+ | [Essential-Web-1B](https://huggingface.co/datasets/sumukshashidhar-archive/essential-web-v1.0-sample-1B) | 26.02 | 55.44 | 48.30 | 20.99 | 22.95 | 49.59 | 1.29 | 32.08 |
+ | [Synth-1B](https://huggingface.co/datasets/codelion/synth-1B) | 26.63 | 50.98 | 48.78 | 21.93 | 23.24 | 47.10 | 1.29 | 31.42 |

  ## Key Findings

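
Reviewer note: the `Avg` column in the table touched by this diff can be sanity-checked with a few lines of Python. This is a minimal sketch that assumes `Avg` is the unweighted mean of the seven benchmark columns; the README does not state how it is computed. The names `ROWS`, `recompute_avg`, and `check_rows` are hypothetical helpers, not part of the repository, and only three rows are copied in for brevity.

```python
# Scores are copied from the table in this diff, in header order:
# HellaSwag, PIQA, WinoGrande, ARC-C, MMLU, TruthfulQA, GSM8K.
# Each entry is (benchmark scores, Avg value listed in the table).
ROWS = {
    "Sutra-10B (3 epochs)": ([26.14, 54.84, 50.04, 22.35, 22.96, 48.02, 0.53], 34.27),
    "Sutra-1B": ([25.43, 53.86, 49.41, 23.04, 22.91, 49.09, 1.14], 32.13),
    "FineWiki-1B": ([25.56, 51.69, 48.86, 24.15, 23.34, 51.16, 0.91], 32.24),
}


def recompute_avg(scores):
    """Unweighted mean of the benchmark scores, rounded to two decimals.

    Assumption: this is how the Avg column is derived.
    """
    return round(sum(scores) / len(scores), 2)


def check_rows(rows, tolerance=0.015):
    """Return {name: (listed, recomputed)} for rows whose Avg does not match."""
    mismatches = {}
    for name, (scores, listed) in rows.items():
        recomputed = recompute_avg(scores)
        if abs(recomputed - listed) > tolerance:
            mismatches[name] = (listed, recomputed)
    return mismatches


if __name__ == "__main__":
    for name, (scores, listed) in ROWS.items():
        print(f"{name}: listed={listed:.2f} recomputed={recompute_avg(scores):.2f}")
    print("rows to double-check:", check_rows(ROWS))
```

Under that assumption, the Sutra-1B and FineWiki-1B rows reproduce their listed averages (32.13 and 32.24), while the Sutra-10B row recomputes to 32.13 rather than the listed 34.27. That value may be worth checking against the original evaluation output before relying on it.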