Update README.md
README.md
CHANGED
@@ -37,10 +37,10 @@ Each folder contains an OpenLM PyTorch checkpoint (`epoch_11.pt`, final epoch) p
 
 | Folder | Method | Training mixture | DCLM CORE v2 avg. |
 |--------|--------|------------------|-------------------|
-| `random_selection` | Random baseline | Uniform sampling from Corpus-200B pool |
-| `dclm_fasttext_only` | Quality (DCLM-fasttext) | Documents above DCLM-fasttext quality threshold |
+| `random_selection` | Random baseline | Uniform sampling from Corpus-200B pool | 39.8% |
+| `dclm_fasttext_only` | Quality (DCLM-fasttext) | Documents above DCLM-fasttext quality threshold | 42.3% |
 | `betweenness_alpha0.5` | **WebGraphMix** | 50/50 mix of top/bottom betweenness-centrality hosts | 41.4% |
-| `betweenness_alpha0.5_mult_div_dclm_fasttext` | **WebGraphMix+** | Betweenness 50/50 mix × DCLM-fasttext quality filter | 43.
+| `betweenness_alpha0.5_mult_div_dclm_fasttext` | **WebGraphMix+** | Betweenness 50/50 mix × DCLM-fasttext quality filter | 43.8% |
 
 > Scores are `aggregated_results` from the `mmlu_and_lowvar` eval suite (23 low-variance ICL tasks). See the [WebGraphMix repo](https://github.com/princeton-pli/WebGraphMix) to reproduce evaluation.
 