Add files using upload-large-folder tool
- README.md +77 -53
- id2-10_0.2easy_0.3medium_0.5hard/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL/README.md +28 -0
- id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL/README.md +28 -0
README.md
CHANGED
@@ -1,65 +1,89 @@
Old version (65 lines): YAML front matter with a `license:` key (value not preserved), 48 removed body lines (old lines 6-53) whose content was not preserved, and a `bibtex` citation for `zhang2025interplaypretrainingmidtrainingrl` (arXiv 2512.07783).

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# Interplay-LM Extrapolation Mid-Train Models

This repository contains the `op11-14` CPT checkpoints and corresponding local RL outputs used by `scripts/composition/op-difficulty-10B/script_cpt_rl/id2-10_0.2easy_0.3medium_0.5hard_cpt11-14`.

For pretraining, only `cpt0.2-uniform_0.8-11-14_plus` is included. For RL, only final `actor/huggingface` checkpoints found locally are uploaded.

## CPT Checkpoints

| Path | Checkpoint | Used by nominal step / CPT epoch |
| --- | --- | --- |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387` | checkpoint-387 | 50step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774` | checkpoint-774 | 100step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548` | checkpoint-1548 | 200step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935` | checkpoint-1935 | 100step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096` | checkpoint-3096 | 100step/0.8, 400step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870` | checkpoint-3870 | 500step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644` | checkpoint-4644 | 600step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579` | checkpoint-6579 | 800step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740` | checkpoint-7740 | 954step/0.2 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127` | checkpoint-8127 | 400step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062` | checkpoint-10062 | 500step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997` | checkpoint-11997 | 600step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771` | checkpoint-12771 | 400step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867` | checkpoint-15867 | 800step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254` | checkpoint-16254 | 500step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963` | checkpoint-18963 | 954step/0.5 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350` | checkpoint-19350 | 600step/0.8 |
| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542` | checkpoint-25542 | 800step/0.8 |
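
The CPT table can be read as a lookup from a budget setting to a checkpoint folder. The sketch below transcribes the table into a dictionary; the names `CPT_ROOT`, `CPT_CHECKPOINTS`, and `cpt_subfolder` are illustrative assumptions and are not part of the repository. CPT epochs are kept as strings to avoid float comparison issues.

```python
# Illustrative lookup transcribed from the CPT Checkpoints table.
# All names here are hypothetical helpers, not part of the repository.
CPT_ROOT = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus"

# (nominal step, CPT epoch) -> checkpoint step
CPT_CHECKPOINTS = {
    (50, "0.2"): 387,
    (100, "0.2"): 774,
    (200, "0.2"): 1548,
    (100, "0.5"): 1935,
    (100, "0.8"): 3096,
    (400, "0.2"): 3096,
    (500, "0.2"): 3870,
    (600, "0.2"): 4644,
    (800, "0.2"): 6579,
    (954, "0.2"): 7740,
    (400, "0.5"): 8127,
    (500, "0.5"): 10062,
    (600, "0.5"): 11997,
    (400, "0.8"): 12771,
    (800, "0.5"): 15867,
    (500, "0.8"): 16254,
    (954, "0.5"): 18963,
    (600, "0.8"): 19350,
    (800, "0.8"): 25542,
}


def cpt_subfolder(nominal_step: int, cpt_epoch: str) -> str:
    """Return the repo-relative folder of the CPT checkpoint for a budget setting."""
    step = CPT_CHECKPOINTS[(nominal_step, cpt_epoch)]
    return f"{CPT_ROOT}/checkpoint-{step}"
```

For example, `cpt_subfolder(100, "0.8")` and `cpt_subfolder(400, "0.2")` both resolve to the `checkpoint-3096` folder, matching the shared row in the table. A setting that is not listed raises `KeyError`.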

## RL Checkpoints

| Path | Nominal step | CPT epoch | Source CPT checkpoint | Uploaded checkpoint |
| --- | --- | --- | --- | --- |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL` | 50 | 0.2 | checkpoint-387 | `global_step_40` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL` | 100 | 0.8 | checkpoint-3096 | `global_step_19` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL` | 100 | 0.5 | checkpoint-1935 | `global_step_50` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL` | 100 | 0.2 | checkpoint-774 | `global_step_80` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL` | 200 | 0.2 | checkpoint-1548 | `global_step_160` |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-400step-0.2RL` | 400 | 0.8 | checkpoint-12771 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-400step-0.5RL` | 400 | 0.5 | checkpoint-8127 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-400step-0.8RL` | 400 | 0.2 | checkpoint-3096 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-500step-0.2RL` | 500 | 0.8 | checkpoint-16254 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-500step-0.5RL` | 500 | 0.5 | checkpoint-10062 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-500step-0.8RL` | 500 | 0.2 | checkpoint-3870 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-600step-0.2RL` | 600 | 0.8 | checkpoint-19350 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-600step-0.5RL` | 600 | 0.5 | checkpoint-11997 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-600step-0.8RL` | 600 | 0.2 | checkpoint-4644 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-800step-0.2RL` | 800 | 0.8 | checkpoint-25542 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-800step-0.5RL` | 800 | 0.5 | checkpoint-15867 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-800step-0.8RL` | 800 | 0.2 | checkpoint-6579 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-954step-0.5RL` | 954 | 0.5 | checkpoint-18963 | not found locally |
| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-954step-0.8RL` | 954 | 0.2 | checkpoint-7740 | not found locally |
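
The RL folder names in the table appear to follow one pattern: the CPT share, the nominal step, and an RL share equal to one minus the CPT share. The sketch below rebuilds a path from that pattern, which is inferred from the rows above rather than documented by the authors. `RL_ROOT`, `UPLOADED_RL`, `rl_run_path`, and `is_uploaded` are hypothetical names.

```python
from decimal import Decimal

# Hypothetical helpers; the naming pattern is inferred from the RL Checkpoints table.
RL_ROOT = "id2-10_0.2easy_0.3medium_0.5hard/rl"

# (nominal step, CPT epoch) -> uploaded checkpoint label, for the rows
# that are not marked "not found locally" in the table.
UPLOADED_RL = {
    (50, "0.2"): "global_step_40",
    (100, "0.8"): "global_step_19",
    (100, "0.5"): "global_step_50",
    (100, "0.2"): "global_step_80",
    (200, "0.2"): "global_step_160",
}


def rl_run_path(nominal_step: int, cpt_epoch: str) -> str:
    """Build the repo-relative RL run folder for a (step, CPT epoch) setting."""
    # Decimal keeps "1 - 0.2" exactly "0.8" instead of a float artifact.
    rl_share = Decimal(1) - Decimal(cpt_epoch)
    return f"{RL_ROOT}/cpt{cpt_epoch}-rl-op11-14_uniform-{nominal_step}step-{rl_share}RL"


def is_uploaded(nominal_step: int, cpt_epoch: str) -> bool:
    """True if the table lists an uploaded checkpoint for this setting."""
    return (nominal_step, cpt_epoch) in UPLOADED_RL
```

Note that `rl_run_path` also produces names for rows marked "not found locally", so checking `is_uploaded` first avoids requesting a folder that was never uploaded.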

## Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Interplay-LM-Reasoning/extrapolation_midtrain"
subdir = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
```
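
Because the repository holds many checkpoints, it may be useful to fetch only one folder before loading. The following is a minimal sketch that uses `huggingface_hub.snapshot_download` with `allow_patterns`. The helper names `checkpoint_patterns` and `load_checkpoint` are assumptions for illustration, and the sketch assumes the tokenizer files sit in the same checkpoint folder, as the snippet above suggests.

```python
import os

REPO_ID = "Interplay-LM-Reasoning/extrapolation_midtrain"
SUBDIR = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"


def checkpoint_patterns(subdir: str) -> list[str]:
    """Glob patterns that select only the files under one checkpoint folder."""
    return [f"{subdir.strip('/')}/*"]


def load_checkpoint(repo_id: str = REPO_ID, subdir: str = SUBDIR):
    """Download a single checkpoint folder, then load it from the local copy."""
    # Imported lazily so checkpoint_patterns works without these packages installed.
    from huggingface_hub import snapshot_download
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Only files matching the patterns are fetched; other checkpoints are skipped.
    local_root = snapshot_download(repo_id=repo_id, allow_patterns=checkpoint_patterns(subdir))
    local_ckpt = os.path.join(local_root, subdir)
    tokenizer = AutoTokenizer.from_pretrained(local_ckpt)
    model = AutoModelForCausalLM.from_pretrained(local_ckpt)
    return tokenizer, model
```

Calling `load_checkpoint()` with the defaults targets the same `checkpoint-25542` folder as the snippet above. Passing another `subdir` from the CPT table selects a different checkpoint.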

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard

`op11-14` extrapolation mid-training artifacts for `cpt0.2-uniform_0.8-11-14_plus`.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-10062

CPT checkpoint `checkpoint-10062` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-11997

CPT checkpoint `checkpoint-11997` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-12771

CPT checkpoint `checkpoint-12771` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-1548

CPT checkpoint `checkpoint-1548` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-15867

CPT checkpoint `checkpoint-15867` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-16254

CPT checkpoint `checkpoint-16254` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-18963

CPT checkpoint `checkpoint-18963` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-1935

CPT checkpoint `checkpoint-1935` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-19350

CPT checkpoint `checkpoint-19350` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-25542

CPT checkpoint `checkpoint-25542` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-3096

CPT checkpoint `checkpoint-3096` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-387

CPT checkpoint `checkpoint-387` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-3870

CPT checkpoint `checkpoint-3870` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-4644

CPT checkpoint `checkpoint-4644` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579/README.md
ADDED
@@ -0,0 +1,28 @@

---
license: other
library_name: transformers
tags:
- reasoning
- mid-training
- extrapolation
- synthetic-data
- transformers
---

# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-6579

CPT checkpoint `checkpoint-6579` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.

## Citation

```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
      author={Charlie Zhang and Graham Neubig and Xiang Yue},
      year={2025},
      eprint={2512.07783},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.07783},
}
```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-774
+
+CPT checkpoint `checkpoint-774` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-7740
+
+CPT checkpoint `checkpoint-7740` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-8127
+
+CPT checkpoint `checkpoint-8127` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-100step-0.8RL
+
+Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-100step-0.8RL`, uploaded from `global_step_80`. This run uses CPT `checkpoint-774` with nominal step `100` and CPT epoch `0.2`.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-200step-0.8RL
+
+Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-200step-0.8RL`, uploaded from `global_step_160`. This run uses CPT `checkpoint-1548` with nominal step `200` and CPT epoch `0.2`.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-50step-0.8RL
+
+Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-50step-0.8RL`, uploaded from `global_step_40`. This run uses CPT `checkpoint-387` with nominal step `50` and CPT epoch `0.2`.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.5-rl-op11-14_uniform-100step-0.5RL
+
+Final RL checkpoint from `cpt0.5-rl-op11-14_uniform-100step-0.5RL`, uploaded from `global_step_50`. This run uses CPT `checkpoint-1935` with nominal step `100` and CPT epoch `0.5`.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL/README.md
ADDED
|
@@ -0,0 +1,28 @@
+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.8-rl-op11-14_uniform-100step-0.2RL
+
+Final RL checkpoint from `cpt0.8-rl-op11-14_uniform-100step-0.2RL`, uploaded from `global_step_19`. This run uses CPT `checkpoint-3096` with nominal step `100` and CPT epoch `0.8`.
+
+## Citation
+
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+author={Charlie Zhang and Graham Neubig and Xiang Yue},
+year={2025},
+eprint={2512.07783},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2512.07783},
+}
+```