Commit 087ca94 (verified) by Clockz · 1 Parent(s): 1ba735f

Add files using upload-large-folder tool

Files changed (25)
  1. README.md +77 -53
  2. id2-10_0.2easy_0.3medium_0.5hard/README.md +28 -0
  3. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062/README.md +28 -0
  4. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997/README.md +28 -0
  5. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771/README.md +28 -0
  6. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548/README.md +28 -0
  7. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867/README.md +28 -0
  8. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254/README.md +28 -0
  9. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963/README.md +28 -0
  10. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935/README.md +28 -0
  11. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350/README.md +28 -0
  12. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542/README.md +28 -0
  13. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096/README.md +28 -0
  14. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387/README.md +28 -0
  15. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870/README.md +28 -0
  16. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644/README.md +28 -0
  17. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579/README.md +28 -0
  18. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774/README.md +28 -0
  19. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740/README.md +28 -0
  20. id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127/README.md +28 -0
  21. id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL/README.md +28 -0
  22. id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL/README.md +28 -0
  23. id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL/README.md +28 -0
  24. id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL/README.md +28 -0
  25. id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL/README.md +28 -0
README.md CHANGED
@@ -1,65 +1,89 @@
  ---
- license: mit
- pipeline_tag: text-generation
  ---

- <h1 align="center">
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
- </h1>
-
- <div align="center">
-
- <a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>,
- <a href="https://xiangyue9607.github.io">Xiang Yue</a>
-
- Carnegie Mellon University, Language Technologies Institute
-
- </div>
-
- <div align="center">
-
- [![arXiv](https://img.shields.io/badge/arXiv-2512.07783-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.07783)
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
- ![Python](https://img.shields.io/badge/python-3.9%2B-blue)
-
- </div>
-
- ## Does Reinforcement Learning Truly Extend Reasoning?
-
- This work explores the discrepancy in views on RL's effectiveness in extending language models' reasoning abilities. Some characterize RL as a capability refiner, while others see it as inducing new compositional skills. This challenge stems from a lack of control in modern training pipelines. Our work aims to resolve this conflict through controlled analysis, going beyond the initial description that this repository contains mid-training related checkpoints in the extrapolation tasks.
-
- ## 🔍 Overview
-
- Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:
-
- * **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
- * **Contextual generalization** across diverse surface forms and linguistic contexts.
- * How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.
-
- ## 🧠 Key findings
- <div align="center">
- <h1 align="center">
- <img src="assets/findings.png" width="500" />
- </h1>
- </div>
- You may also find the comic generated by Notebook LLM [here](assets/Interplay-LM-Reasoning.pdf).
-
- ## Code
-
- The code and data for this work will be released soon at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)
-
- ## 📚 Citation
-
- If you find this work or code useful, please consider citing:

  ```bibtex
  @misc{zhang2025interplaypretrainingmidtrainingrl,
- title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
- url={https://arxiv.org/abs/2512.07783},
  }
- ```
  ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
  ---

+ # Interplay-LM Extrapolation Mid-Train Models
+
+ This repository contains the `op11-14` CPT checkpoints and corresponding local RL outputs used by `scripts/composition/op-difficulty-10B/script_cpt_rl/id2-10_0.2easy_0.3medium_0.5hard_cpt11-14`.
+
+ For pretraining, only `cpt0.2-uniform_0.8-11-14_plus` is included. For RL, only final `actor/huggingface` checkpoints found locally are uploaded.
+
+ ## CPT Checkpoints
+
+ | Path | Checkpoint | Used by nominal step / CPT epoch |
+ | --- | --- | --- |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387` | checkpoint-387 | 50step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774` | checkpoint-774 | 100step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548` | checkpoint-1548 | 200step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935` | checkpoint-1935 | 100step/0.5 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096` | checkpoint-3096 | 100step/0.8, 400step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870` | checkpoint-3870 | 500step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644` | checkpoint-4644 | 600step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579` | checkpoint-6579 | 800step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740` | checkpoint-7740 | 954step/0.2 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127` | checkpoint-8127 | 400step/0.5 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062` | checkpoint-10062 | 500step/0.5 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997` | checkpoint-11997 | 600step/0.5 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771` | checkpoint-12771 | 400step/0.8 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867` | checkpoint-15867 | 800step/0.5 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254` | checkpoint-16254 | 500step/0.8 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963` | checkpoint-18963 | 954step/0.5 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350` | checkpoint-19350 | 600step/0.8 |
+ | `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542` | checkpoint-25542 | 800step/0.8 |
+
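The checkpoint numbers in the CPT table are not a fixed multiple of the nominal step (for example, `800step/0.2` maps to `checkpoint-6579`, whereas eight times the `100step/0.2` value would be 6192), so a lookup is safer than a formula. A minimal sketch, assuming the table is complete; `CPT_ROOT`, `CPT_CHECKPOINTS`, and `cpt_subfolder` are hypothetical helper names, not part of the repository:

```python
# Hypothetical helper: resolve a (nominal step, CPT epoch) pair to the
# checkpoint subfolder listed in the CPT table. Epochs are kept as strings
# so the keys match the table text exactly.
CPT_ROOT = (
    "id2-10_0.2easy_0.3medium_0.5hard/midtrain/"
    "cpt0.2-uniform_0.8-11-14_plus"
)

# (nominal_step, cpt_epoch) -> checkpoint number, copied from the table.
CPT_CHECKPOINTS = {
    (50, "0.2"): 387,
    (100, "0.2"): 774,
    (200, "0.2"): 1548,
    (400, "0.2"): 3096,
    (500, "0.2"): 3870,
    (600, "0.2"): 4644,
    (800, "0.2"): 6579,
    (954, "0.2"): 7740,
    (100, "0.5"): 1935,
    (400, "0.5"): 8127,
    (500, "0.5"): 10062,
    (600, "0.5"): 11997,
    (800, "0.5"): 15867,
    (954, "0.5"): 18963,
    (100, "0.8"): 3096,
    (400, "0.8"): 12771,
    (500, "0.8"): 16254,
    (600, "0.8"): 19350,
    (800, "0.8"): 25542,
}


def cpt_subfolder(nominal_step: int, cpt_epoch: str) -> str:
    """Return the repository subfolder for one table entry."""
    key = (nominal_step, cpt_epoch)
    if key not in CPT_CHECKPOINTS:
        raise KeyError(f"no CPT checkpoint listed for {nominal_step}step/{cpt_epoch}")
    return f"{CPT_ROOT}/checkpoint-{CPT_CHECKPOINTS[key]}"
```

The returned string can be passed as the `subfolder` argument of `from_pretrained`. Note that `100step/0.8` and `400step/0.2` resolve to the same `checkpoint-3096`.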
+ ## RL Checkpoints
+
+ | Path | Nominal step | CPT epoch | Source CPT checkpoint | Uploaded checkpoint |
+ | --- | --- | --- | --- | --- |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL` | 50 | 0.2 | checkpoint-387 | `global_step_40` |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL` | 100 | 0.8 | checkpoint-3096 | `global_step_19` |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL` | 100 | 0.5 | checkpoint-1935 | `global_step_50` |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL` | 100 | 0.2 | checkpoint-774 | `global_step_80` |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL` | 200 | 0.2 | checkpoint-1548 | `global_step_160` |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-400step-0.2RL` | 400 | 0.8 | checkpoint-12771 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-400step-0.5RL` | 400 | 0.5 | checkpoint-8127 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-400step-0.8RL` | 400 | 0.2 | checkpoint-3096 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-500step-0.2RL` | 500 | 0.8 | checkpoint-16254 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-500step-0.5RL` | 500 | 0.5 | checkpoint-10062 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-500step-0.8RL` | 500 | 0.2 | checkpoint-3870 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-600step-0.2RL` | 600 | 0.8 | checkpoint-19350 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-600step-0.5RL` | 600 | 0.5 | checkpoint-11997 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-600step-0.8RL` | 600 | 0.2 | checkpoint-4644 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-800step-0.2RL` | 800 | 0.8 | checkpoint-25542 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-800step-0.5RL` | 800 | 0.5 | checkpoint-15867 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-800step-0.8RL` | 800 | 0.2 | checkpoint-6579 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-954step-0.5RL` | 954 | 0.5 | checkpoint-18963 | not found locally |
+ | `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-954step-0.8RL` | 954 | 0.2 | checkpoint-7740 | not found locally |
+
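Only five of the RL rows list an uploaded checkpoint, so code that iterates over the grid should check availability before building a path. A rough sketch under stated assumptions: `RL_ROOT`, `RL_SUFFIX`, `UPLOADED_RL`, `rl_run_name`, and `rl_subfolder` are hypothetical names, and the `0.8RL`/`0.5RL`/`0.2RL` suffix is treated as an opaque string paired with each CPT epoch, as it appears in the run names of the RL table:

```python
from typing import Optional

RL_ROOT = "id2-10_0.2easy_0.3medium_0.5hard/rl"

# Suffix that accompanies each CPT epoch in the run names (copied, not derived).
RL_SUFFIX = {"0.2": "0.8", "0.5": "0.5", "0.8": "0.2"}

# (nominal_step, cpt_epoch) -> uploaded global step, from the RL table.
# Rows marked "not found locally" are intentionally absent.
UPLOADED_RL = {
    (50, "0.2"): 40,
    (100, "0.8"): 19,
    (100, "0.5"): 50,
    (100, "0.2"): 80,
    (200, "0.2"): 160,
}


def rl_run_name(nominal_step: int, cpt_epoch: str) -> str:
    """Build the run directory name used in the RL table."""
    return (
        f"cpt{cpt_epoch}-rl-op11-14_uniform-"
        f"{nominal_step}step-{RL_SUFFIX[cpt_epoch]}RL"
    )


def rl_subfolder(nominal_step: int, cpt_epoch: str) -> Optional[str]:
    """Return the RL subfolder, or None if no checkpoint was uploaded."""
    if (nominal_step, cpt_epoch) not in UPLOADED_RL:
        return None
    return f"{RL_ROOT}/{rl_run_name(nominal_step, cpt_epoch)}"
```

Returning `None` rather than raising keeps a sweep over all step and epoch combinations simple: missing runs can be skipped with a single check.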
+ ## Load
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ repo_id = "Interplay-LM-Reasoning/extrapolation_midtrain"
+ subdir = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"
+
+ tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
+ model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
+ ```
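For offline use it can help to fetch a single checkpoint directory without pulling the other checkpoints in the repository. A hedged sketch, assuming `huggingface_hub` is installed: `snapshot_download` and its `allow_patterns` argument belong to that library, while `REPO_ID`, `checkpoint_patterns`, and `download_checkpoint` are hypothetical helper names introduced here:

```python
from pathlib import Path

REPO_ID = "Interplay-LM-Reasoning/extrapolation_midtrain"


def checkpoint_patterns(subdir: str) -> list[str]:
    """Glob patterns selecting only the files under one checkpoint subfolder."""
    return [f"{subdir.strip('/')}/*"]


def download_checkpoint(subdir: str, repo_id: str = REPO_ID) -> Path:
    """Download one checkpoint subfolder and return its local directory."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(subdir),
    )
    return Path(local_root) / subdir.strip("/")
```

The returned directory can then be passed to `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained` as a local path.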
+
+ ## Citation

  ```bibtex
  @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
  }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard
+
+ `op11-14` extrapolation mid-training artifacts for `cpt0.2-uniform_0.8-11-14_plus`.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-10062
+
+ CPT checkpoint `checkpoint-10062` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-11997
+
+ CPT checkpoint `checkpoint-11997` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-12771
+
+ CPT checkpoint `checkpoint-12771` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-1548
+
+ CPT checkpoint `checkpoint-1548` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-15867
+
+ CPT checkpoint `checkpoint-15867` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-16254
+
+ CPT checkpoint `checkpoint-16254` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-18963
+
+ CPT checkpoint `checkpoint-18963` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-1935
+
+ CPT checkpoint `checkpoint-1935` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-19350
+
+ CPT checkpoint `checkpoint-19350` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-25542
+
+ CPT checkpoint `checkpoint-25542` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-3096
+
+ CPT checkpoint `checkpoint-3096` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-387
+
+ CPT checkpoint `checkpoint-387` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-3870
+
+ CPT checkpoint `checkpoint-3870` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-4644
+
+ CPT checkpoint `checkpoint-4644` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-6579
+
+ CPT checkpoint `checkpoint-6579` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-774
+
+ CPT checkpoint `checkpoint-774` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-7740
+
+ CPT checkpoint `checkpoint-7740` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-8127
+
+ CPT checkpoint `checkpoint-8127` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-100step-0.8RL
+
+ Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-100step-0.8RL`, uploaded from `global_step_80`. This run uses CPT `checkpoint-774` with nominal step `100` and CPT epoch `0.2`.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-200step-0.8RL
+
+ Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-200step-0.8RL`, uploaded from `global_step_160`. This run uses CPT `checkpoint-1548` with nominal step `200` and CPT epoch `0.2`.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-50step-0.8RL
+
+ Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-50step-0.8RL`, uploaded from `global_step_40`. This run uses CPT `checkpoint-387` with nominal step `50` and CPT epoch `0.2`.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.5-rl-op11-14_uniform-100step-0.5RL
+
+ Final RL checkpoint from `cpt0.5-rl-op11-14_uniform-100step-0.5RL`, uploaded from `global_step_50`. This run uses CPT `checkpoint-1935` with nominal step `100` and CPT epoch `0.5`.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL/README.md ADDED
@@ -0,0 +1,28 @@
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - mid-training
+ - extrapolation
+ - synthetic-data
+ - transformers
+ ---
+
+ # id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.8-rl-op11-14_uniform-100step-0.2RL
+
+ Final RL checkpoint from `cpt0.8-rl-op11-14_uniform-100step-0.2RL`, uploaded from `global_step_19`. This run uses CPT `checkpoint-3096` with nominal step `100` and CPT epoch `0.8`.
+
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025interplaypretrainingmidtrainingrl,
+ title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+ author={Charlie Zhang and Graham Neubig and Xiang Yue},
+ year={2025},
+ eprint={2512.07783},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2512.07783},
+ }
+ ```