Clockz committed on Apr 7

Commit

087ca94

verified ·

1 Parent(s): 1ba735f

Add files using upload-large-folder tool

Files changed (25)

README.md +77 -53
id2-10_0.2easy_0.3medium_0.5hard/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL/README.md +28 -0
id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL/README.md +28 -0

README.md CHANGED

@@ -1,65 +1,89 @@
 ---
-license: mit
-pipeline_tag: text-generation
 ---
-<h1 align="center">
-On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
-</h1>
-<div align="center">
-<a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>,
-<a href="https://xiangyue9607.github.io">Xiang Yue</a>
-Carnegie Mellon University, Language Technologies Institute
-</div>
-<div align="center">
-[![arXiv](https://img.shields.io/badge/arXiv-2512.07783-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.07783)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
-![Python](https://img.shields.io/badge/python-3.9%2B-blue)
-</div>
-## Does Reinforcement Learning Truly Extend Reasoning?
-This work explores the discrepancy in views on RL's effectiveness in extending language models' reasoning abilities. Some characterize RL as a capability refiner, while others see it as inducing new compositional skills. This challenge stems from a lack of control in modern training pipelines. Our work aims to resolve this conflict through controlled analysis, going beyond the initial description that this repository contains mid-training related checkpoints in the extrapolation tasks.
-## 🔍 Overview
-Our paper builds a fully controlled experimental framework to analyze how pre-training, mid-training, and RL-based post-training jointly shape the reasoning abilities of language models. Using synthetic math-style reasoning tasks with explicit atomic operations and process-verifiable reasoning traces, we study:
-*   **Extrapolative generalization** to more complex compositions (deeper dependency graphs).
-*   **Contextual generalization** across diverse surface forms and linguistic contexts.
-*   How **RL interacts** with prior knowledge, and when it yields **genuine capability gains** beyond pre-training.
-## 🧠 Key findings
-<div align="center">
-  <h1 align="center">
-    <img src="assets/findings.png" width="500" />
-    </h1>
-</div>
-You may also find the comic generated by Notebook LLM [here](assets/Interplay-LM-Reasoning.pdf).
-## Code
-The code and data for this work will be released soon at the following GitHub repository: [https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)
-## 📚 Citation
-If you find this work or code useful, please consider citing:
 ```bibtex
 @misc{zhang2025interplaypretrainingmidtrainingrl,
-      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
       author={Charlie Zhang and Graham Neubig and Xiang Yue},
       year={2025},
       eprint={2512.07783},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2512.07783},
 }
-```

 ---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
 ---
+# Interplay-LM Extrapolation Mid-Train Models
+This repository contains the `op11-14` CPT checkpoints and corresponding local RL outputs used by `scripts/composition/op-difficulty-10B/script_cpt_rl/id2-10_0.2easy_0.3medium_0.5hard_cpt11-14`.
+For pretraining, only `cpt0.2-uniform_0.8-11-14_plus` is included. For RL, only final `actor/huggingface` checkpoints found locally are uploaded.
+## CPT Checkpoints
+| Path | Checkpoint | Used by nominal step / CPT epoch |
+| --- | --- | --- |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387` | checkpoint-387 | 50step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774` | checkpoint-774 | 100step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548` | checkpoint-1548 | 200step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935` | checkpoint-1935 | 100step/0.5 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096` | checkpoint-3096 | 100step/0.8, 400step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870` | checkpoint-3870 | 500step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644` | checkpoint-4644 | 600step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579` | checkpoint-6579 | 800step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740` | checkpoint-7740 | 954step/0.2 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127` | checkpoint-8127 | 400step/0.5 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062` | checkpoint-10062 | 500step/0.5 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997` | checkpoint-11997 | 600step/0.5 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771` | checkpoint-12771 | 400step/0.8 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867` | checkpoint-15867 | 800step/0.5 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254` | checkpoint-16254 | 500step/0.8 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963` | checkpoint-18963 | 954step/0.5 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350` | checkpoint-19350 | 600step/0.8 |
+| `id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542` | checkpoint-25542 | 800step/0.8 |
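Reviewer's sketch, not part of the commit: the CPT table above can be read as a lookup from (nominal step, CPT epoch) to a saved checkpoint. The dictionary below is transcribed from that table; `CPT_CHECKPOINTS`, `MIDTRAIN_DIR`, and `cpt_subfolder` are hypothetical helper names. Every listed step is a multiple of 387, which may reflect a fixed save interval, although the README does not say so.

```python
# Transcribed from the CPT checkpoint table; keys are (nominal step, CPT epoch).
# Epochs are kept as strings so they match the README text exactly.
CPT_CHECKPOINTS = {
    (50, "0.2"): 387,
    (100, "0.2"): 774,
    (200, "0.2"): 1548,
    (100, "0.5"): 1935,
    (100, "0.8"): 3096,
    (400, "0.2"): 3096,
    (500, "0.2"): 3870,
    (600, "0.2"): 4644,
    (800, "0.2"): 6579,
    (954, "0.2"): 7740,
    (400, "0.5"): 8127,
    (500, "0.5"): 10062,
    (600, "0.5"): 11997,
    (400, "0.8"): 12771,
    (800, "0.5"): 15867,
    (500, "0.8"): 16254,
    (954, "0.5"): 18963,
    (600, "0.8"): 19350,
    (800, "0.8"): 25542,
}

MIDTRAIN_DIR = (
    "id2-10_0.2easy_0.3medium_0.5hard/midtrain/"
    "cpt0.2-uniform_0.8-11-14_plus"
)


def cpt_subfolder(nominal_step: int, cpt_epoch: str) -> str:
    """Return the repo subfolder of the CPT checkpoint used by a given run."""
    step = CPT_CHECKPOINTS[(nominal_step, cpt_epoch)]
    return f"{MIDTRAIN_DIR}/checkpoint-{step}"


# Observation from the table (the save interval itself is an assumption):
# every checkpoint step is a multiple of 387.
assert all(step % 387 == 0 for step in CPT_CHECKPOINTS.values())
```

`cpt_subfolder(100, "0.8")` and `cpt_subfolder(400, "0.2")` both resolve to `checkpoint-3096`, matching the shared row in the table.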
+## RL Checkpoints
+| Path | Nominal step | CPT epoch | Source CPT checkpoint | Uploaded checkpoint |
+| --- | --- | --- | --- | --- |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL` | 50 | 0.2 | checkpoint-387 | `global_step_40` |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL` | 100 | 0.8 | checkpoint-3096 | `global_step_19` |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL` | 100 | 0.5 | checkpoint-1935 | `global_step_50` |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL` | 100 | 0.2 | checkpoint-774 | `global_step_80` |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL` | 200 | 0.2 | checkpoint-1548 | `global_step_160` |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-400step-0.2RL` | 400 | 0.8 | checkpoint-12771 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-400step-0.5RL` | 400 | 0.5 | checkpoint-8127 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-400step-0.8RL` | 400 | 0.2 | checkpoint-3096 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-500step-0.2RL` | 500 | 0.8 | checkpoint-16254 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-500step-0.5RL` | 500 | 0.5 | checkpoint-10062 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-500step-0.8RL` | 500 | 0.2 | checkpoint-3870 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-600step-0.2RL` | 600 | 0.8 | checkpoint-19350 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-600step-0.5RL` | 600 | 0.5 | checkpoint-11997 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-600step-0.8RL` | 600 | 0.2 | checkpoint-4644 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-800step-0.2RL` | 800 | 0.8 | checkpoint-25542 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-800step-0.5RL` | 800 | 0.5 | checkpoint-15867 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-800step-0.8RL` | 800 | 0.2 | checkpoint-6579 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-954step-0.5RL` | 954 | 0.5 | checkpoint-18963 | not found locally |
+| `id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-954step-0.8RL` | 954 | 0.2 | checkpoint-7740 | not found locally |
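Reviewer's sketch, not part of the commit: the run directory names in the RL table appear to follow the pattern `cpt<x>-rl-op11-14_uniform-<N>step-<y>RL`, where `<x>` matches the CPT epoch column and `<N>` matches the nominal step column in every row. Reading the trailing `<y>` as the RL share of the step budget is an assumption; it is consistent with the five uploaded rows, where the uploaded `global_step` is within one step of `N * y`. The names `RLRun`, `parse_rl_run`, and `expected_rl_steps` are hypothetical.

```python
import re
from typing import NamedTuple


class RLRun(NamedTuple):
    cpt_epoch: str     # value after "cpt", e.g. "0.2"
    nominal_step: int  # value before "step"
    rl_share: str      # value before "RL"; its meaning is an assumption


_RUN_NAME = re.compile(
    r"^cpt(?P<cpt>\d+\.\d+)-rl-op11-14_uniform-"
    r"(?P<step>\d+)step-(?P<rl>\d+\.\d+)RL$"
)


def parse_rl_run(name: str) -> RLRun:
    """Split an RL run directory name into its three encoded fields."""
    match = _RUN_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised RL run name: {name!r}")
    return RLRun(match["cpt"], int(match["step"]), match["rl"])


# Uploaded checkpoints, transcribed from the table (run name -> global step).
UPLOADED = {
    "cpt0.2-rl-op11-14_uniform-50step-0.8RL": 40,
    "cpt0.8-rl-op11-14_uniform-100step-0.2RL": 19,
    "cpt0.5-rl-op11-14_uniform-100step-0.5RL": 50,
    "cpt0.2-rl-op11-14_uniform-100step-0.8RL": 80,
    "cpt0.2-rl-op11-14_uniform-200step-0.8RL": 160,
}


def expected_rl_steps(run: RLRun) -> int:
    """Nominal step times the trailing RL value (an assumed interpretation)."""
    return round(run.nominal_step * float(run.rl_share))
```

Comparing `expected_rl_steps(parse_rl_run(name))` with `UPLOADED[name]` gives an exact match for four runs and a difference of one step for `cpt0.8-rl-op11-14_uniform-100step-0.2RL`.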
+## Load
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+repo_id = "Interplay-LM-Reasoning/extrapolation_midtrain"
+subdir = "id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542"
+tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
+model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
+```
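Reviewer's sketch, not part of the commit: the same `subfolder` argument should also work for the uploaded RL runs, assuming each RL directory holds a complete Hugging Face checkpoint at its top level. The README says only final `actor/huggingface` checkpoints were uploaded but does not show the resulting layout, so treat that as unverified. `rl_subfolder` and `load_checkpoint` are hypothetical helpers.

```python
REPO_ID = "Interplay-LM-Reasoning/extrapolation_midtrain"
RL_DIR = "id2-10_0.2easy_0.3medium_0.5hard/rl"


def rl_subfolder(run_name: str) -> str:
    """Build the repo subfolder for an RL run listed in the README table."""
    return f"{RL_DIR}/{run_name}"


def load_checkpoint(subfolder: str, repo_id: str = REPO_ID):
    """Load the tokenizer and model stored in one subfolder of the repo.

    transformers is imported lazily so the path helper works without it.
    Calling this function downloads weights from the Hub.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)
    return tokenizer, model
```

For example, `load_checkpoint(rl_subfolder("cpt0.2-rl-op11-14_uniform-100step-0.8RL"))` would fetch the run uploaded from `global_step_80`, if the layout assumption holds.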
+## Citation
 ```bibtex
 @misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
       author={Charlie Zhang and Graham Neubig and Xiang Yue},
       year={2025},
       eprint={2512.07783},
       archivePrefix={arXiv},
       primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
 }
+```

id2-10_0.2easy_0.3medium_0.5hard/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard
+`op11-14` extrapolation mid-training artifacts for `cpt0.2-uniform_0.8-11-14_plus`.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-10062/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-10062
+CPT checkpoint `checkpoint-10062` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-11997/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-11997
+CPT checkpoint `checkpoint-11997` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-12771/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-12771
+CPT checkpoint `checkpoint-12771` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1548/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-1548
+CPT checkpoint `checkpoint-1548` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-15867/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-15867
+CPT checkpoint `checkpoint-15867` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-16254/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-16254
+CPT checkpoint `checkpoint-16254` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-18963/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-18963
+CPT checkpoint `checkpoint-18963` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-1935/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-1935
+CPT checkpoint `checkpoint-1935` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-19350/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-19350
+CPT checkpoint `checkpoint-19350` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-25542/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-25542
+CPT checkpoint `checkpoint-25542` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3096/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-3096
+CPT checkpoint `checkpoint-3096` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-387/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-387
+CPT checkpoint `checkpoint-387` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-3870/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-3870
+CPT checkpoint `checkpoint-3870` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-4644/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-4644
+CPT checkpoint `checkpoint-4644` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-6579/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-6579
+CPT checkpoint `checkpoint-6579` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-774/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-774
+CPT checkpoint `checkpoint-774` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-7740/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-7740
+CPT checkpoint `checkpoint-7740` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/midtrain/cpt0.2-uniform_0.8-11-14_plus/checkpoint-8127/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / midtrain / cpt0.2-uniform_0.8-11-14_plus / checkpoint-8127
+CPT checkpoint `checkpoint-8127` from `cpt0.2-uniform_0.8-11-14_plus`, used by the op11-14 CPT-to-RL budget scripts.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-100step-0.8RL/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-100step-0.8RL
+Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-100step-0.8RL`, uploaded from `global_step_80`. This run uses CPT `checkpoint-774` with nominal step `100` and CPT epoch `0.2`.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-200step-0.8RL/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-200step-0.8RL
+Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-200step-0.8RL`, uploaded from `global_step_160`. This run uses CPT `checkpoint-1548` with nominal step `200` and CPT epoch `0.2`.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.2-rl-op11-14_uniform-50step-0.8RL/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.2-rl-op11-14_uniform-50step-0.8RL
+Final RL checkpoint from `cpt0.2-rl-op11-14_uniform-50step-0.8RL`, uploaded from `global_step_40`. This run uses CPT `checkpoint-387` with nominal step `50` and CPT epoch `0.2`.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.5-rl-op11-14_uniform-100step-0.5RL/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.5-rl-op11-14_uniform-100step-0.5RL
+Final RL checkpoint from `cpt0.5-rl-op11-14_uniform-100step-0.5RL`, uploaded from `global_step_50`. This run uses CPT `checkpoint-1935` with nominal step `100` and CPT epoch `0.5`.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```

id2-10_0.2easy_0.3medium_0.5hard/rl/cpt0.8-rl-op11-14_uniform-100step-0.2RL/README.md ADDED

	@@ -0,0 +1,28 @@

+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- mid-training
+- extrapolation
+- synthetic-data
+- transformers
+---
+# id2-10_0.2easy_0.3medium_0.5hard / rl / cpt0.8-rl-op11-14_uniform-100step-0.2RL
+Final RL checkpoint from `cpt0.8-rl-op11-14_uniform-100step-0.2RL`, uploaded from `global_step_19`. This run uses CPT `checkpoint-3096` with nominal step `100` and CPT epoch `0.8`.
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```