Interplay-LM-Reasoning/extrapolation_rl

@@ -1,46 +1,43 @@
 ---
-license: mit
-pipeline_tag: text-generation
 ---
-<h1 align="center">
-On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
-</h1>
-<div align="center">
-<a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>,
-<a href="https://xiangyue9607.github.io">Xiang Yue</a>
-Carnegie Mellon University, Language Technologies Institute
-</div>
-<div align="center">
-[![arXiv](https://img.shields.io/badge/arXiv-2512.07783-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.07783)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
-![Python](https://img.shields.io/badge/python-3.9%2B-blue)
-</div>
-This repository contains post-training related checkpoints in extrapolation tasks.
-**Code:** [GitHub Repository](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)
-## 📚 Citation
-If you find this work or code useful, please consider citing:
-```bibtex
-@misc{zhang2025interplaypretrainingmidtrainingrl,
-      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
-      author={Charlie Zhang and Graham Neubig and Xiang Yue},
-      year={2025},
-      eprint={2512.07783},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2512.07783},
-}
-```

 ---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- extrapolation
+- synthetic-data
+- transformers
 ---
+# Interplay-LM Extrapolation RL Models
+This repository is organized by experiment setting. Each top-level directory corresponds to one pretraining mixture used in the extrapolation experiments.
+Within each setting:
+- `base/` stores the base model used to initialize RL.
+- `rl/` stores the final RL checkpoints for each experiment variant.
+Only inference-relevant Hugging Face files are included.
+## Included settings
+- `id2-10_0.2easy_0.3medium_0.5hard`
+- `id2-10_0.5easy_0.3medium_0.2hard`
+- `id2-10_0.4995easy_0.4995medium_0.001hard`
+- `id2-10_0.475easy_0.475medium_0.05hard`
+## Load
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+repo_id = "Interplay-LM-Reasoning/extrapolation_rl"
+subdir = "id2-10_0.5easy_0.3medium_0.2hard/rl/op11-14_uniform"
+tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
+model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
+```
+## Reference
+- Zhang, Charlie; Neubig, Graham; Yue, Xiang. "On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models." arXiv:2512.07783 (2025).
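
The layout described in the added README (`<setting>/base/` and `<setting>/rl/<variant>/`) maps directly onto the `subfolder` argument used in its Load example. Below is a minimal sketch of a helper that builds that argument. The names `checkpoint_subfolder` and `SETTINGS` are hypothetical and not part of the repository, and the sketch assumes the base model files sit directly under `base/`, which the README does not state explicitly.

```python
# Hypothetical helper, not part of the repository: builds the `subfolder`
# string for a checkpoint from the layout described in the README.

# Setting names copied from the README's "Included settings" list.
SETTINGS = (
    "id2-10_0.2easy_0.3medium_0.5hard",
    "id2-10_0.5easy_0.3medium_0.2hard",
    "id2-10_0.4995easy_0.4995medium_0.001hard",
    "id2-10_0.475easy_0.475medium_0.05hard",
)


def checkpoint_subfolder(setting, variant=None):
    """Return the subfolder for a setting's base model or one RL variant.

    With variant=None the path points at `<setting>/base` (assumption: the
    model files live directly in that directory). Otherwise it points at
    `<setting>/rl/<variant>`, matching the README's Load example.
    """
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    if variant is None:
        return f"{setting}/base"
    return f"{setting}/rl/{variant}"


# Reproduces the subfolder used in the README's Load example.
subdir = checkpoint_subfolder("id2-10_0.5easy_0.3medium_0.2hard", "op11-14_uniform")
```

The returned string can be passed as `subfolder=` to `AutoTokenizer.from_pretrained` and `AutoModelForCausalLM.from_pretrained`, as in the README's own example.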