+---
+license: other
+library_name: transformers
+tags:
+- reasoning
+- context-learning
+- pretraining
+- synthetic-data
+- transformers
+---
+# Interplay-LM Context Pretrain 2
+This repository contains the context B pretraining checkpoints and the corresponding final RL checkpoints. In this setting, the teacher component uses only op2 during pretraining.
+Only inference-relevant Hugging Face files are included.
+Within each setting:
+- `base/` stores the final op2-only pretraining checkpoint.
+- `rl/` stores the final RL checkpoints for each experiment variant.
+## Included settings
+- `0.9zoo_op2-20+0.1teacher_op2`
+- `0.99zoo_op2-20+0.01teacher_op2`
+- `0.999zoo_op2-20+0.001teacher_op2`
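The `base/` and `rl/` layout and the setting names above can be combined into `subfolder` strings with a small helper. This is a minimal sketch, not part of the repository: `SETTINGS` and `subfolder_for` are hypothetical names, and it assumes that `base/` holds the model files directly, which the card does not state. The `rl/<variant>` form follows the path used in the Load example.

```python
# Setting names copied from the "Included settings" list.
SETTINGS = [
    "0.9zoo_op2-20+0.1teacher_op2",
    "0.99zoo_op2-20+0.01teacher_op2",
    "0.999zoo_op2-20+0.001teacher_op2",
]


def subfolder_for(setting, stage, variant=None):
    """Build a `subfolder` string for one checkpoint (hypothetical helper).

    stage is "base" or "rl"; an RL checkpoint also needs its variant name.
    Assumption: `base/` contains the checkpoint files directly.
    """
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    if stage == "base":
        return f"{setting}/base"
    if stage == "rl":
        if not variant:
            raise ValueError("an RL checkpoint requires a variant name")
        return f"{setting}/rl/{variant}"
    raise ValueError(f"stage must be 'base' or 'rl', got {stage!r}")


if __name__ == "__main__":
    for name in SETTINGS:
        print(subfolder_for(name, "base"))
```

The returned string would be passed as the `subfolder` argument of `from_pretrained`. If the base checkpoint turns out to sit one level deeper, only the `"base"` branch needs to change.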
+## Load
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+repo_id = "Interplay-LM-Reasoning/context_pretrain_2"
+# Path inside the repo: <setting>/rl/<experiment variant>
+subdir = "0.99zoo_op2-20+0.01teacher_op2/rl/contextzoo_0.99zoo_0.01teacher_process_strict"
+
+# `subfolder` selects one checkpoint directory within the repository.
+tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
+model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
+```
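The card does not list the RL variant names for each setting. One way to discover valid `subfolder` values is to take the repository's file listing and keep every folder that contains a `config.json`. The sketch below assumes each checkpoint folder ships a `config.json`, as Transformers checkpoints normally do. `checkpoint_dirs` is a hypothetical helper, and the file list in the example is illustrative, not the actual repository contents.

```python
from pathlib import PurePosixPath


def checkpoint_dirs(repo_files):
    """Return the sorted folders that directly contain a config.json."""
    dirs = set()
    for name in repo_files:
        path = PurePosixPath(name)
        if path.name == "config.json":
            dirs.add(path.parent.as_posix())
    return sorted(dirs)


if __name__ == "__main__":
    # Illustrative listing only; the real repository may differ.
    example_files = [
        "README.md",
        "0.99zoo_op2-20+0.01teacher_op2/base/config.json",
        "0.99zoo_op2-20+0.01teacher_op2/base/model.safetensors",
        "0.99zoo_op2-20+0.01teacher_op2/rl/contextzoo_0.99zoo_0.01teacher_process_strict/config.json",
    ]
    for folder in checkpoint_dirs(example_files):
        print(folder)
```

For the real listing, `huggingface_hub.list_repo_files("Interplay-LM-Reasoning/context_pretrain_2")` returns the file paths as a list of strings that can be passed to `checkpoint_dirs`.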
+## Citation
+```bibtex
+@misc{zhang2025interplaypretrainingmidtrainingrl,
+      title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
+      author={Charlie Zhang and Graham Neubig and Xiang Yue},
+      year={2025},
+      eprint={2512.07783},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2512.07783},
+}
+```