---
license: other
library_name: transformers
tags:
- reasoning
- context-learning
- pretraining
- synthetic-data
- transformers
---

# Interplay-LM Context Pretrain 2
|
|
This repository contains the context B pretraining checkpoints and the corresponding final RL checkpoints. In this setting, the teacher component uses only op2 during pretraining.
|
|
Only inference-relevant Hugging Face files are included.
|
|
Within each setting:
|
|
- `base/` stores the final op2-only pretraining checkpoint.
- `rl/` stores the final RL checkpoints for each experiment variant.
|
|
## Included settings
|
|
- `0.9zoo_op2-20+0.1teacher_op2`
- `0.99zoo_op2-20+0.01teacher_op2`
- `0.999zoo_op2-20+0.001teacher_op2`
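
The leading coefficients in these names appear to encode the zoo/teacher data-mixing ratio (e.g. 0.99 zoo vs. 0.01 teacher). A small sketch that recovers the two weights from a setting name, under that assumption (the parser and its regex are ours, not an official naming contract):

```python
# Sketch: parse the assumed "<w>zoo_op2-20+<w>teacher_op2" name format
# used by the settings listed above.
import re

SETTING_RE = re.compile(r"^(?P<zoo>0\.\d+)zoo_op2-20\+(?P<teacher>0\.\d+)teacher_op2$")


def mixing_weights(setting: str) -> tuple[float, float]:
    """Return the (zoo, teacher) weights encoded in a setting name."""
    m = SETTING_RE.match(setting)
    if m is None:
        raise ValueError(f"unrecognized setting name: {setting!r}")
    return float(m["zoo"]), float(m["teacher"])


print(mixing_weights("0.999zoo_op2-20+0.001teacher_op2"))
# (0.999, 0.001)
```

This is only a convenience for bookkeeping when iterating over the three settings; the directory names themselves remain the source of truth.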
|
|
## Load
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Interplay-LM-Reasoning/context_pretrain_2"
subdir = "0.99zoo_op2-20+0.01teacher_op2/rl/contextzoo_0.99zoo_0.01teacher_process_strict"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
```
|
|
## Citation
|
|
```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
  title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07783},
}
```
|
|