Clockz committed on
Commit 7ce1557 · verified · 1 Parent(s): 00df465

Add files using upload-large-folder tool

Files changed (1)
  1. README.md +28 -31
README.md CHANGED
@@ -1,46 +1,43 @@
  ---
- license: mit
- pipeline_tag: text-generation
  ---

- <h1 align="center">
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
- </h1>

- <div align="center">

- <a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>,
- <a href="https://xiangyue9607.github.io">Xiang Yue</a>

- Carnegie Mellon University, Language Technologies Institute

- </div>

- <div align="center">

- [![arXiv](https://img.shields.io/badge/arXiv-2512.07783-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.07783)
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
- ![Python](https://img.shields.io/badge/python-3.9%2B-blue)

- </div>


- This repository contains post-training related checkpoints in extrapolation tasks.

- **Code:** [GitHub Repository](https://github.com/Interplay-LM-Reasoning/Interplay-LM-Reasoning)

- ## 📚 Citation

- If you find this work or code useful, please consider citing:
-
- ```bibtex
- @misc{zhang2025interplaypretrainingmidtrainingrl,
- title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
- author={Charlie Zhang and Graham Neubig and Xiang Yue},
- year={2025},
- eprint={2512.07783},
- archivePrefix={arXiv},
- primaryClass={cs.CL},
- url={https://arxiv.org/abs/2512.07783},
- }
- ```
  ---
+ license: other
+ library_name: transformers
+ tags:
+ - reasoning
+ - extrapolation
+ - synthetic-data
+ - transformers
  ---

+ # Interplay-LM Extrapolation RL Models

+ This repository is organized by experiment setting. Each top-level directory corresponds to one pretraining mixture used in the extrapolation experiments.

+ Within each setting:

+ - `base/` stores the base model used to initialize RL.
+ - `rl/` stores the final RL checkpoints for each experiment variant.

+ Only inference-relevant Hugging Face files are included.

+ ## Included settings

+ - `id2-10_0.2easy_0.3medium_0.5hard`
+ - `id2-10_0.5easy_0.3medium_0.2hard`
+ - `id2-10_0.4995easy_0.4995medium_0.001hard`
+ - `id2-10_0.475easy_0.475medium_0.05hard`

+ ## Load

+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer

+ repo_id = "Interplay-LM-Reasoning/extrapolation_rl"
+ subdir = "id2-10_0.5easy_0.3medium_0.2hard/rl/op11-14_uniform"

+ tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subdir)
+ model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subdir)
+ ```

+ ## Reference

+ - Zhang, Charlie; Neubig, Graham; Yue, Xiang. "On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models." arXiv:2512.07783 (2025).