---
license: mit
---

<h1 align="center">
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
</h1>
<div align="center">

<a href="https://chenlong-clock.github.io">Charlie Zhang</a>, <a href="https://www.phontron.com">Graham Neubig</a>, <a href="https://xiangyue9607.github.io">Xiang Yue</a>

Carnegie Mellon University, Language Technologies Institute

</div>
<div align="center">

[arXiv](https://arxiv.org/abs/2512.07783)
[License: MIT](LICENSE)

</div>
This repository contains post-training checkpoints for the extrapolation tasks.
## 📚 Citation

If you find this work or code useful, please consider citing:
```bibtex
@misc{zhang2025interplaypretrainingmidtrainingrl,
  title={On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models},
  author={Charlie Zhang and Graham Neubig and Xiang Yue},
  year={2025},
  eprint={2512.07783},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07783},
}
```