Composition-RL-8B

This repository contains the 8B model checkpoint for Composition-RL, introduced in the paper Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models.

Overview

Composition-RL is a data-efficient Reinforcement Learning with Verifiable Rewards (RLVR) approach. It targets "too-easy" prompts, on which the model's pass rate reaches 1. On such a prompt every sampled response receives the same reward, so the prompt no longer provides an informative learning signal. Composition-RL automatically composes multiple verifiable problems into a single prompt that is harder yet still verifiable. As a result, RL training keeps receiving informative signals as the model's reasoning capabilities improve.
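The composition idea can be illustrated with a minimal Python sketch. The sketch assumes a simple conjunction template: the composed prompt asks for every sub-answer in order and counts as correct only if all of them match. The names `Problem`, `compose`, `verify`, `pass_rate`, and `too_easy`, and the `;`-separated answer format, are hypothetical. The paper's actual composition scheme may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    question: str
    answer: str  # reference answer checked by the verifier (assumed to contain no ';')


def compose(problems: List[Problem]) -> Problem:
    # Hypothetical template: ask for every sub-answer, in order, separated by ';'.
    header = "Solve all problems below. Give the final answers in order, separated by ';'."
    lines = [f"Problem {i + 1}: {p.question}" for i, p in enumerate(problems)]
    question = "\n".join([header] + lines)
    answer = ";".join(p.answer for p in problems)
    return Problem(question, answer)


def verify(response: str, reference: str) -> bool:
    # The composed prompt is correct only if every sub-answer matches exactly.
    got = [s.strip() for s in response.split(";")]
    want = reference.split(";")
    return got == want


def pass_rate(responses: List[str], reference: str) -> float:
    # Fraction of sampled responses that the verifier accepts.
    if not responses:
        return 0.0
    return sum(verify(r, reference) for r in responses) / len(responses)


def too_easy(responses: List[str], reference: str) -> bool:
    # Pass rate 1: every rollout gets the same reward, so no informative signal.
    return pass_rate(responses, reference) == 1.0


# Usage: a prompt the model always solves becomes informative once composed.
p1 = Problem("What is 3 + 4?", "7")
p2 = Problem("What is 6 * 5?", "30")
combo = compose([p1, p2])
print(combo.answer)                                          # 7;30
print(too_easy(["7", "7", "7"], p1.answer))                  # True
print(too_easy(["7;30", "7;31", "7; 30"], combo.answer))     # False
```

Under this assumed template, the composed prompt is still checked by exact matching. Solving it, however, requires getting every part right, so its pass rate tends to fall below that of its individual parts.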

Model Details

Citation

If you find this work helpful for your research, please consider citing:

@article{xu2026composition-rl,
  title={Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models},
  author={Xu, Xin and Bai, Clive and Yang, Kai and Chen, Tianhao and Chen, Yangkun and Liu, Weijie and Chen, Hao and Wang, Yang and Yang, Saiyong and Yang, Can},
  journal={arXiv preprint arXiv:2602.12036},
  year={2026}
}
