xx18 nielsr HF Staff committed on
Commit 3e5e097 · 1 parent: 6d03aca

Add model card and metadata (#1)

- Add model card and metadata (8b9254cd82029e65a65d4e3c8fbd3b63b771ddfe)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +29 -0
README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ ---
+
+ # Composition-RL-8B
+
+ This repository contains the 8B model checkpoint for **Composition-RL**, introduced in the paper [Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models](https://huggingface.co/papers/2602.12036).
+
+ ## Overview
+ Composition-RL is a data-efficient Reinforcement Learning with Verifiable Rewards (RLVR) approach. It addresses the challenge of "too-easy" prompts (where the pass rate reaches 1) by automatically composing multiple verifiable problems into a single, harder yet still-verifiable prompt. This ensures that RL training continues to receive informative signals as the model's reasoning capabilities improve.
+
+ ## Model Details
+ - **Base Model:** Qwen3-8B-Base
+ - **Training Dataset:** MATH-Composition-199K
+ - **Framework:** Composition-RL
+ - **Paper:** [Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models](https://huggingface.co/papers/2602.12036)
+ - **Code:** [GitHub - XinXU-USTC/Composition-RL](https://github.com/XinXU-USTC/Composition-RL)
+
+ ## Citation
+ If you find this work helpful for your research, please consider citing:
+ ```bibtex
+ @article{xu2026composition-rl,
+ title={Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models},
+ author={Xu, Xin and Bai, Clive and Yang, Kai and Chen, Tianhao and Chen, Yangkun and Liu, Weijie and Chen, Hao and Wang, Yang and Yang, Saiyong and Yang, Can},
+ journal={arXiv preprint arXiv:2602.12036},
+ year={2026}
+ }
+ ```
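The model card's Overview describes composition only at a high level. The following Python sketch makes its two claims concrete under loudly stated assumptions. First, it assumes GRPO-style group-normalized advantages, under which a prompt whose rollouts all earn reward 1 (pass rate 1) yields all-zero advantages and therefore no learning signal. Second, it swaps in a deliberately naive composition rule that concatenates K too-easy problems and gives reward 1 only when every sub-answer matches. `Prompt`, `group_advantages`, `is_too_easy`, `compose`, `verify`, and `refresh_pool` are hypothetical names. This is not the paper's composition procedure; see the linked paper and code for the actual method.

```python
import random
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Prompt:
    question: str
    answers: tuple  # one reference answer (string) per sub-problem


def group_advantages(rewards, eps=1e-6):
    """Assumed GRPO-style advantages: normalize each reward within its rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_too_easy(rewards):
    """Pass rate 1: every rollout was correct, so all group advantages are zero."""
    return all(r == 1.0 for r in rewards)


def compose(prompts):
    """Hypothetical naive composition: concatenate problems and require all answers."""
    parts = [f"Problem {i + 1}: {p.question}" for i, p in enumerate(prompts)]
    question = "\n\n".join(parts) + (
        "\n\nSolve every problem and give the final answers in order, separated by ';'."
    )
    # The composed reference answer is just the ordered tuple of sub-answers,
    # so the harder prompt stays mechanically verifiable.
    answers = tuple(a for p in prompts for a in p.answers)
    return Prompt(question, answers)


def verify(prediction, prompt):
    """Binary verifiable reward: 1.0 only if every sub-answer matches exactly."""
    preds = tuple(x.strip() for x in prediction.split(";"))
    return 1.0 if preds == prompt.answers else 0.0


def refresh_pool(pool, rollout_rewards, k=2, seed=0):
    """Keep informative prompts and merge too-easy ones into groups of k.

    Leftover too-easy prompts that cannot fill a full group are dropped in this sketch.
    """
    rng = random.Random(seed)
    keep = [p for p, r in zip(pool, rollout_rewards) if not is_too_easy(r)]
    easy = [p for p, r in zip(pool, rollout_rewards) if is_too_easy(r)]
    rng.shuffle(easy)
    composed = [compose(easy[i:i + k]) for i in range(0, len(easy) - k + 1, k)]
    return keep + composed


if __name__ == "__main__":
    p1 = Prompt("What is 2 + 3?", ("5",))
    p2 = Prompt("What is 4 * 2?", ("8",))
    print(group_advantages([1.0, 1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0, 0.0]: no signal
    c = compose([p1, p2])
    print(verify("5; 8", c), verify("5; 9", c))  # 1.0 0.0
```

Concatenation is used here only because it is the simplest rule that keeps a composed prompt verifiable; a model that solves each part with probability below 1 solves the conjunction less often, which restores non-zero advantages.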