Add model card and metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +33 -0
README.md ADDED
@@ -0,0 +1,33 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ ---
+
+ # Composition-RL-8B
+
+ Composition-RL is a data-efficient Reinforcement Learning with Verifiable Rewards (RLVR) approach that addresses the scarcity of informative training signals by automatically composing multiple verifiable problems into a single, harder compositional prompt.
+
+ This specific checkpoint is the 8B version, initialized from **Qwen3-8B-Base** and trained on the `MATH-Composition-199K` dataset.
+
+ ## Model Description
+ As training progresses in RLVR, models often master "easy" prompts, resulting in a pass rate of 1 and reducing effective learning. Composition-RL mitigates this by creating new, complex, yet verifiable questions from existing data, maintaining a high level of difficulty and informative signals throughout training.
+
+ - **Developed by:** Xin Xu, Clive Bai, Kai Yang, Tianhao Chen, Yangkun Chen, Weijie Liu, Hao Chen, Yang Wang, Saiyong Yang, and Can Yang.
+ - **Paper:** [Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models](https://huggingface.co/papers/2602.12036)
+ - **Repository:** [GitHub - Composition-RL](https://github.com/XinXU-USTC/Composition-RL)
+ - **Base Model:** Qwen3-8B-Base
+
+ ## Usage
+ For evaluation and data generation instructions, please refer to the official [GitHub repository](https://github.com/XinXU-USTC/Composition-RL).
+
+ ## Citation
+ If you find this work helpful for your research, please consider citing:
+
+ ```bibtex
+ @article{xu2026composition-rl,
+   title={Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models},
+   author={Xu, Xin and Bai, Clive and Yang, Kai and Chen, Tianhao and Chen, Yangkun and Liu, Weijie and Chen, Hao and Wang, Yang and Yang, Saiyong and Yang, Can},
+   journal={arXiv preprint arXiv:2602.12036},
+   year={2026}
+ }
+ ```
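
Note on the Model Description: the claim that a pass rate of 1 reduces effective learning can be illustrated with a small sketch. The model card does not name the RL algorithm, so this assumes a group-normalized advantage of the kind used in GRPO-style methods; the function names, the `eps` constant, and the reward lists below are hypothetical and are not taken from the Composition-RL repository.

```python
from statistics import mean, pstdev


def pass_rate(rewards):
    """Fraction of sampled responses that received a reward of 1."""
    return sum(rewards) / len(rewards)


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps).

    Assumption: binary verifiable rewards and per-prompt normalization
    over a group of sampled responses (GRPO-style). This is illustrative,
    not the authors' implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rollouts: 8 sampled responses per prompt, reward 1 if verified.
solved_prompt = [1.0] * 8                                   # "easy" prompt
mixed_prompt = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]     # harder prompt

print(pass_rate(solved_prompt), group_advantages(solved_prompt))
print(pass_rate(mixed_prompt), group_advantages(mixed_prompt))
```

Under this assumption, every advantage for the fully solved prompt is zero, so that prompt contributes no gradient signal, while the prompt with mixed outcomes yields non-zero advantages. This is the situation the card describes composing harder prompts to avoid.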