---
license: apache-2.0
datasets:
- Miaow-Lab/RLVR-Linearity-Dataset
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
pipeline_tag: text-generation
---

# Model Card

## 1. Model Details

This model is the fine-tuned checkpoint described in the paper **"Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training"**. It was trained with reinforcement learning from verifiable rewards (RLVR), using the GRPO algorithm, to enhance mathematical reasoning capabilities. A minimal usage example is sketched below.

- **Paper:** [arXiv](https://arxiv.org/pdf/2601.04537v1)
- **Code:** [GitHub](https://github.com/Miaow-Lab/RLVR-Linearity)
- **Base Model:** [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
- **Training Method:** GRPO
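
For inference, the checkpoint should load with the standard `transformers` text-generation API. The sketch below is a minimal illustration rather than an official snippet: the repository id is a placeholder, and the sampling settings are values commonly suggested for DeepSeek-R1-distilled models.

```python
# Minimal inference sketch (assumes the standard transformers API).
# NOTE: "Miaow-Lab/RLVR-Linearity-Model" is a placeholder; use this repo's actual id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Miaow-Lab/RLVR-Linearity-Model"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The base model is a chat/reasoning model, so apply its chat template.
messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings are illustrative (temperature 0.6 / top_p 0.95 is often
# recommended for DeepSeek-R1-distilled models; adjust as needed).
output = model.generate(input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```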

## 2. Performance

We evaluated the model on standard math benchmarks. Key results include:

| Benchmark | Avg@64 |
| :--- | :--- |
| AIME 2024 | **41.93%** |
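
Here, Avg@64 is read as accuracy averaged over 64 sampled generations per problem, then averaged across the benchmark. The helper below is our illustration of that reading, not code from the paper's repository:

```python
# Sketch of the Avg@k metric: average per-sample correctness per problem,
# then average across problems (equivalent to a pooled mean when every
# problem gets the same k). Hypothetical helper, not from the paper's repo.
from statistics import mean

def avg_at_k(correctness: list[list[bool]]) -> float:
    """correctness[i][j]: whether sample j of problem i was verified correct."""
    return mean(mean(map(float, samples)) for samples in correctness)

# Toy example with k = 4 (the card reports k = 64): 0.75 and 0.25 average to 0.5.
print(avg_at_k([[True, True, False, True], [False, False, True, False]]))
```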

## 3. Training Details

- **Hyperparameters:**
  - Learning Rate: `1e-6`
  - Train Batch Size: `128`
  - PPO Mini Batch Size: `64`
  - RL Algorithm: `GRPO`
- **Compute:** Trained on `32 x H100` GPUs for about `150` hours (≈ 4,800 GPU-hours).

For full training configurations, please refer to `config.json` or the training scripts in our [GitHub](https://github.com/Miaow-Lab/RLVR-Linearity). A sketch of the group-relative advantage at the core of GRPO is included below.
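
As background on the `GRPO` entry above: the algorithm's defining step samples a group of rollouts per prompt and normalizes each verifiable reward against the group's statistics. The snippet below sketches the standard formulation, not the authors' implementation:

```python
# Group-relative advantage at the core of GRPO: normalize each rollout's
# reward against the mean/std of its prompt's rollout group.
# Standard-formulation sketch only; see the GitHub repo for the actual code.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) verifiable rewards per rollout."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy group of 4 rollouts for one prompt: correct answers get reward 1.
print(grpo_advantages(torch.tensor([[1.0, 0.0, 1.0, 0.0]])))  # ≈ ±0.866
```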

## 4. Citation

If you use this model in your research, please cite our paper:

```bibtex
@misc{wang2026stepsinformativelinearityllms,
      title={Not All Steps are Informative: On the Linearity of LLMs' RLVR Training},
      author={Tianle Wang and Zhongyuan Wu and Shenghao Jin and Hao Xu and Wei Chen and Ning Miao},
      year={2026},
      eprint={2601.04537},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.04537},
}
```