Add library_name and improve model card metadata

#1
by nielsr HF Staff - opened
Files changed (1)
README.md +13 -8
README.md CHANGED
@@ -1,21 +1,26 @@
 ---
-license: apache-2.0
-datasets:
-- Miaow-Lab/RLVR-Linearity-Dataset
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+datasets:
+- Miaow-Lab/RLVR-Linearity-Dataset
+license: apache-2.0
 pipeline_tag: text-generation
+library_name: transformers
+tags:
+- reasoning
+- grpo
+- reinforcement-learning
 ---
 
 # Model Card
 
 ## 1. Model Details
-This model is the fine-tuned checkpoint described in the paper **"Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training"**. It was trained using Reinforcement Learning (RL) to enhance reasoning capabilities.
+This model is a fine-tuned checkpoint described in the paper **"Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training"**. It was trained using Reinforcement Learning (RL) to investigate the phenomenon of linear evolution in model weights and output log-probabilities during RLVR training.
 
-- **Paper:** [ArXiv](https://arxiv.org/pdf/2601.04537v1)
-- **Code:** [Github](https://github.com/Miaow-Lab/RLVR-Linearity)
+- **Paper:** [Not All Steps are Informative: On the Linearity of LLMs' RLVR Training](https://huggingface.co/papers/2601.04537)
+- **Code:** [GitHub Repository](https://github.com/Miaow-Lab/RLVR-Linearity)
 - **Base Model:** [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
-- **Training Method:** GRPO
+- **Training Method:** GRPO (using the `verl` framework)
 
 
 ## 2. Training Details
@@ -29,7 +34,7 @@ This model is the fine-tuned checkpoint described in the paper **"Not All Steps
 - Group Size: 16
 - **Compute:** Trained on `32 x H100` GPUs for about `150` hours.
 
-For full training configurations, please refer to the `config.json` or the training scripts in our [GitHub](https://github.com/Miaow-Lab/RLVR-Linearity).
+For full training configurations, please refer to the `config.json` or the training scripts in the official [GitHub repository](https://github.com/Miaow-Lab/RLVR-Linearity).
 
 ## 3. Citation
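The net effect of the metadata edit can be summarized mechanically. The sketch below mirrors the YAML front matter from the diff as plain Python dicts; it is illustrative only, not a Hugging Face API call. Note that reordering mapping keys (`license`, `datasets`, `base_model`) is cosmetic in YAML; the substantive change is the two added keys.

```python
# Front matter before the PR, as a plain dict (YAML mapping order is
# not semantically significant, so only key/value differences matter).
old_meta = {
    "license": "apache-2.0",
    "datasets": ["Miaow-Lab/RLVR-Linearity-Dataset"],
    "base_model": ["deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"],
    "pipeline_tag": "text-generation",
}

# Front matter after the PR: same entries plus two new keys.
new_meta = {
    **old_meta,
    "library_name": "transformers",  # tells the Hub which library loads the model
    "tags": ["reasoning", "grpo", "reinforcement-learning"],
}

# The PR adds keys without removing or changing any existing value.
added = sorted(set(new_meta) - set(old_meta))
print(added)  # ['library_name', 'tags']
```

Setting `library_name: transformers` is what enables the Hub's auto-generated "Use this model" loading snippet, which is the stated motivation of the PR title.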