Add library_name and improve model card metadata

#1
by nielsr HF Staff - opened
Files changed (1)
README.md +13 -8
README.md CHANGED
@@ -1,21 +1,26 @@
 ---
-license: apache-2.0
-datasets:
-- Miaow-Lab/RLVR-Linearity-Dataset
 base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+datasets:
+- Miaow-Lab/RLVR-Linearity-Dataset
+license: apache-2.0
 pipeline_tag: text-generation
+library_name: transformers
+tags:
+- reasoning
+- grpo
+- reinforcement-learning
 ---
 
 # Model Card
 
 ## 1. Model Details
-This model is the fine-tuned checkpoint described in the paper **"Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training"**. It was trained using Reinforcement Learning (RL) to enhance reasoning capabilities.
+This model is a fine-tuned checkpoint described in the paper **"Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training"**. It was trained using Reinforcement Learning (RL) to investigate the phenomenon of linear evolution in model weights and output log-probabilities during RLVR training.
 
-- **Paper:** [ArXiv](https://arxiv.org/pdf/2601.04537v1)
-- **Code:** [Github](https://github.com/Miaow-Lab/RLVR-Linearity)
+- **Paper:** [Not All Steps are Informative: On the Linearity of LLMs' RLVR Training](https://huggingface.co/papers/2601.04537)
+- **Code:** [GitHub Repository](https://github.com/Miaow-Lab/RLVR-Linearity)
 - **Base Model:** [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
-- **Training Method:** GRPO
+- **Training Method:** GRPO (using the `verl` framework)
 
 
 ## 2. Training Details
@@ -29,7 +34,7 @@ This model is the fine-tuned checkpoint described in the paper **"Not All Steps
 - Group Size: 16
 - **Compute:** Trained on `32 x H100` GPUs for about `150` hours.
 
-For full training configurations, please refer to the `config.json` or the training scripts in our [GitHub](https://github.com/Miaow-Lab/RLVR-Linearity).
+For full training configurations, please refer to the `config.json` or the training scripts in the official [GitHub repository](https://github.com/Miaow-Lab/RLVR-Linearity).
 
 ## 3. Citation
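The net effect of the metadata edit can be summarized mechanically. The sketch below mirrors the YAML front matter from the diff as plain Python dicts; it is illustrative only, not a Hugging Face API call. Note that reordering mapping keys (`license`, `datasets`, `base_model`) is cosmetic in YAML; the substantive change is the two added keys.

```python
# Front matter before the PR, as a plain dict (YAML mapping order is
# not semantically significant, so only key/value differences matter).
old_meta = {
    "license": "apache-2.0",
    "datasets": ["Miaow-Lab/RLVR-Linearity-Dataset"],
    "base_model": ["deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"],
    "pipeline_tag": "text-generation",
}

# Front matter after the PR: same entries plus two new keys.
new_meta = {
    **old_meta,
    "library_name": "transformers",  # tells the Hub which library loads the model
    "tags": ["reasoning", "grpo", "reinforcement-learning"],
}

# The PR adds keys without removing or changing any existing value.
added = sorted(set(new_meta) - set(old_meta))
print(added)  # ['library_name', 'tags']
```

Setting `library_name: transformers` is what enables the Hub's auto-generated "Use this model" loading snippet, which is the stated motivation of the PR title.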