---
license: apache-2.0
tags:
- reasoning
- mathematics
- reinforcement-learning
datasets:
- AIME
- AMC
- Omni-Math
base_model: DeepScaleR-1.5B
---

# ALP_DeepScaleR_1.5B_C16K

DeepScaleR-1.5B fine-tuned with Adaptive Length Penalty (ALP). ALP reduces token usage by roughly 50% on average across the benchmarks listed under Token Usage while maintaining Pass@1 performance.

## Training
- 100 GRPO steps, batch size 512, learning rate 1e-6, β=1e-7
- 16 rollouts/prompt for difficulty estimation
- 16K context window
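
The card does not spell out the ALP reward, so the following is only a minimal sketch of how the 16 rollouts per prompt could feed a difficulty-aware length penalty. It assumes that β is the length-penalty coefficient, that difficulty is estimated as the fraction of correct rollouts, and that the penalty weight is floored at 1/K. The function names `solve_rate` and `alp_rewards` are hypothetical and are not taken from the training code.

```python
from typing import List


def solve_rate(correct: List[bool]) -> float:
    """Fraction of rollouts for one prompt that reached the right answer."""
    return sum(correct) / len(correct)


def alp_rewards(correct: List[bool], lengths: List[int], beta: float = 1e-7) -> List[float]:
    """Per-rollout reward: correctness minus a length penalty scaled by solve rate.

    Assumed form (not confirmed by this card):
        reward = 1[correct] - beta * num_tokens * max(solve_rate, 1 / K)
    Easy prompts (high solve rate) are penalized more per token than hard ones.
    """
    k = len(correct)
    weight = max(solve_rate(correct), 1.0 / k)  # floor keeps a small penalty on unsolved prompts
    return [float(c) - beta * n * weight for c, n in zip(correct, lengths)]


# Example: 16 rollouts of 1000 tokens each on an easy and on a hard prompt.
easy = alp_rewards([True] * 16, [1000] * 16)
hard = alp_rewards([False] * 16, [1000] * 16)
```

Under these assumptions an always-solved prompt pays the full per-token penalty, while a never-solved prompt pays one sixteenth of it, which is what would push the model to shorten easy solutions first.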

## Performance (Pass@1)
- MATH-500: 0.80
- AIME: 0.24
- OlympiadBench: 0.51

## Token Usage
- MATH: 2326→646 (-72%)
- AIME: 3906→2254 (-42%)
- Olympiad: 3309→2107 (-36%)
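
The percentages above follow directly from the before and after token counts. A short check, using only the numbers listed in this section (the variable and function names are illustrative):

```python
# (tokens before ALP, tokens after ALP), copied from the list above
usage = {
    "MATH": (2326, 646),
    "AIME": (3906, 2254),
    "Olympiad": (3309, 2107),
}


def reduction_pct(before: int, after: int) -> float:
    """Relative reduction in token count, as a percentage."""
    return 100.0 * (before - after) / before


reductions = {name: reduction_pct(b, a) for name, (b, a) in usage.items()}
mean_reduction = sum(reductions.values()) / len(reductions)

for name, pct in reductions.items():
    print(f"{name}: -{pct:.0f}%")
print(f"mean: -{mean_reduction:.0f}%")
```

The unweighted mean of the three reductions is about 50%, which matches the headline figure in the model description.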

## Usage
```python
problem = "What is 1 + 1?"  # placeholder; replace with your own problem text
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
```