## Introduction
E1-Math-1.5B is a language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It is trained for Elastic Reasoning with a budget-constrained rollout strategy integrated into GRPO, which teaches the model to reason adaptively when its thinking process is cut short and to generalize effectively to unseen budget constraints without additional training.
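The budget-constrained rollout described above can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: `generate_tokens`, the default budget values, and the `</think>` marker handling are assumptions made for the sketch.

```python
def budget_constrained_rollout(generate_tokens, prompt,
                               thinking_budget=1024, solution_budget=512):
    """Sketch of a separate-budget rollout: the thinking phase is hard-capped
    at `thinking_budget` tokens, and the solution phase always gets its own
    `solution_budget`, regardless of where thinking was cut off.
    `generate_tokens` is a hypothetical stand-in for a model's
    token-by-token decoder (prompt in, token stream out)."""
    thinking = []
    for tok in generate_tokens(prompt):
        if tok == "</think>":
            break  # model finished thinking on its own
        thinking.append(tok)
        if len(thinking) >= thinking_budget:
            break  # thinking cut short; the model must still answer
    # Force the end-of-thinking marker so the solution phase always starts,
    # even when the thinking trace was truncated mid-stream.
    prefix = prompt + "".join(thinking) + "</think>"
    solution = []
    for tok in generate_tokens(prefix):
        solution.append(tok)
        if len(solution) >= solution_budget:
            break
    return "".join(thinking), "".join(solution)
```

Training under rollouts like this is what lets the model produce a usable answer even when the thinking budget at inference time differs from anything seen during training.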
## Performance
| Model | Tokens | Acc (%) | Tokens | Acc (%) | Tokens | Acc (%) | Tokens | Acc (%) | Tokens | Acc (%) |
|---------------|--------------|---------------|--------------|---------------|--------------|---------------|--------------|---------------|--------------|---------------|
| DeepScaleR-1.5B | 10050 | 41.0 | 1488 | 5.2 | 1904 | 9.6 | 2809 | 15.8 | 3700 | 22.7 |
| E1-Math-1.5B | 6825 | 35.0 | 1340 | 13.5 | 1799 | 17.5 | 2650 | 24.8 | 3377 | 27.9 |
## Usage
For detailed usage, please refer to the [Elastic-Reasoning repository](https://github.com/SalesforceAIResearch/Elastic-Reasoning).