Update README.md
README.md
CHANGED
@@ -12,6 +12,8 @@ pipeline_tag: text-generation
 
 **q1-3B-PRIME**, a small reasoning model trained with reinforcement learning.
 
+Trained using SmallThinker-3B-Preview as the base model (Qwen2.5-3B-Instruct fully fine-tuned on QwQ reasoning traces), for a roughly 22.5% improvement on the test set in 120 training steps. (Note: a lot of performance is left on the table, since PRIME saturates only after 300 steps.)
+
 # Benchmark Performance
 
 Math