Update README.md
README.md
CHANGED
@@ -49,9 +49,6 @@ print(outputs[0].outputs[0].text)
 
 # 📃Evaluation
 
-LUFFY is evaluated on six competition-level benchmarks, achieving state-of-the-art results among all zero-RL methods. It surpasses both on-policy RL and imitation learning (SFT), especially in generalization:
-
-
 | **Model** | **AIME 24** | **AIME 25** | **AMC** | **MATH-500** | **Minerva** | **Olympiad** | **Avg.** |
 |-------|---------|---------|-----|----------|---------|----------|------|
 | Qwen2.5-Math-1.5B-Base | 7.9 | 4.7 | 26.4 | 31.0 | 12.1 | 21.5 | 17.3 |