### Evaluation
This model was evaluated on the OpenLLM v1 benchmarks and on the reasoning tasks AIME24, GPQA-Diamond, and MATH500. Model outputs were generated with the vLLM engine.
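As a rough sketch, an evaluation like this can be reproduced with lm-evaluation-harness using its vLLM backend; the exact `model_args` (tensor parallelism, context length, batch size) are assumptions that depend on the available hardware.

```shell
# Hypothetical invocation: OpenLLM v1 tasks via lm-evaluation-harness
# with vLLM as the generation backend. tensor_parallel_size=8 is an
# assumption, not a value stated by the model card.
lm_eval --model vllm \
  --model_args pretrained=ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts,tensor_parallel_size=8 \
  --tasks arc_challenge,gsm8k,hellaswag,mmlu,truthfulqa_mc2,winogrande \
  --batch_size auto
```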
| Model | ARC-C | GSM8K | HellaSwag | MMLU | TruthfulQA-mc2 | Winogrande | Average | Recovery (%) |
|-------|-------|-------|-----------|------|----------------|------------|---------|--------------|
| deepseek-ai/DeepSeek-R1 | 72.53 | 95.91 | 89.83 | 87.22 | 59.28 | 82.00 | 81.04 | 100.00 |
| cognitivecomputations/DeepSeek-R1-AWQ | 73.12 | 95.15 | 89.07 | 86.86 | 60.09 | 82.32 | 81.10 | 100.07 |
| ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts (this model) | 72.53 | 95.68 | 89.36 | 86.99 | 59.77 | 83.35 | 81.28 | 100.30 |
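The recovery column is the quantized model's benchmark average expressed as a percentage of the unquantized baseline's average. A minimal sketch, using the averages reported in the table:

```python
# Recovery = 100 * quantized average / baseline average,
# with the per-model averages taken from the table above.
def recovery(quantized_avg: float, baseline_avg: float) -> float:
    """Return average-score recovery in percent, rounded to 2 decimals."""
    return round(100 * quantized_avg / baseline_avg, 2)

baseline = 81.04  # deepseek-ai/DeepSeek-R1 average
print(recovery(81.10, baseline))  # AWQ checkpoint  -> 100.07
print(recovery(81.28, baseline))  # this checkpoint -> 100.3
```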