Commit 1348da8 (verified) by SpiridonSunRotator · 1 parent: 4cf3b35

Added OpenLLM leaderboard evaluation

Files changed (1)
  1. README.md +11 -0
README.md ADDED
@@ -0,0 +1,11 @@
+
+ ### Evaluation
+
+ This model was evaluated on the OpenLLM v1 benchmarks and reasoning tasks (AIME24, GPQA-Diamond, MATH500). Model outputs were generated with the vLLM engine.
+
+ | Model | ARC-C | GSM8K | HellaSwag | MMLU | TruthfulQA-mc2 | Winogrande | Average | Recovery (%) |
+ |-------|-------|-------|-----------|------|----------------|------------|---------|--------------|
+ | deepseek-ai/DeepSeek-R1 | 72.53 | 95.91 | 89.83 | 87.22 | 59.28 | 82.00 | 81.04 | 100.00 |
+ | cognitivecomputations/DeepSeek-R1-AWQ | 73.12 | 95.15 | 89.07 | 86.86 | 60.09 | 82.32 | 81.10 | 100.07 |
+ | ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts (this model) | 72.53 | 95.68 | 89.36 | 86.99 | 59.77 | 83.35 | 81.28 | 100.30 |
+
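
The README does not define the Recovery column. A minimal sketch, assuming it is the model's reported Average divided by the unquantized baseline's reported Average, expressed as a percentage. The function name `recovery` and the dictionary `reported_averages` are hypothetical and used only for illustration; the numbers are copied from the Average column of the table.

```python
def recovery(model_avg: float, baseline_avg: float) -> float:
    """Return the model's average score as a percentage of the baseline's.

    Assumption: this is how the Recovery column is derived; the README
    does not state the formula.
    """
    return 100.0 * model_avg / baseline_avg


# Reported Average column values, copied from the table.
reported_averages = {
    "deepseek-ai/DeepSeek-R1": 81.04,
    "cognitivecomputations/DeepSeek-R1-AWQ": 81.10,
    "ISTA-DASLab/DeepSeek-R1-GPTQ-4b-128g-experts": 81.28,
}

baseline_avg = reported_averages["deepseek-ai/DeepSeek-R1"]

for name, avg in reported_averages.items():
    # Print each model's recovery relative to the unquantized baseline.
    print(f"{name}: {recovery(avg, baseline_avg):.2f}")
```

Under this assumption, a value above 100 means the quantized model's reported average is slightly higher than the baseline's, which at these margins is plausibly within evaluation noise.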