ekurtic committed
Commit ccf4dbe · verified · 1 Parent(s): c5e4d60

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -23,9 +23,13 @@ Model outputs were generated with the vLLM engine.
 
 For reasoning tasks we estimate pass@1 based on 10 runs with different seeds and `temperature=0.6`, `top_p=0.95` and `max_new_tokens=65536`.
 
-#### Reasoning tasks (AIME-24, GPQA-Diamond, MATH-500)
 
-... coming soon ...
+| | Recovery (%) | deepseek/DeepSeek-R1-0528 | ISTA-DASLab/DeepSeek-R1-0528-GPTQ-4b-128g-experts<br>(this model) |
+| --------------------------- | :----------: | :------------------: | :--------------------------------------------------: |
+| AIME 2024<br>pass@1 | 98.50 | 88.66 | 87.33 |
+| MATH-500<br>pass@1 | 99.88 | 97.52 | 97.40 |
+| GPQA Diamond<br>pass@1 | 101.21 | 79.65 | 80.61 |
+| **Reasoning<br>Average Score** | **99.82** | **88.61** | **88.45** |
 
 ## Contributors
 Denis Kuznedelev (Yandex), Eldar Kurtić (Red Hat AI & ISTA), and Dan Alistarh (Red Hat AI & ISTA).
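
Review note: the Recovery (%) column added in this commit can be sanity-checked with a short script. The sketch below is not the authors' evaluation code. It assumes recovery is the quantized score divided by the baseline score, times 100, and that the averaged recovery is computed from the two average scores rather than as the mean of the per-task recoveries. Both assumptions reproduce the committed values. The function names `pass_at_1` and `recovery` are hypothetical, and `pass_at_1` assumes pass@1 is the mean accuracy over the per-seed runs.

```python
# Scores copied from the table in this commit: (baseline, quantized) pass@1.
SCORES = {
    "AIME 2024": (88.66, 87.33),
    "MATH-500": (97.52, 97.40),
    "GPQA Diamond": (79.65, 80.61),
}


def pass_at_1(runs):
    """Mean accuracy (%) over runs; each run is a list of per-problem booleans.

    Assumption: pass@1 is estimated by averaging accuracy across seeds.
    """
    per_run = [sum(run) / len(run) for run in runs]
    return 100.0 * sum(per_run) / len(per_run)


def recovery(baseline, quantized):
    """Quantized score as a percentage of the baseline score (assumed definition)."""
    return 100.0 * quantized / baseline


# Per-task recovery, rounded to two decimals as in the table.
per_task = {name: round(recovery(b, q), 2) for name, (b, q) in SCORES.items()}

# Average scores, then recovery of the averages (assumed aggregation order).
avg_baseline = sum(b for b, _ in SCORES.values()) / len(SCORES)
avg_quantized = sum(q for _, q in SCORES.values()) / len(SCORES)
avg_recovery = round(recovery(avg_baseline, avg_quantized), 2)

if __name__ == "__main__":
    for name, value in per_task.items():
        print(f"{name}: {value:.2f}")
    print(f"Average: {avg_baseline:.2f} -> {avg_quantized:.2f}, recovery {avg_recovery:.2f}")
```

Under these assumptions the script agrees with every row of the table. Averaging the three per-task recoveries instead would give a slightly different aggregate, so the bottom-row value appears to be derived from the two average scores.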