Benchmark
| Capability Benchmark | Qwen3-32B-FP8 | Qwen3-32B-INT8 |
|---|---|---|
| aime25@mean_acc | 0.4555 | 0.4778 |
| gpqa_diamond@mean_acc | 0.6431 | 0.6431 |
| ifeval@mean_prompt_level_strict | 0.8262 | 0.8207 |
| live_code_bench@mean_acc | 0.5498 | 0.5573 |
| mmlu_pro@mean_acc | 0.7816 | 0.7832 |
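- Inspected the samples with `"stop_reason": "max_tokens"`: every one of them was a repetitive answer that looped until it hit max_tokens. No sample was found in which a normal answer legitimately ran to the maximum length.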
- repeats: 3
- enable_thinking: True
- evalscope version: 1.4.2
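
The check on truncated samples noted above could be automated roughly as follows. This is a minimal sketch, not the procedure actually used for this card: only the `stop_reason` key and the `max_tokens` value come from the notes above, while the `output` field name, the n-gram size, and the duplication threshold are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def is_repetitive(text: str, ngram: int = 8, min_dup_ratio: float = 0.5) -> bool:
    """Heuristic: True when a large share of word n-grams are duplicates.

    ngram and min_dup_ratio are illustrative defaults, not tuned values.
    """
    words = text.split()
    if len(words) < 2 * ngram:
        # Too short to judge; treat as not repetitive.
        return False
    grams = [tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1)]
    dup_ratio = 1.0 - len(set(grams)) / len(grams)
    return dup_ratio >= min_dup_ratio


def split_truncated(samples: List[Dict[str, str]]) -> Tuple[List[dict], List[dict]]:
    """Split samples that stopped on max_tokens into (repetitive, normal).

    Each sample is assumed to be a dict with a "stop_reason" key and a
    hypothetical "output" key holding the generated text.
    """
    repetitive, normal = [], []
    for sample in samples:
        if sample.get("stop_reason") != "max_tokens":
            continue
        if is_repetitive(sample.get("output", "")):
            repetitive.append(sample)
        else:
            normal.append(sample)
    return repetitive, normal
```

If the `normal` list comes back empty, every truncated sample looks like a repetition loop under this heuristic, which is the situation described in the note above. Any sample that lands in `normal` would still need a manual look, since an n-gram ratio is only a rough proxy for degenerate output.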
Reference