Benchmark
| Capability Benchmark | Qwen3-8B-FP8 | Qwen3-8B-INT8 |
|---|---|---|
| aime25@mean_acc | 0.3778 | 0.3555 |
| gpqa_diamond@mean_acc | 0.5572 | 0.5589 |
| ifeval@mean_prompt_level_strict | 0.8182 | 0.8152 |
| live_code_bench@mean_acc | 0.557 | 0.5703 |
| mmlu_pro@mean_acc | 0.7058 | 0.7118 |
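- Inspected the samples with `"stop_reason": "max_tokens"`: all of them repeated their output until hitting the max-token limit; none was a normal answer that legitimately ran to the maximum length
- repeats: 3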
- enable_thinking: True
- evalscope version: 1.4.2
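
The truncation check described in the notes above could be reproduced with a short script along these lines. This is only a sketch under stated assumptions: it assumes the per-sample predictions are available as a JSONL file in which a `"stop_reason"` key appears somewhere in each record, which may not match the actual evalscope output layout. The names `iter_stop_reasons`, `hit_max_tokens`, `looks_repetitive`, and `truncated_records` are hypothetical helpers, and the repetition test is a crude heuristic rather than the method used for the note above.

```python
import json
from typing import Any, Iterator


def iter_stop_reasons(node: Any) -> Iterator[str]:
    """Yield every string stored under a "stop_reason" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "stop_reason" and isinstance(value, str):
                yield value
            else:
                yield from iter_stop_reasons(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_stop_reasons(item)


def hit_max_tokens(record: dict) -> bool:
    """True if any stop_reason in the record equals "max_tokens"."""
    return "max_tokens" in iter_stop_reasons(record)


def looks_repetitive(text: str, window: int = 200, min_repeats: int = 3) -> bool:
    """Heuristic: the last `window` characters occur at least `min_repeats` times."""
    if len(text) < window * min_repeats:
        return False
    tail = text[-window:]
    return text.count(tail) >= min_repeats


def truncated_records(path: str) -> Iterator[dict]:
    """Yield records from a JSONL file (assumed layout) that stopped on max_tokens."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if hit_max_tokens(record):
                yield record
```

Each record returned by `truncated_records` can then be passed through `looks_repetitive` on whichever field holds the model output (the field name depends on the file layout), with any record that fails the heuristic reviewed by hand.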
Reference