What are the benchmarks of the 4-bit model vs the FP8 model?
#9, opened by Grossor
What it says in the title. I'd like to know how much we "lose" by running this particular 4-bit model vs the FP8 model.
Hi @Grossor, due to time constraints, before release we only did a sanity check by running HMMT'25 Feb, a challenging math benchmark that requires long reasoning (>64K in some cases). Here are the benchmark scores we got:
vllm-bf16-baseline 98.44%
step3p5_flash_Q4_K_S.gguf 97.50%
I would say the loss is minimal, and it is still one of the most powerful models that can run in 128GB of unified memory.
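To put a number on "minimal", here is a quick sketch of the arithmetic using only the two scores quoted above. The variable names are just illustrative. Note that the reference run is the bf16 baseline, and this is a single benchmark, so treat the result as a rough indication rather than a full comparison.

```python
# Scores quoted in the reply above (percent accuracy on HMMT'25 Feb).
baseline_bf16 = 98.44   # vllm-bf16-baseline
q4_k_s = 97.50          # step3p5_flash_Q4_K_S.gguf

# Absolute gap in percentage points.
abs_drop = baseline_bf16 - q4_k_s

# Gap expressed as a fraction of the baseline score, in percent.
rel_drop = abs_drop / baseline_bf16 * 100

print(f"absolute drop: {abs_drop:.2f} percentage points")
print(f"relative drop: {rel_drop:.2f}% of the baseline score")
```

That works out to a gap of 0.94 percentage points, a bit under 1% of the baseline score.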
thanks!