No benchmarks other than prefill speed and token generation speed?

#3
by bdutta - opened

I'm far from an LLM expert, but I frequently see benchmarks such as the Aider benchmark, SWE-bench, and MMLU that indicate the quality and relevance of generation for specific problem (or prompt) types. The prefill speed and token generation speed are very impressive, and the comparison against gpt-oss-20b and qwen3-30b is good to see, but it would also be good to see how the model compares in qualitative terms. Is anything planned to be shared around such aspects?
