No benchmarks other than prefill speed and token generation speed?

#3
by bdutta - opened

I'm far from an LLM expert, but I frequently see benchmarks such as the Aider benchmark, SWE-bench, and MMLU that indicate the quality and relevance of generation for specific problem (or prompt) types. The prefill speed and token generation speed are very impressive, and the comparison against gpt-oss-20b and qwen3-30b is good to see, but it would also be good to see how the model compares in qualitative terms. Is anything planned to be shared around such aspects?
