Inconsistent output
Compared to Qwen's "official" FP8 quant, this one tends to add redundant characters to text output.
For example, testing with vLLM nightly and the recommended sampling parameters, the following question
is /users/me endpoint a bad practice?
results in the following issues in the output:
Forgetting to require auth → anyone gets someonesomeone'’s data*Use Vary: Authorization, avoid server-side caching per endpoint without per-user granularitycache keys�💡 Alternatives & Complements:�✅ Best Practices for /users/meHowever, whether it's *appropriate* depends on **context, **security considerations**, **consistency**, and **implementation quality**. Here’s a balanced breakdown:
There are broken unicode chars, missing closing tags (`**context` without a closing `**`), repetitions inside words (someonesomeone), and missing spaces.
Changing sampling parameters doesn't affect these issues. With temp=0.0 the output has many more mistakes than with temp=1.0.
But despite this the model still performs well in agentic tasks with OpenCode, and I don't know how 🫥
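For reference, a minimal sketch of the request payload I'm sending to a local vLLM OpenAI-compatible server (the model id and endpoint are placeholders; the sampling values are my reading of Qwen's recommended settings for the coder models, so double-check the model card):

```python
import json

# Sampling parameters as (I believe) recommended on Qwen's model card
# for the coder models - the issue reproduces even more strongly at
# temperature=0.0 than at these settings.
payload = {
    "model": "unsloth/qwen3-coder-next-fp8-dynamic",  # placeholder id
    "messages": [
        {"role": "user", "content": "is /users/me endpoint a bad practice?"}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
}

# POST this to http://localhost:8000/v1/chat/completions on a local
# vLLM server (e.g. with curl or requests); printed here for inspection.
print(json.dumps(payload, indent=2))
```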
Oh hey! Yes, this is somewhat expected - Qwen's official quant, like https://huggingface.co/unsloth/Qwen3-Coder-Next-FP8, uses block-wise [128, 128] FP8, whilst this one uses per-channel FP8, which I think is 8-10% faster.
We actually ran a benchmark for Qwen3-8B as well, e.g.: https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning
We plan in the future to mix block-wise and per-row/column scaling to make it slightly more accurate.
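To make the difference between the two schemes concrete, here is a small NumPy sketch (my own illustration, not Unsloth's actual kernel): per-channel keeps one scale per output row, block-wise keeps one scale per [128, 128] tile, and the FP8 E4M3 cast is crudely simulated by rounding to 3 explicit mantissa bits and clipping to ±448 (ignoring subnormals/NaN handling).

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fake_e4m3(x):
    # Crude simulation of an E4M3 cast: clip to the finite range, then
    # round the mantissa to 4 bits total (3 explicit + implicit leading 1).
    m, e = np.frexp(np.clip(x, -E4M3_MAX, E4M3_MAX))  # x = m * 2**e, m in [0.5, 1)
    m = np.round(m * 16.0) / 16.0
    return np.ldexp(m, e)

def quant_per_channel(w):
    # One scale per output channel (row) - the faster scheme.
    scale = np.abs(w).max(axis=1, keepdims=True) / E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)
    return fake_e4m3(w / scale) * scale

def quant_block(w, b=128):
    # One scale per [b, b] tile - the block-wise [128, 128] scheme.
    out = np.empty_like(w)
    for i in range(0, w.shape[0], b):
        for j in range(0, w.shape[1], b):
            tile = w[i:i + b, j:j + b]
            s = np.abs(tile).max() / E4M3_MAX
            s = 1.0 if s == 0 else s
            out[i:i + b, j:j + b] = fake_e4m3(tile / s) * s
    return out

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256))
err_ch = np.mean((w - quant_per_channel(w)) ** 2)
err_blk = np.mean((w - quant_block(w)) ** 2)
print(f"per-channel MSE: {err_ch:.3e}, block-wise MSE: {err_blk:.3e}")
```

Which scheme wins on accuracy depends on where the outliers sit: a single outlier inflates the scale for its whole row under per-channel, but only for its own tile under block-wise, which is presumably what mixing the two is meant to exploit.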
I haven't noticed any formatting/spelling issues yet. However, I haven't used the model outside an agent harness, meaning there are always 10k+ tokens of instructions in my context, including instructions about the expected output format. The only potentially related issue I've seen is that, despite detailed instructions, qwen3-coder-next-fp8-dynamic isn't very consistent with Codex's 'apply_patch' tool. It doesn't mess up the tool call itself, but the tool input argument (essentially a diff file) is often wrong. I'll try the block-wise FP8 so I can compare...
Yes this is expected a bit [..]
So you also observed these formatting/spelling issues? Are other Unsloth qwen3-coder-next quants showing this too? To me it's unexpected. I assumed minor accuracy loss in larger models would show up differently (a slightly higher tendency to confuse things, rambling, an increased chance of failed tool calls, etc.). Maybe this is something else (an inference bug)?
FYI: I encountered two lone out-of-place Chinese characters in the output of the Qwen-provided FP8 version. Against my intuition, it might therefore simply be a property of this model to show such token-level/formatting errors under loss of accuracy; after all, it has only 3B active parameters.