Inconsistent output

#2
by nephepritou - opened

Compared to Qwen's "official" FP8 quant, this one tends to add redundant characters to text output.

For example, testing with vLLM nightly and the recommended sampling parameters, ask the following question:

is /users/me endpoint a bad practice?

This results in the following issues in the output:

  • Forgetting to require auth → anyone gets someonesomeone'’s data*
  • Use Vary: Authorization, avoid server-side caching per endpoint without per-user granularitycache keys
  • �💡 Alternatives & Complements:
  • �✅ Best Practices for /users/me
  • However, whether it's *appropriate* depends on **context, **security considerations**, **consistency**, and **implementation quality**. Here’s a balanced breakdown:

There are broken unicode chars, missing closing tags (`**context` without a closing `**`), repetitions inside words (someonesomeone) and missing spaces.

Changing the sampling parameters doesn't affect these issues. With temp=0.0 the output has far more mistakes than with temp=1.0.

But despite this the model still performs well in agentic tasks with OpenCode, and I don't know how 🫥

Unsloth AI org

Oh hey! Yes, this is expected a bit - Qwen's official quant, i.e. https://huggingface.co/unsloth/Qwen3-Coder-Next-FP8, uses block [128, 128] FP8, whilst this one uses per-channel FP8, which is I think 8-10% faster.

We actually did a benchmark as well for Qwen3-8B for eg: https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning

We plan in the future to mix block-wise and per-row/column scaling to make it slightly more accurate.
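To make the granularity difference concrete, here's a rough pure-Python sketch (not Unsloth's actual quantization code): block-wise [128, 128] FP8 computes one scale per 128x128 tile of the weight matrix, while per-channel FP8 computes one scale per output row. Fewer, coarser rescales inside the matmul kernel is where the speedup comes from, but a single outlier then coarsens the quantization step for its whole row. The matrix size and the max-abs scaling rule below are illustrative assumptions.

```python
import random

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3
BLOCK = 128

# Hypothetical small weight matrix; real layers are e.g. 4096x4096.
rng = random.Random(0)
rows, cols = 256, 512
w = [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Block-wise [128, 128]: one scale per 128x128 tile (max-abs scaling).
block_scales = [
    [
        max(abs(w[r][c])
            for r in range(i * BLOCK, (i + 1) * BLOCK)
            for c in range(j * BLOCK, (j + 1) * BLOCK)) / FP8_E4M3_MAX
        for j in range(cols // BLOCK)
    ]
    for i in range(rows // BLOCK)
]

# Per-channel: one scale per output channel (row). Cheaper to apply at
# inference time, but an outlier anywhere in the row inflates the scale
# (and thus the rounding error) for every weight in that row.
channel_scales = [max(abs(v) for v in row) / FP8_E4M3_MAX for row in w]

print(len(block_scales), len(block_scales[0]))  # 2 x 4 tile scales
print(len(channel_scales))                      # 256 row scales
```

The mixed block + per-row/column scheme mentioned above would sit between these two: finer scales where outliers cluster, coarser ones elsewhere.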

I didn't notice any formatting/spelling issues yet. However, I haven't used the model outside an agent harness yet, meaning there are always 10k+ tokens of instructions in my context, including instructions about the expected output format. The only potentially related issue I have is that, despite detailed instructions, qwen3-coder-next-fp8-dynamic isn't very consistent with Codex's 'apply_patch' tool. It doesn't mess up the tool call itself, but the tool input argument (essentially a diff file) is often wrong. I'll try the block-wise FP8 to be able to compare...

Yes this is expected a bit [..]

So you also observed these formatting/spelling issues? Are other Unsloth qwen3-coder-next quants also showing this? To me it's unexpected. I assumed minor accuracy issues in larger models would show up differently (a slightly higher tendency to confuse things, rambling, an increased chance of failed tool calls, etc.). Maybe this is something else (an inference bug)?

fyi: I encountered 2 lone out-of-place Chinese characters in the output of the Qwen-provided FP8 version. Against my intuition, it might therefore just be a property of this model to show such token-level/formatting errors under loss of accuracy; after all, it has only 3B active parameters.
