TokForge Acceleration Pack - Qwen3.5 Draft (Deprecated)

Deprecated: this Qwen3.5-0.8B draft bundle is preserved for reproducibility, but the newer Qwen3-0.6B TokForge draft line is the practical default.

Why it is deprecated

Internal fleet testing on mobile devices showed that the smaller Qwen3-0.6B draft family was the better practical choice: it gave a larger speedup than this 0.8B draft on every target model we compared.

Representative comparison from our internal notes:

Target        0.6B Draft    0.8B Draft
Qwen3.5-9B    +59%          +16%
Qwen3-8B      +57%          -41% (wrong arch)
Qwen3-14B     +67%          -32% (wrong arch)
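
To make the comparison easier to read, the short Python sketch below converts each percentage into a throughput multiplier and picks the better draft per target. It assumes each value is the relative change in decode speed versus running the target without a draft; the notes do not state the baseline, so treat that reading as an assumption. The names `RESULTS`, `multiplier`, and `best_draft` are illustrative only and are not part of TokForge or MNN.

```python
# Percent changes copied from the comparison above.
# Assumption: each value is the relative change in decode throughput
# versus running the target model with no draft model.
RESULTS = {
    "Qwen3.5-9B": {"0.6B": 59, "0.8B": 16},
    "Qwen3-8B": {"0.6B": 57, "0.8B": -41},
    "Qwen3-14B": {"0.6B": 67, "0.8B": -32},
}


def multiplier(percent_change: float) -> float:
    """Convert a percent change into a throughput multiplier."""
    return 1.0 + percent_change / 100.0


def best_draft(target: str) -> str:
    """Return the draft label with the largest percent change for a target."""
    drafts = RESULTS[target]
    return max(drafts, key=drafts.get)


if __name__ == "__main__":
    for target, drafts in RESULTS.items():
        summary = ", ".join(
            f"{label} draft: {multiplier(pct):.2f}x" for label, pct in drafts.items()
        )
        print(f"{target}: {summary} (best: {best_draft(target)})")
```

Under that reading, a negative entry means the draft made decoding slower than using no draft at all, which is why the 0.8B draft is a poor fit for the Qwen3 targets.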

Use instead

The newer Qwen3-0.6B TokForge draft line, which is the practical default.

Performance Notes

This bundle is preserved because it was an important research step, but it is not the current practical winner. In our internal mobile testing, the newer Qwen3-0.6B draft family consistently provided better speedups and better architecture pairing.

What is included

  • llm.mnn
  • llm.mnn.weight
  • llm_config.json
  • tokenizer file(s)
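
Before pointing a runtime at a downloaded copy, it can help to confirm that the bundle is complete. The following is a minimal sketch, assuming the files sit together in one local directory. The helper name `missing_bundle_files` is hypothetical, and because the exact tokenizer file names are not listed here, the check only looks for any file with "tokenizer" in its name.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("llm.mnn", "llm.mnn.weight", "llm_config.json")


def missing_bundle_files(bundle_dir):
    """Return the names of expected bundle files that are not present."""
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: tokenizer files contain "tokenizer" in their file name.
    if not any(p.is_file() for p in root.glob("*tokenizer*")):
        missing.append("tokenizer file(s)")
    return missing
```

An empty list means every expected file was found; anything else names what still needs to be downloaded.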

Usage

This repo is for TokForge / MNN users who specifically want to reproduce the older Qwen3.5-0.8B draft path.

Limitations and Intended Use

  • Deprecated for normal use.
  • Cross-family drafting was weaker than the later same-family Qwen3-0.6B draft lane.
  • Keep using this only if you need reproducibility for old experiments.

TokForge
