Qwen3.5-35B-A3B-DFlash
This model is still under training.
DFlash is a speculative decoding method that uses a lightweight block diffusion model for drafting. Because the drafter proposes a whole block of tokens in parallel rather than one at a time, it enables efficient, high-quality drafting and substantially faster inference.
This model is the drafter component. It must be used in conjunction with the target model Qwen/Qwen3.5-35B-A3B. It was trained with a context length of 4096 tokens.
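To make the draft-then-verify loop concrete, here is a minimal, generic sketch of greedy speculative decoding with toy stand-in models (not the DFlash implementation; the block diffusion drafter would produce the whole block in one pass rather than token by token):

```python
def speculative_step(target_next, draft_next, context, block_size=16):
    """One draft-then-verify step of greedy speculative decoding.

    The draft model proposes `block_size` tokens; the target model can
    score the whole block in a single parallel forward pass and keeps
    the longest prefix that matches its own greedy choice, plus one
    token of its own. `target_next` / `draft_next` map a token context
    to the next token (toy stand-ins for real model calls)."""
    # Draft phase: propose a block of tokens.
    proposal, ctx = [], list(context)
    for _ in range(block_size):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # Verify phase: accept the longest matching prefix.
    accepted, ctx = [], list(context)
    for tok in proposal:
        if target_next(ctx) != tok:  # first mismatch rejects the rest
            break
        accepted.append(tok)
        ctx.append(tok)
    # The target always contributes one token, so progress is guaranteed.
    accepted.append(target_next(ctx))
    return accepted
```

The "Accept Length" figures reported below correspond to the average length of `accepted` per step: the more often the drafter agrees with the target, the more tokens each target forward pass yields.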
Quick Start
SGLang
Installation
uv pip install "git+https://github.com/sgl-project/sglang.git@refs/pull/16818/head#subdirectory=python"
Inference
python -m sglang.launch_server \
--model-path Qwen/Qwen3.5-35B-A3B \
--speculative-algorithm DFLASH \
--speculative-draft-model-path z-lab/Qwen3.5-35B-A3B-DFlash \
--speculative-num-draft-tokens 16 \
--tp-size 1 \
--dtype bfloat16 \
--attention-backend fa3 \
--mem-fraction-static 0.75 \
--trust-remote-code \
--mamba-scheduler-strategy extra_buffer \
--reasoning-parser qwen3 \
--tool-call-parser qwen3_coder
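Once the server is up, it can be queried through SGLang's OpenAI-compatible chat endpoint. A minimal client sketch, assuming the server runs on SGLang's default port 30000 (adjust if you passed --port):

```python
import json
import urllib.request

# Chat-completions payload for the launched server (OpenAI-compatible API).
payload = {
    "model": "Qwen/Qwen3.5-35B-A3B",
    "messages": [
        {"role": "user", "content": "Explain speculative decoding in one sentence."}
    ],
    "max_tokens": 256,
    "temperature": 0.6,
}

req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment to send the request (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```

Speculative decoding is transparent to the client: responses are identical to running the target model alone, only faster.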
Note: For long-context or agentic usage (such as OpenClaw or Claude Code), consider adding
--speculative-dflash-draft-window-size WINDOW_SIZE to enable sliding-window attention for the draft model. Because the draft model was trained only on 4K context, this often improves performance on very long contexts (50K+ tokens).
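Sliding-window attention simply restricts each draft position to the most recent tokens, keeping the drafter's effective context near its 4K training length even when the target context is far longer. A minimal sketch of such a mask (illustrative only, not the DFlash kernel):

```python
def sliding_window_mask(seq_len, window):
    """Boolean causal attention mask: position q may attend only to the
    `window` most recent positions k (itself included), i.e. positions
    with q - window < k <= q. Everything older is masked out, so the
    draft model never sees more context than it was trained on."""
    return [
        [q - window < k <= q for k in range(seq_len)]
        for q in range(seq_len)
    ]
```

With, say, a 4-token sequence and a window of 2, position 3 attends only to positions 2 and 3; the first two tokens are masked.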
Early Results
- Thinking: enabled
- Max new tokens: 4096
- Block size: 16
| Dataset   | Accept Length |
|-----------|---------------|
| GSM8K     | 6.830         |
| Math500   | 7.249         |
| HumanEval | 8.002         |
| MBPP      | 6.425         |
| MT-Bench  | 5.302         |
| Alpaca    | 5.040         |
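Accept length translates almost directly into decoding speedup: each draft-verify step costs one target forward pass plus one (cheap) draft pass and yields that many tokens, versus one token per pass without speculation. A back-of-the-envelope estimate, where the relative cost of a draft pass (draft_cost_ratio) is an assumed figure, not a measured one:

```python
def estimated_speedup(accept_len, draft_cost_ratio=0.1):
    """Rough throughput gain over plain autoregressive decoding.

    Each step costs one target forward pass plus one draft pass
    (assumed here to cost draft_cost_ratio of a target pass) and
    produces `accept_len` tokens on average; the baseline produces
    one token per target pass. Ignores verification-kernel overhead."""
    return accept_len / (1.0 + draft_cost_ratio)

for name, tau in {"GSM8K": 6.830, "HumanEval": 8.002, "Alpaca": 5.040}.items():
    print(f"{name}: ~{estimated_speedup(tau):.1f}x")
```

Real speedups depend on batch size, hardware, and kernel overheads, so treat this purely as intuition for why longer accept lengths matter.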