OpenYourMind
/

Kimi-K2.7-coder-DFLASH-preview

speculative-decoding

Model card Files Files and versions

Kimi-K2.7-coder-DFLASH-preview / README.md

OpenYourMind's picture

Upload folder using huggingface_hub

39c7ec2 verified 13 days ago

|

History Blame Contribute Delete

2.08 kB

	---
	license: other
	license_name: kimi-k2-derivative
	tags:
	- speculative-decoding
	- dflash
	- sglang
	- draft-model
	- kimi-k2
	extra_gated_prompt: >-
	This is a research preview draft model. Access is granted manually. By
	requesting access you agree to use it for research/evaluation and understand
	it is an early, single-config checkpoint.
	extra_gated_fields:
	Intended use: text
	Affiliation: text
	---

	# Kimi-K2.7-coder-DFLASH-preview

	A DFlash (block-diffusion) speculative-decoding draft model for an ablated
	Kimi-K2.7-Code target. Research preview.

	- Trained with [SpecForge](https://github.com/sgl-project/SpecForge) (online DFlash training).
	- Trained on only ~55k samples (single corpus, a few epochs) — deliberately small; this is a preview.
	- On our ablated Kimi-K2.7-Code target it already beats a misaligned EAGLE3 drafter
	(a K2.6 EAGLE3 head used on K2.7-Code): measured mean accept length ~2.5 vs ~1.9,
	~1.3x single-stream decode speedup, with peaks of 4.6-5.2 on long free-form code bodies.
	- Architecture: `DFlashDraftModel` (5 layers, hidden 7168), train block size 16 / infer 8,
	target layer ids capture, mask token id 163838.

	## Serving (sglang)

	Serve against the matching target with DFLASH speculative decoding:

	```
	python -m sglang.launch_server --model-path <kimi-k2.7-code-target> --tp 4 \
	--speculative-algorithm DFLASH \
	--speculative-draft-model-path Kimi-K2.7-coder-DFLASH-preview \
	--speculative-eagle-topk 1 --speculative-dflash-block-size 8 --speculative-num-draft-tokens 8
	```

	## Production / structured outputs

	Stock sglang DFLASH rejects grammar-constrained requests (JSON schema / regex / tool
	schemas). Support for that — so this drafter keeps its speedup on coding-CLI JSON
	(thin envelope around a large code/diff string) — is proposed upstream in
	sgl-project/sglang PR #28943.

	## Limitations

	- Early preview (small data, single config); acceptance plateaus and is not yet tuned.
	- Must be paired with the matching ablated Kimi-K2.7-Code target; not a standalone model.