OpenYourMind's picture
Upload folder using huggingface_hub
39c7ec2 verified
|
Raw
History Blame Contribute Delete
2.08 kB
---
license: other
license_name: kimi-k2-derivative
tags:
- speculative-decoding
- dflash
- sglang
- draft-model
- kimi-k2
extra_gated_prompt: >-
This is a research preview draft model. Access is granted manually. By
requesting access you agree to use it for research/evaluation and understand
it is an early, single-config checkpoint.
extra_gated_fields:
Intended use: text
Affiliation: text
---
# Kimi-K2.7-coder-DFLASH-preview
A **DFlash** (block-diffusion) speculative-decoding **draft model** for an ablated
**Kimi-K2.7-Code** target. Research preview.
- **Trained with [SpecForge](https://github.com/sgl-project/SpecForge)** (online DFlash training).
- **Trained on only ~55k samples** (single corpus, a few epochs) — deliberately small; this is a *preview*.
- On our ablated Kimi-K2.7-Code target it **already beats a misaligned EAGLE3 drafter**
(a K2.6 EAGLE3 head used on K2.7-Code): measured mean **accept length ~2.5 vs ~1.9**,
~1.3x single-stream decode speedup, with peaks of 4.6-5.2 on long free-form code bodies.
- Architecture: `DFlashDraftModel` (5 layers, hidden 7168), train block size 16 / infer 8,
target layer ids capture, mask token id 163838.
## Serving (sglang)
Serve against the matching target with DFLASH speculative decoding:
```
python -m sglang.launch_server --model-path <kimi-k2.7-code-target> --tp 4 \
--speculative-algorithm DFLASH \
--speculative-draft-model-path Kimi-K2.7-coder-DFLASH-preview \
--speculative-eagle-topk 1 --speculative-dflash-block-size 8 --speculative-num-draft-tokens 8
```
## Production / structured outputs
Stock sglang DFLASH rejects grammar-constrained requests (JSON schema / regex / tool
schemas). Support for that — so this drafter keeps its speedup on coding-CLI JSON
(thin envelope around a large code/diff string) — is proposed upstream in
**sgl-project/sglang PR #28943**.
## Limitations
- Early preview (small data, single config); acceptance plateaus and is not yet tuned.
- Must be paired with the matching ablated Kimi-K2.7-Code target; not a standalone model.