| --- |
| license: other |
| license_name: kimi-k2-derivative |
| tags: |
| - speculative-decoding |
| - dflash |
| - sglang |
| - draft-model |
| - kimi-k2 |
| extra_gated_prompt: >- |
| This is a research preview draft model. Access is granted manually. By |
| requesting access you agree to use it for research/evaluation and understand |
| it is an early, single-config checkpoint. |
| extra_gated_fields: |
| Intended use: text |
| Affiliation: text |
| --- |
| |
| # Kimi-K2.7-coder-DFLASH-preview |
|
|
| A **DFlash** (block-diffusion) speculative-decoding **draft model** for an ablated |
| **Kimi-K2.7-Code** target. Research preview. |
|
|
| - **Trained with [SpecForge](https://github.com/sgl-project/SpecForge)** (online DFlash training). |
| - **Trained on only ~55k samples** (single corpus, a few epochs) — deliberately small; this is a *preview*. |
| - On our ablated Kimi-K2.7-Code target it **already beats a misaligned EAGLE3 drafter** |
| (a K2.6 EAGLE3 head used on K2.7-Code): measured mean **accept length ~2.5 vs ~1.9**, |
| ~1.3x single-stream decode speedup, with peaks of 4.6-5.2 on long free-form code bodies. |
| - Architecture: `DFlashDraftModel` (5 layers, hidden 7168), train block size 16 / infer 8, |
| target layer ids capture, mask token id 163838. |
|
|
| ## Serving (sglang) |
|
|
| Serve against the matching target with DFLASH speculative decoding: |
|
|
| ``` |
| python -m sglang.launch_server --model-path <kimi-k2.7-code-target> --tp 4 \ |
| --speculative-algorithm DFLASH \ |
| --speculative-draft-model-path Kimi-K2.7-coder-DFLASH-preview \ |
| --speculative-eagle-topk 1 --speculative-dflash-block-size 8 --speculative-num-draft-tokens 8 |
| ``` |
|
|
| ## Production / structured outputs |
|
|
| Stock sglang DFLASH rejects grammar-constrained requests (JSON schema / regex / tool |
| schemas). Support for that — so this drafter keeps its speedup on coding-CLI JSON |
| (thin envelope around a large code/diff string) — is proposed upstream in |
| **sgl-project/sglang PR #28943**. |
|
|
| ## Limitations |
|
|
| - Early preview (small data, single config); acceptance plateaus and is not yet tuned. |
| - Must be paired with the matching ablated Kimi-K2.7-Code target; not a standalone model. |
|
|