Instructions to use noctuashap/Confucius3-Math-DFlash with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use noctuashap/Confucius3-Math-DFlash with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("feature-extraction", model="noctuashap/Confucius3-Math-DFlash", trust_remote_code=True)# Load model directly from transformers import AutoTokenizer, AutoModel tokenizer = AutoTokenizer.from_pretrained("noctuashap/Confucius3-Math-DFlash", trust_remote_code=True) model = AutoModel.from_pretrained("noctuashap/Confucius3-Math-DFlash", trust_remote_code=True) - Notebooks
- Google Colab
- Kaggle
Confucius3-Math-DFlash (draft model)
A DFlash block-diffusion speculative-decoding draft model for
netease-youdao/Confucius3-Math.
Use it as the --speculative-config model to accelerate Confucius3-Math inference (especially
single-stream / low-latency math reasoning).
- Target model:
netease-youdao/Confucius3-Math(Qwen2 arch, 48 layers, DeepSeek-R1-distill thinking format) - Draft: 5-layer
DFlashDraftModel, block size 16, ~1.5B params, taps target hidden states from layers [1,12,23,34,45] - Trained with: SpecForge, D-PACE loss, 6 epochs
Results (acceptance length = mean tokens accepted per draft+verify step, thinking mode)
| dataset | accept length | draft accept rate | tok/s (single stream) |
|---|---|---|---|
| GSM8K | 5.47 | 30% | 493 |
| MATH-500 | 5.79 | 32% | 526 |
Higher acceptance ⇒ more tokens emitted per target forward ⇒ larger speedup. Profiled on 1×H200, vLLM 0.22, temperature 0.
Usage (vLLM)
vllm serve netease-youdao/Confucius3-Math \
--speculative-config '{"method": "dflash", "model": "noctuashap/Confucius3-Math-DFlash", "num_speculative_tokens": 15}' \
--trust-remote-code
DFlash is supported in vLLM ≥ 0.20.1. --trust-remote-code is required (the draft is a custom
DFlashDraftModel, included as dflash.py).
Training data
~148k math-leaning prompts (NuminaMath / MATH / GSM8K / OpenMathReasoning + some code/reasoning/general), regenerated by Confucius3-Math itself (thinking traces kept inline) so the draft matches the target's own output distribution. No correctness filtering (distribution matching, not correctness).
Built with Claude Code.
- Downloads last month
- -
Model tree for noctuashap/Confucius3-Math-DFlash
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-14B