# Qwen3-0.6B Speculative Decoder for Qwen3-14B
This is a Qwen3-0.6B model fine-tuned to serve as the draft model for speculative decoding with Qwen3-14B, enabling faster inference without changing the target model's outputs.
## Model Details
- Base Model: Qwen/Qwen3-0.6B
- Target Model: Qwen/Qwen3-14B
- Training: Knowledge distillation with ArcticTraining
- Use Case: Speculative decoding for 2-4x inference speedup
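To make the mechanism concrete, here is a minimal, self-contained sketch of the draft-then-verify loop that speculative decoding performs. This is an illustration only: the toy `draft_model` and `target_model` functions are hypothetical stand-ins (the real logic lives inside the inference engine, e.g. vLLM), and tokens are plain integers.

```python
def draft_model(ctx):
    # Hypothetical cheap drafter: next token = last token + 1 (mod 100).
    return (ctx[-1] + 1) % 100

def target_model(ctx):
    # Hypothetical target: agrees with the drafter except when the
    # drafter's token would be a multiple of 5.
    nxt = (ctx[-1] + 1) % 100
    return nxt if nxt % 5 != 0 else (nxt + 1) % 100

def speculative_step(ctx, k=3):
    """One speculative decoding step.

    The draft model proposes k tokens; the target model verifies them.
    Accepted output = the longest prefix of the draft the target would
    also have produced, plus one corrected token from the target when a
    draft token is rejected. Multiple tokens can thus be emitted per
    target-model pass, which is where the speedup comes from.
    """
    # 1. Draft phase: propose k tokens autoregressively with the cheap model.
    draft, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_model(tmp)
        draft.append(t)
        tmp.append(t)

    # 2. Verify phase: accept draft tokens while the target agrees.
    accepted, tmp = [], list(ctx)
    for t in draft:
        expected = target_model(tmp)
        if t == expected:
            accepted.append(t)
            tmp.append(t)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    return accepted
```

With `k=3` the step emits up to three tokens per target pass when the draft is accurate, and always makes at least one token of progress.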
## Usage

### With vLLM + Arctic Inference
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-14B",
    speculative_model="your-username/qwen3-14b-speculator",
    num_speculative_tokens=3,
    trust_remote_code=True,
)

# generate() takes sampling options via SamplingParams,
# not as a direct max_tokens keyword.
sampling_params = SamplingParams(max_tokens=100)
outputs = llm.generate("Hello, how are you?", sampling_params)
print(outputs[0].outputs[0].text)
```