Qwen3-0.6B Speculative Decoder for Qwen3-14B

This is a Qwen3-0.6B model fine-tuned to serve as the draft model in speculative decoding for Qwen3-14B, enabling faster inference without changing the target model's outputs.

Model Details

  • Base Model: Qwen/Qwen3-0.6B
  • Target Model: Qwen/Qwen3-14B
  • Training: Knowledge distillation with ArcticTraining
  • Use Case: Speculative decoding for 2-4x inference speedup

Usage

With vLLM + Arctic Inference

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-14B",
    speculative_model="your-username/qwen3-14b-speculator",
    num_speculative_tokens=3,
    trust_remote_code=True
)

sampling_params = SamplingParams(max_tokens=100)
outputs = llm.generate("Hello, how are you?", sampling_params)
print(outputs[0].outputs[0].text)