CMB-Qwen3-32B-Eagle3

Model Overview

  • Verifier: Qwen/Qwen3-32B
  • Speculative Decoding Algorithm: EAGLE-3
  • Model Architecture: LlamaForCausalLMEagle3
  • Release Date: 03/31/2026
  • Version: 1.0
  • Model Developers: Architecture Management Team, Information Technology Department, China Merchants Bank (CMB)

This is a speculator model designed for use with Qwen/Qwen3-32B, based on the EAGLE-3 speculative decoding algorithm. It was trained by the Architecture Management Team, Information Technology Department, China Merchants Bank (CMB). The training data combines Chinese and English open-source datasets, with data synthesis and cleaning applied to the originals. The original training data comes from the following sources:

  1. swift/Chinese-Qwen3-235B-Thinking-2507-Distill-data-110k-SFT
  2. a-m-team/AM-Qwen3-Distilled

This model should be used with the Qwen/Qwen3-32B chat template, specifically through the /chat/completions endpoint.
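A request to that endpoint carries a standard OpenAI-style chat body. Below is a minimal sketch of the JSON payload; the sampling values mirror the evaluation configuration on this card, the prompt is an arbitrary example, and top_k is a server-side extension accepted by sglang and vLLM rather than part of the base OpenAI schema.

```python
import json

# Sketch of a /chat/completions request body for the OpenAI-compatible
# server started by sglang or vLLM. Adjust the prompt and sampling
# parameters to your use case.
payload = {
    "model": "Qwen/Qwen3-32B",
    "messages": [
        {"role": "user", "content": "Summarize EAGLE-3 speculative decoding in one sentence."}
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,       # sglang/vLLM extension, not in the base OpenAI schema
    "max_tokens": 256,
}
body = json.dumps(payload)
```

POST this body to the server's /v1/chat/completions route; speculative decoding is transparent to the client, so the response format is unchanged.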

Use with sglang

python3 -m sglang.launch_server \
  --model Qwen/Qwen3-32B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path <our-model-path> \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 3

Use with vLLM

Note that vLLM requires the vLLM-specific model configuration file, which is named config_vllm.json. Before running the command below, rename config_vllm.json to config.json in the draft model directory so that vLLM picks it up.
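The rename step can be scripted; here is a minimal Python sketch (the function name is ours, and draft_dir should point at your local copy of the draft model):

```python
import os

def activate_vllm_config(draft_dir: str) -> None:
    """Rename config_vllm.json to config.json so vLLM reads it."""
    src = os.path.join(draft_dir, "config_vllm.json")
    dst = os.path.join(draft_dir, "config.json")
    os.replace(src, dst)  # atomic on POSIX; overwrites any existing config.json
```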

vllm serve Qwen/Qwen3-32B \
  -tp 2 \
  --speculative-config '{
    "model": <our-model-path>,
    "num_speculative_tokens": 3,
    "method": "eagle3"
  }'

Evaluations

Use cases

Use Case                     Dataset         Samples
Coding                       HumanEval       168
Math Reasoning               gsm8k           80
English Text Summarization   CNN/Daily Mail  80
Chinese Mixed                HC3-Chinese     120
Chinese Finance              HC3-Chinese     80
Chinese Text Summarization   LCSTS           80

Acceptance lengths

Use Case                     draft_tokens_num=3
Coding                       2.11
Math Reasoning               2.60
English Text Summarization   1.95
Chinese Mixed                2.08
Chinese Finance              2.15
Chinese Text Summarization   2.05

Configuration

  • temperature: 0.6
  • top_p: 0.95
  • top_k: 20
  • repetitions: 3
  • hardware: 2xH800
Model size: 2B params (Safetensors; tensor types: I64, BF16, BOOL)