CMB-Qwen3-32B-Eagle3
Model Overview
- Verifier: Qwen/Qwen3-32B
- Speculative Decoding Algorithm: EAGLE-3
- Model Architecture: LlamaForCausalLMEagle3
- Release Date: 03/31/2026
- Version: 1.0
- Model Developers: Architecture Management Team, Information Technology Department, China Merchants Bank (CMB)
This is a speculator model designed for use with Qwen/Qwen3-32B, based on the EAGLE-3 speculative decoding algorithm. It was trained by Architecture Management Team, Information Technology Department, China Merchants Bank (CMB). The training data is a combination of Chinese and English open-source datasets, with data synthesis and cleaning performed on the original datasets. The following are the sources of the original training data:
This model should be used with the Qwen/Qwen3-32B chat template, specifically through the /chat/completions endpoint.
Use with sglang
python3 -m sglang.launch_server \
--model Qwen/Qwen3-32B \
--speculative-algorithm EAGLE3 \
--speculative-draft-model-path <our-model-path> \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 3
Use with vLLM
It is important to note that when using vLLM, you need to enable the model configuration file for vLLM, which named config_vllm.json. Before entering the following commands, you need to rename the file from config_vllm.json to config.json to make it effective.
vllm serve Qwen/Qwen3-32B \
-tp 2 \
--speculative-config '{
"model": <our-model-path>,
"num_speculative_tokens": 3,
"method": "eagle3"
}'
Evaluations
Use cases
| Use Case | Dataset | Samples |
|---|---|---|
| Coding | HumanEval | 168 |
| Math Reasoning | gsm8k | 80 |
| English Text Summarization | CNN/Daily Mail | 80 |
| Chinese Mixed | HC3-Chinese | 120 |
| Chinese Finance | HC3-Chinese | 80 |
| Chinese Text Summarization | LCSTS | 80 |
Acceptance lengths
| Use Case | draft_tokens_num=3 |
|---|---|
| Coding | 2.11 |
| Math Reasoning | 2.60 |
| English Text Summarization | 1.95 |
| Chinese Mixed | 2.08 |
| Chinese Finance | 2.15 |
| Chinese Text Summarization | 2.05 |
Details
Configuration- temperature: 0.6
- top_p: 0.95
- top_k: 20
- repetitions: 3
- hardware: 2xH800
- Downloads last month
- 1