## Model Overview
Qwen3-VL-235B-A22B-Instruct-eagle3 is a specialized draft model designed to accelerate the inference of the Qwen3-VL-235B-A22B-Instruct ecosystem using the EAGLE3 (Extrapolation Algorithm for Greater Language-model Efficiency) framework.
Built upon the Llama architecture, this model acts as a highly efficient drafter. It has been trained on the ALLaVA-4V dataset, ensuring strict alignment with the teacher model's distribution.
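To build intuition for how a drafter accelerates decoding, here is a toy sketch of the draft-and-verify loop behind speculative decoding (greedy variant). It is purely illustrative: the real EAGLE3 drafter predicts from the target model's hidden states, while `target_next` and `draft_next` below are hypothetical stand-in functions over integer token ids.

```python
# Toy sketch of greedy draft-and-verify speculative decoding.
# All functions are illustrative stand-ins, not the EAGLE3 implementation.

def target_next(ctx):
    # Hypothetical "teacher": next token is (sum of context) % 7.
    return sum(ctx) % 7

def draft_next(ctx):
    # Hypothetical drafter: agrees with the target except when the context
    # sum is divisible by 5, modeling an imperfectly aligned draft model.
    s = sum(ctx)
    return (s + 1) % 7 if s % 5 == 0 else s % 7

def speculative_step(ctx, k):
    """Draft k tokens cheaply, then verify them with the target model."""
    # Draft phase: the cheap model proposes k tokens autoregressively.
    draft, spec = [], list(ctx)
    for _ in range(k):
        t = draft_next(spec)
        draft.append(t)
        spec.append(t)
    # Verify phase: keep draft tokens while the target agrees; on the first
    # mismatch, emit the target's correction instead (still one target pass).
    accepted, verify = [], list(ctx)
    for t in draft:
        want = target_next(verify)
        if t != want:
            accepted.append(want)
            break
        accepted.append(t)
        verify.append(t)
    else:
        # All k drafts accepted: the same target pass yields a bonus token.
        accepted.append(target_next(verify))
    return accepted

print(speculative_step([1, 2], 4))  # all 4 drafts accepted, plus a bonus token
print(speculative_step([5], 4))     # first draft rejected, target correction only
```

The number of tokens emitted per target pass is the "accept length" reported in the benchmark table below: a well-aligned drafter yields values well above 1, which is where the speedup comes from.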
The metrics in the table below demonstrate robust acceleration across the diverse and complex question domains of the MMStar dataset.
### MMStar Benchmark Performance Comparison (v0.5.6.post2)

Here, EAGLE3 (3, 2, 4) denotes `--speculative-num-steps 3`, `--speculative-eagle-topk 2`, and `--speculative-num-draft-tokens 4`.
| Model Configuration | TP | Concurrency | Dataset | Output Throughput (tokens/s) | Accept Length |
|---|---|---|---|---|---|
| FP8 | 2 | 1 | mmstar | 71 | 1.000 |
| FP8 | 2 | 4 | mmstar | 186 | 1.000 |
| FP8 | 2 | 16 | mmstar | 415 | 1.000 |
| FP8 + EAGLE3 (3, 2, 4) | 2 | 1 | mmstar | 100 | 2.239 |
| FP8 + EAGLE3 (3, 2, 4) | 2 | 4 | mmstar | 244 | 2.235 |
| FP8 + EAGLE3 (3, 2, 4) | 2 | 16 | mmstar | 549 | 2.224 |
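As a quick sanity check on the table, the implied end-to-end speedup at each concurrency level can be computed directly from the throughput columns:

```python
# Output throughput (tokens/s) from the MMStar table above, keyed by concurrency.
baseline = {1: 71, 4: 186, 16: 415}    # FP8, TP=2
eagle3 = {1: 100, 4: 244, 16: 549}     # FP8 + EAGLE3 (3, 2, 4), TP=2

# Speedup ratio of EAGLE3 over the plain FP8 baseline.
speedup = {c: round(eagle3[c] / baseline[c], 2) for c in baseline}
print(speedup)  # {1: 1.41, 4: 1.31, 16: 1.32}
```

So EAGLE3 yields roughly a 1.3-1.4x throughput gain over the FP8 baseline at these settings.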
## Quick Start

### Requirements
- NVIDIA GPU
- CUDA 12.0+
- PyTorch 2.0+
### Installation

```shell
pip install sglang==0.5.6.post2
```

You will also need the changes from SGLang PR #13918.
### Inference with SGLang

```shell
python3 -m sglang.launch_server \
  --model-path Qwen3-VL-235B-A22B-Instruct-FP8 \
  --speculative-draft-model-path AQ-MedAI/Qwen3-VL-235B-A22B-Instruct-eagle3 \
  --trust-remote-code \
  --speculative-algo EAGLE3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 2 \
  --speculative-num-draft-tokens 4 \
  --tp 2 \
  --mem-fraction-static 0.9 \
  --host 0.0.0.0 \
  --port 30012
```
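Once the server is up, it can be queried through SGLang's OpenAI-compatible chat completions endpoint. The sketch below builds a multimodal request; the localhost URL matches the launch command above (port 30012), while the image URL and question are placeholder assumptions.

```python
import json
from urllib import request

def build_chat_payload(image_url, question,
                       model="Qwen3-VL-235B-A22B-Instruct-FP8"):
    """Build an OpenAI-style multimodal chat request body."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 256,
    }

def send(payload, url="http://localhost:30012/v1/chat/completions"):
    """POST the payload to the running SGLang server and return the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Placeholder image URL and question for illustration.
payload = build_chat_payload("https://example.com/cat.png",
                             "What is in this image?")
# response = send(payload)  # requires the server above to be running
```

Speculative decoding is transparent to clients: the request format is identical with or without the EAGLE3 draft model, only the generation speed changes.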
## Training Data
The model was trained on 400,000 samples sourced from the ALLaVA-4V dataset.
## Citation
If you use this model in your research or application, please cite the following:
```bibtex
@misc{qwen3vleagle3,
  title={Qwen3-VL-235B-A22B-Instruct-eagle3: Accelerating Instruction Following with EAGLE3},
  author={Ant AQ Team},
  year={2026},
}
```