## Model Overview
Qwen3-VL-235B-A22B-Instruct-eagle3 is a specialized draft model designed to accelerate the inference of the Qwen3-VL-235B-A22B-Instruct ecosystem using the EAGLE3 (Extrapolation Algorithm for Greater Language-model Efficiency) framework.
Built upon the Llama architecture, this model acts as a highly efficient drafter. It has been trained on the ALLaVA-4V dataset, ensuring strict alignment with the teacher model's distribution.
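To build intuition for how a drafter accelerates decoding, here is a toy sketch of the draft-and-verify loop behind speculative decoding (greedy variant). It is purely illustrative: the real EAGLE3 drafter predicts from the target model's hidden states, while `target_next` and `draft_next` below are hypothetical stand-in functions over integer token ids.

```python
# Toy sketch of greedy draft-and-verify speculative decoding.
# All functions are illustrative stand-ins, not the EAGLE3 implementation.

def target_next(ctx):
    # Hypothetical "teacher": next token is (sum of context) % 7.
    return sum(ctx) % 7

def draft_next(ctx):
    # Hypothetical drafter: agrees with the target except when the context
    # sum is divisible by 5, modeling an imperfectly aligned draft model.
    s = sum(ctx)
    return (s + 1) % 7 if s % 5 == 0 else s % 7

def speculative_step(ctx, k):
    """Draft k tokens cheaply, then verify them with the target model."""
    # Draft phase: the cheap model proposes k tokens autoregressively.
    draft, spec = [], list(ctx)
    for _ in range(k):
        t = draft_next(spec)
        draft.append(t)
        spec.append(t)
    # Verify phase: keep draft tokens while the target agrees; on the first
    # mismatch, emit the target's correction instead (still one target pass).
    accepted, verify = [], list(ctx)
    for t in draft:
        want = target_next(verify)
        if t != want:
            accepted.append(want)
            break
        accepted.append(t)
        verify.append(t)
    else:
        # All k drafts accepted: the same target pass yields a bonus token.
        accepted.append(target_next(verify))
    return accepted

print(speculative_step([1, 2], 4))  # all 4 drafts accepted, plus a bonus token
print(speculative_step([5], 4))     # first draft rejected, target correction only
```

The number of tokens emitted per target pass is the "accept length" reported in the benchmark table below: a well-aligned drafter yields values well above 1, which is where the speedup comes from.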
The metrics in the table below demonstrate robust acceleration across the diverse and complex question domains of the MMStar dataset.
### MMStar Benchmark Performance Comparison (v0.5.6.post2)

Here, EAGLE3 (3, 2, 4) denotes `--speculative-num-steps 3`, `--speculative-eagle-topk 2`, and `--speculative-num-draft-tokens 4`.
| Model Configuration | TP | Concurrency | Dataset | Output Throughput (tokens/s) | Accept Length |
|---|---|---|---|---|---|
| FP8 | 2 | 1 | mmstar | 71 | 1.000 |
| FP8 | 2 | 4 | mmstar | 186 | 1.000 |
| FP8 | 2 | 16 | mmstar | 415 | 1.000 |
| FP8 + EAGLE3 (3, 2, 4) | 2 | 1 | mmstar | 100 | 2.239 |
| FP8 + EAGLE3 (3, 2, 4) | 2 | 4 | mmstar | 244 | 2.235 |
| FP8 + EAGLE3 (3, 2, 4) | 2 | 16 | mmstar | 549 | 2.224 |
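As a quick sanity check on the table, the implied end-to-end speedup at each concurrency level can be computed directly from the throughput columns:

```python
# Output throughput (tokens/s) from the MMStar table above, keyed by concurrency.
baseline = {1: 71, 4: 186, 16: 415}    # FP8, TP=2
eagle3 = {1: 100, 4: 244, 16: 549}     # FP8 + EAGLE3 (3, 2, 4), TP=2

# Speedup ratio of EAGLE3 over the plain FP8 baseline.
speedup = {c: round(eagle3[c] / baseline[c], 2) for c in baseline}
print(speedup)  # {1: 1.41, 4: 1.31, 16: 1.32}
```

So EAGLE3 yields roughly a 1.3-1.4x throughput gain over the FP8 baseline at these settings.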
## Quick Start

### Requirements
- NVIDIA GPU
- CUDA 12.0+
- PyTorch 2.0+
### Installation

```shell
pip install sglang==0.5.6.post2
```

You will also need the changes from SGLang PR #13918.
### Inference with SGLang

```shell
python3 -m sglang.launch_server \
  --model-path Qwen3-VL-235B-A22B-Instruct-FP8 \
  --speculative-draft-model-path AQ-MedAI/Qwen3-VL-235B-A22B-Instruct-eagle3 \
  --trust-remote-code \
  --speculative-algo EAGLE3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 2 \
  --speculative-num-draft-tokens 4 \
  --tp 2 \
  --mem-fraction-static 0.9 \
  --host 0.0.0.0 \
  --port 30012
```
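Once the server is up, it can be queried through SGLang's OpenAI-compatible chat completions endpoint. The sketch below builds a multimodal request; the localhost URL matches the launch command above (port 30012), while the image URL and question are placeholder assumptions.

```python
import json
from urllib import request

def build_chat_payload(image_url, question,
                       model="Qwen3-VL-235B-A22B-Instruct-FP8"):
    """Build an OpenAI-style multimodal chat request body."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 256,
    }

def send(payload, url="http://localhost:30012/v1/chat/completions"):
    """POST the payload to the running SGLang server and return the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Placeholder image URL and question for illustration.
payload = build_chat_payload("https://example.com/cat.png",
                             "What is in this image?")
# response = send(payload)  # requires the server above to be running
```

Speculative decoding is transparent to clients: the request format is identical with or without the EAGLE3 draft model, only the generation speed changes.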
## Training Data
The model was trained on 400,000 samples sourced from the ALLaVA-4V dataset.
## Citation
If you use this model in your research or application, please cite the following:
```bibtex
@misc{qwen3vleagle3,
  title={Qwen3-VL-235B-A22B-Instruct-eagle3: Accelerating Instruction Following with EAGLE3},
  author={Ant AQ Team},
  year={2026},
}
```