Qwen3-ASR
INT4 AWQ quantized version of Qwen/Qwen3-ASR-1.7B, optimized for on-device inference on Jetson Orin Nano 8 GB via TensorRT-Edge-LLM v0.6.0.
Quantization was performed with NVIDIA ModelOpt using INT4 AWQ (`mtq.INT4_AWQ_CFG`).
Only the LLM decoder (`thinker.model`, ~1.4 B parameters / 82% of the total) is quantized;
`audio_tower` and `lm_head` remain in FP16.
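ModelOpt selects layers for quantization by name patterns. As a minimal illustration of the decoder-only split described above (this is a standalone sketch with illustrative module names, not the actual ModelOpt API):

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns mirroring the split above:
# quantize everything under thinker.model, keep audio_tower / lm_head in FP16.
EXCLUDE_PATTERNS = ["*audio_tower*", "*lm_head*"]

def should_quantize(module_name: str) -> bool:
    """Return True if a module's weights should receive INT4 AWQ quantizers."""
    if any(fnmatch(module_name, pat) for pat in EXCLUDE_PATTERNS):
        return False
    return module_name.startswith("thinker.model")

# Illustrative module names (not taken from the real checkpoint):
names = [
    "thinker.model.layers.0.self_attn.q_proj",
    "thinker.audio_tower.layers.0.conv1",
    "thinker.lm_head",
]
print([n for n in names if should_quantize(n)])
# → ['thinker.model.layers.0.self_attn.q_proj']
```

In ModelOpt itself the same effect is achieved by disabling quantizer entries in the config for the excluded patterns.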
Evaluated on 760 VIVOS Vietnamese test samples. BF16 baseline WER: 7.34% (measured on x86; not runnable on Nano due to memory).
| Metric | Value |
|---|---|
| WER | 8.69% |
| RTF | 0.1641 |
| Throughput | 1.72 samples/s |
| RAM footprint | 3.3 GB |
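The WER figure above is the standard word-level edit distance divided by reference length. For readers who want to reproduce the metric, a minimal reference implementation (not the evaluation script used for this benchmark):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(ref)][len(hyp)] / len(ref)

# One deletion ("các") against a 4-word reference:
print(wer("xin chào các bạn", "xin chào bạn"))  # → 0.25
```

Corpus-level WER (as reported here) is computed over the pooled edit counts of all 760 samples, not the mean of per-sample WERs.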
This checkpoint is the input to the TRT-EdgeLLM export pipeline.
It is not directly loadable with standard `transformers` inference;
use it with the qwen-asr-optimization repository
to export to ONNX and build TRT engines.
```
[This checkpoint]
       │
       ▼  scripts/02_export_onnx.sh  (x86)
ONNX artefacts
       │
       ▼  scripts/03_build_engine.sh  (Jetson Orin AGX)
TRT engines
       │
       ▼  inference.py / scripts/04_benchmark.sh  (Jetson Orin Nano)
Transcription
```
| Property | Value |
|---|---|
| Method | INT4 AWQ (Activation-Aware Weight Quantization) |
| Config | mtq.INT4_AWQ_CFG |
| Quantized component | thinker.model (LLM decoder only) |
| Excluded | audio_tower, lm_head |
| Calibration data | 257 samples — LibriSpeech EN (60), FLEURS ZH (30), FLEURS 13-lang×7 (91), LibriSpeech functional (76) |
| Base model dtype | FP16 |
Full pipeline documentation: trt-edgellm/README.md
```bash
git clone https://github.com/VLAOpt/qwen-asr-optimization.git
cd qwen-asr-optimization

# Download this checkpoint
huggingface-cli download vrfai/qwen3asr-int4 --local-dir ./Qwen3-ASR-1.7B-int4

# Export to ONNX (x86)
bash trt-edgellm/scripts/02_export_onnx.sh ./Qwen3-ASR-1.7B-int4 ./Qwen3-ASR-1.7B-int4-ONNX

# Build TRT engines (Jetson Orin AGX)
bash trt-edgellm/scripts/03_build_engine.sh \
    ~/Qwen3-ASR-1.7B-int4-ONNX \
    ~/Qwen3-ASR-1.7B-int4-Engines

# Single-file inference (Jetson Orin Nano)
python trt-edgellm/inference.py \
    --audio /path/to/audio.wav \
    --engine_dir ~/Qwen3-ASR-1.7B-int4-Engines
```
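The RTF reported above is wall-clock decode time divided by audio duration (lower is better; below 1.0 means faster than real time). A small stdlib-only helper for computing it from a WAV file and a measured latency (the 0.33 s decode time below is hypothetical, chosen for the demo):

```python
import wave

def audio_duration(wav_path: str) -> float:
    """Duration in seconds of a PCM WAV file."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / w.getframerate()

def rtf(decode_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time / audio duration."""
    return decode_seconds / audio_seconds

# Demo: write a 2-second, 16 kHz mono WAV of silence, then compute RTF
# for a hypothetical 0.33 s decode time.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 32000)  # 32000 frames = 2.0 s

dur = audio_duration("demo.wav")
print(round(rtf(0.33, dur), 4))  # → 0.165
```

Averaging per-sample RTF over the 760-sample VIVOS set yields the 0.1641 figure in the table above.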
| Model | Format | Target | Link |
|---|---|---|---|
| qwen3asr-int4 | INT4 AWQ | Jetson Orin Nano | this repo |
| qwen3asr-int8 | INT8 SmoothQuant | Jetson Orin Nano | vrfai/qwen3asr-int8 |
| qwen3asr-fp8 | FP8 | RTX 5090 (vLLM) | vrfai/qwen3asr-fp8 |
| qwen3asr-nvfp4 | NVFP4 | RTX 5090 (vLLM) | vrfai/qwen3asr-nvfp4 |
Base model: Qwen/Qwen3-ASR-1.7B