qwen3asr-int4

INT4 AWQ quantized version of Qwen/Qwen3-ASR-1.7B, optimized for on-device inference on Jetson Orin Nano 8 GB via TensorRT-Edge-LLM v0.6.0.

Quantization was performed with NVIDIA ModelOpt using INT4 AWQ (mtq.INT4_AWQ_CFG). Only the LLM decoder (thinker.model, ~1.4 B parameters, roughly 82% of the model) is quantized; audio_tower and lm_head remain in FP16.
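A back-of-the-envelope weight-memory estimate shows why this split fits on an 8 GB device. The 1.4 B / 0.3 B split is taken from the figures above; the AWQ group size of 128 is an assumption for illustration:

```python
# Rough weight-memory estimate for the mixed-precision split.
# Assumes ~1.4B decoder params at INT4 (plus one FP16 scale per group of 128)
# and ~0.3B params (audio_tower + lm_head) kept in FP16.
decoder_params = 1.4e9
other_params = 0.3e9
group_size = 128  # typical AWQ group size (assumption, not from the config)

int4_bytes = decoder_params * 0.5                 # 4 bits per weight
scale_bytes = (decoder_params / group_size) * 2   # one FP16 scale per group
fp16_bytes = other_params * 2                     # 2 bytes per FP16 weight

total_gb = (int4_bytes + scale_bytes + fp16_bytes) / 1e9
print(f"~{total_gb:.2f} GB of weights")  # ~1.32 GB of weights
```

Weights alone land well under the measured 3.3 GB total footprint; the remainder is activations, KV cache, and runtime overhead.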


Performance — Jetson Orin Nano 8 GB

Evaluated on 760 VIVOS Vietnamese test samples. BF16 baseline WER: 7.34% (measured on x86; not runnable on Nano due to memory).
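WER here is the standard word-level edit distance divided by the number of reference words; a minimal implementation for sanity-checking reported numbers:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("xin chao viet nam", "xin chao viet"))  # 0.25: one deletion over 4 words
```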

| Metric | Value |
|---|---|
| WER | 8.69% |
| RTF | 0.1641 |
| Throughput | 1.72 samples/s |
| RAM footprint | 3.3 GB |
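RTF (real-time factor) is decode time divided by audio duration, so 0.1641 means roughly 6× faster than real time. A sketch of the computation with hypothetical timings (the 443 s / 2,700 s figures below are illustrative, chosen only to match the reported RTF):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means faster than real time."""
    return processing_seconds / audio_seconds

# Hypothetical run: 760 clips totalling ~2,700 s of audio decoded in ~443 s
rtf = real_time_factor(443.0, 2700.0)
print(f"RTF {rtf:.4f}, ~{1 / rtf:.1f}x real time")
```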

Intended Use

This checkpoint is the input to the TRT-EdgeLLM export pipeline. It cannot be loaded directly with standard transformers inference; use it with qwen-asr-optimization to export to ONNX and build TRT engines.

```
[This checkpoint]
      │
      ▼  scripts/02_export_onnx.sh
  ONNX artefacts
      │
      ▼  scripts/03_build_engine.sh  (Jetson Orin AGX)
  TRT engines
      │
      ▼  inference.py / scripts/04_benchmark.sh  (Jetson Orin Nano)
  Transcription
```

Quantization Details

| Property | Value |
|---|---|
| Method | INT4 AWQ (Activation-Aware Weight Quantization) |
| Config | mtq.INT4_AWQ_CFG |
| Quantized component | thinker.model (LLM decoder only) |
| Excluded | audio_tower, lm_head |
| Calibration data | 257 samples: LibriSpeech EN (60), FLEURS ZH (30), FLEURS 13-lang × 7 (91), LibriSpeech functional (76) |
| Base model dtype | FP16 |
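The core weight transform behind INT4 AWQ, after the activation-aware scaling step (omitted here), is symmetric group-wise 4-bit quantization: each group of weights shares one scale and each weight is stored as an integer in [-7, 7]. A toy sketch with group size 4 (the real config uses much larger groups):

```python
def quantize_group(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric INT4: map a group of floats to integers in [-7, 7] plus one scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_group(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

group = [0.5, -3.5, 2.0, 1.0]
q, scale = quantize_group(group)
recon = dequantize_group(q, scale)
print(q)      # [1, -7, 4, 2]
print(recon)  # exact here, because every value is a multiple of the scale
```

In general reconstruction incurs an error of up to half the scale per weight; AWQ's contribution is choosing per-channel scaling so that activation-salient weights see less of that error.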

Deployment

Full pipeline documentation: trt-edgellm/README.md

Quick start

```bash
git clone https://github.com/VLAOpt/qwen-asr-optimization.git
cd qwen-asr-optimization

# Download this checkpoint
huggingface-cli download vrfai/qwen3asr-int4 --local-dir ./Qwen3-ASR-1.7B-int4

# Export to ONNX (x86)
bash trt-edgellm/scripts/02_export_onnx.sh ./Qwen3-ASR-1.7B-int4 ./Qwen3-ASR-1.7B-int4-ONNX

# Build TRT engines (Jetson Orin AGX)
bash trt-edgellm/scripts/03_build_engine.sh \
    ~/Qwen3-ASR-1.7B-int4-ONNX \
    ~/Qwen3-ASR-1.7B-int4-Engines

# Single-file inference (Jetson Orin Nano)
python trt-edgellm/inference.py \
    --audio      /path/to/audio.wav \
    --engine_dir ~/Qwen3-ASR-1.7B-int4-Engines
```

Related Models

| Model | Format | Target | Link |
|---|---|---|---|
| qwen3asr-int4 | INT4 AWQ | Jetson Orin Nano | this repo |
| qwen3asr-int8 | INT8 SmoothQuant | Jetson Orin Nano | vrfai/qwen3asr-int8 |
| qwen3asr-fp8 | FP8 | RTX 5090 (vLLM) | vrfai/qwen3asr-fp8 |
| qwen3asr-nvfp4 | NVFP4 | RTX 5090 (vLLM) | vrfai/qwen3asr-nvfp4 |
