qwen3asr-int4

INT4 AWQ quantized version of Qwen/Qwen3-ASR-1.7B, optimized for on-device inference on Jetson Orin Nano 8 GB via TensorRT-Edge-LLM v0.6.0.

Quantization was performed with NVIDIA ModelOpt using INT4 AWQ (mtq.INT4_AWQ_CFG). Only the LLM decoder (thinker.model, ~1.4 B parameters, roughly 82% of the model) is quantized; audio_tower and lm_head remain in FP16.
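A back-of-the-envelope weight-memory estimate shows why this split fits on an 8 GB device. The 1.4 B / 0.3 B split is taken from the figures above; the AWQ group size of 128 is an assumption for illustration:

```python
# Rough weight-memory estimate for the mixed-precision split.
# Assumes ~1.4B decoder params at INT4 (plus one FP16 scale per group of 128)
# and ~0.3B params (audio_tower + lm_head) kept in FP16.
decoder_params = 1.4e9
other_params = 0.3e9
group_size = 128  # typical AWQ group size (assumption, not from the config)

int4_bytes = decoder_params * 0.5                 # 4 bits per weight
scale_bytes = (decoder_params / group_size) * 2   # one FP16 scale per group
fp16_bytes = other_params * 2                     # 2 bytes per FP16 weight

total_gb = (int4_bytes + scale_bytes + fp16_bytes) / 1e9
print(f"~{total_gb:.2f} GB of weights")  # ~1.32 GB of weights
```

Weights alone land well under the measured 3.3 GB total footprint; the remainder is activations, KV cache, and runtime overhead.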


Performance — Jetson Orin Nano 8 GB

Evaluated on 760 VIVOS Vietnamese test samples. BF16 baseline WER: 7.34% (measured on x86; not runnable on Nano due to memory).
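WER here is the standard word-level edit distance divided by the number of reference words; a minimal implementation for sanity-checking reported numbers:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("xin chao viet nam", "xin chao viet"))  # 0.25: one deletion over 4 words
```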

| Metric | Value |
|---|---|
| WER | 8.69% |
| RTF | 0.1641 |
| Throughput | 1.72 samples/s |
| RAM footprint | 3.3 GB |
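RTF (real-time factor) is decode time divided by audio duration, so 0.1641 means roughly 6× faster than real time. A sketch of the computation with hypothetical timings (the 443 s / 2,700 s figures below are illustrative, chosen only to match the reported RTF):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means faster than real time."""
    return processing_seconds / audio_seconds

# Hypothetical run: 760 clips totalling ~2,700 s of audio decoded in ~443 s
rtf = real_time_factor(443.0, 2700.0)
print(f"RTF {rtf:.4f}, ~{1 / rtf:.1f}x real time")
```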

Intended Use

This checkpoint is the input to the TRT-EdgeLLM export pipeline. It cannot be loaded directly with standard transformers inference; use it with qwen-asr-optimization to export to ONNX and build TRT engines.

```
[This checkpoint]
      │
      ▼  scripts/02_export_onnx.sh
  ONNX artefacts
      │
      ▼  scripts/03_build_engine.sh  (Jetson Orin AGX)
  TRT engines
      │
      ▼  inference.py / scripts/04_benchmark.sh  (Jetson Orin Nano)
  Transcription
```

Quantization Details

| Property | Value |
|---|---|
| Method | INT4 AWQ (Activation-Aware Weight Quantization) |
| Config | mtq.INT4_AWQ_CFG |
| Quantized component | thinker.model (LLM decoder only) |
| Excluded | audio_tower, lm_head |
| Calibration data | 257 samples: LibriSpeech EN (60), FLEURS ZH (30), FLEURS 13-lang × 7 (91), LibriSpeech functional (76) |
| Base model dtype | FP16 |
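The core weight transform behind INT4 AWQ, after the activation-aware scaling step (omitted here), is symmetric group-wise 4-bit quantization: each group of weights shares one scale and each weight is stored as an integer in [-7, 7]. A toy sketch with group size 4 (the real config uses much larger groups):

```python
def quantize_group(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric INT4: map a group of floats to integers in [-7, 7] plus one scale."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_group(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

group = [0.5, -3.5, 2.0, 1.0]
q, scale = quantize_group(group)
recon = dequantize_group(q, scale)
print(q)      # [1, -7, 4, 2]
print(recon)  # exact here, because every value is a multiple of the scale
```

In general reconstruction incurs an error of up to half the scale per weight; AWQ's contribution is choosing per-channel scaling so that activation-salient weights see less of that error.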

Deployment

Full pipeline documentation: trt-edgellm/README.md

Quick start

```bash
git clone https://github.com/VLAOpt/qwen-asr-optimization.git
cd qwen-asr-optimization

# Download this checkpoint
huggingface-cli download vrfai/qwen3asr-int4 --local-dir ./Qwen3-ASR-1.7B-int4

# Export to ONNX (x86)
bash trt-edgellm/scripts/02_export_onnx.sh ./Qwen3-ASR-1.7B-int4 ./Qwen3-ASR-1.7B-int4-ONNX

# Build TRT engines (Jetson Orin AGX)
bash trt-edgellm/scripts/03_build_engine.sh \
    ~/Qwen3-ASR-1.7B-int4-ONNX \
    ~/Qwen3-ASR-1.7B-int4-Engines

# Single-file inference (Jetson Orin Nano)
python trt-edgellm/inference.py \
    --audio      /path/to/audio.wav \
    --engine_dir ~/Qwen3-ASR-1.7B-int4-Engines
```

Related Models

| Model | Format | Target | Link |
|---|---|---|---|
| qwen3asr-int4 | INT4 AWQ | Jetson Orin Nano | this repo |
| qwen3asr-int8 | INT8 SmoothQuant | Jetson Orin Nano | vrfai/qwen3asr-int8 |
| qwen3asr-fp8 | FP8 | RTX 5090 (vLLM) | vrfai/qwen3asr-fp8 |
| qwen3asr-nvfp4 | NVFP4 | RTX 5090 (vLLM) | vrfai/qwen3asr-nvfp4 |
