Zen3 ASR

Zen3 ASR is an automatic speech recognition model for transcription, voice agents, and streaming. It has ~1.7B parameters; a 24-layer audio encoder feeds a Qwen3 text decoder. The config declares support for 30 languages.

Derived by fine-tuning Qwen/Qwen3-ASR-1.7B (Alibaba Cloud, Apache-2.0).

Architecture: Qwen3ASRForConditionalGeneration (qwen3_asr)
Parameters: ~1.7B
Base model: Qwen/Qwen3-ASR-1.7B

Weights

This repository contains the model weights as two safetensors shards (model-0000{1,2}-of-00002.safetensors) indexed by model.safetensors.index.json, along with the config and tokenizer files.
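As a rough illustration of how the shard index ties the files together, the sketch below reads a local copy of model.safetensors.index.json and groups tensor names by the shard that stores them. It assumes the usual sharded-checkpoint layout with a top-level "weight_map" object mapping tensor names to shard file names. The helper names shards_from_index and missing_shards are hypothetical and not part of any library.

```python
import json
from collections import defaultdict
from pathlib import Path


def shards_from_index(index_path):
    """Group tensor names by the shard file that stores them.

    Assumes the index JSON has a top-level "weight_map" object of the form
    {tensor_name: shard_file_name}.
    """
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    by_shard = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        by_shard[shard_file].append(tensor_name)
    return dict(by_shard)


def missing_shards(index_path):
    """Return shard files named in the index but absent from its directory."""
    folder = Path(index_path).parent
    return sorted(
        shard for shard in shards_from_index(index_path)
        if not (folder / shard).exists()
    )
```

Running missing_shards on a downloaded copy of the repository is a quick way to confirm that both shards arrived before attempting to load the model; an empty list means every shard referenced by the index is present.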

The model uses the qwen3_asr architecture and loads with transformers (>= 4.57). It is API-compatible with the upstream base, so the inference recipe on the base model card for Qwen/Qwen3-ASR-1.7B applies unchanged.
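Before following the upstream recipe, it can help to confirm that the installed transformers meets the stated minimum. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the parser looks only at the leading dotted integers, so a pre-release such as 4.57.0.dev0 is treated as meeting the minimum.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = (4, 57)  # minimum version stated on this card


def release_tuple(version):
    """Parse the leading dotted integers, e.g. '4.57.1' -> (4, 57, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version, minimum=MIN_TRANSFORMERS):
    """Return True if the version's release tuple is at least `minimum`."""
    return release_tuple(version) >= minimum


def installed_transformers_ok():
    """Check the installed transformers package; False if it is not installed."""
    try:
        return meets_minimum(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False
```

If installed_transformers_ok() returns False, upgrading transformers is the likely first step before trying the inference recipe from the base model card.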

Provenance

Fine-tuned from Qwen/Qwen3-ASR-1.7B (Apache-2.0). See NOTICE for full attribution.


Safetensors

Model size: 2B params
Tensor type: BF16
