Qwen3 ASR 0.6B Encoder - LiteRT (INT8)
Qwen3-ASR audio encoder (zh / yue / en). INT8 weight-only.
Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in speech-core (C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.
Use cases on soniqo.audio
Audio encoder of Qwen3-ASR-0.6B, specialized for Chinese (including 22 Chinese dialects) and 30 additional languages. Exported to LiteRT for Android. The text decoder is a Qwen3-0.6B LLM and is intended to run through LiteRT-LM as a separate runtime.
Model
| Property | Value |
|---|---|
| Component | Audio encoder only |
| Parameters | ~180 M (encoder), decoder is a separate 0.6B LLM |
| Format | LiteRT (TFLite) |
| Quantization | INT8 dynamic weights (fp32 activations) |
| Sample rate | 16 000 Hz |
| Input | 128-bin log mel, 1000 frames (10 s, fixed) |
| Output | 125 audio embedding tokens, 1024-dim each |
| Languages | 30 + 22 Chinese dialects (Cantonese, Shanghainese, Sichuan, …) |
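The fixed input and output sizes in the table imply a fixed time resolution. A minimal sketch of that arithmetic follows, assuming the 160-sample hop listed under Audio preprocessing. The `windows_needed` helper is hypothetical: it assumes longer clips are split into non-overlapping 10 s windows, which this card does not specify.

```python
import math

SAMPLE_RATE = 16_000      # Hz, from the model table
HOP_LENGTH = 160          # samples per mel frame, from the preprocessing notes
FRAMES_PER_WINDOW = 1000  # fixed encoder input length
TOKENS_PER_WINDOW = 125   # fixed encoder output length


def window_seconds() -> float:
    """Duration of audio covered by one fixed-size encoder input."""
    return FRAMES_PER_WINDOW * HOP_LENGTH / SAMPLE_RATE


def ms_per_token() -> float:
    """Audio duration represented by one output embedding, in milliseconds."""
    return 1000.0 * window_seconds() / TOKENS_PER_WINDOW


def windows_needed(num_samples: int) -> int:
    """Hypothetical helper: non-overlapping windows needed to cover a clip."""
    samples_per_window = FRAMES_PER_WINDOW * HOP_LENGTH
    return max(1, math.ceil(num_samples / samples_per_window))


if __name__ == "__main__":
    print(window_seconds())                  # 10.0
    print(ms_per_token())                    # 80.0
    print(windows_needed(25 * SAMPLE_RATE))  # 3
```

Under these assumptions each embedding summarizes 80 ms of audio, and a 25 s clip would need three encoder calls with the last window padded.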
Files
| File | Size | Description |
|---|---|---|
| qwen3-asr-encoder.tflite | 180.5 MB | Audio encoder, INT8 |
| config.json | 1 KB | Architecture + I/O specs |
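The encoder file size is consistent with storing about one byte per weight. The sketch below checks that estimate and shows a toy symmetric per-tensor INT8 weight quantizer. The quantization scheme is an assumption for illustration only; this card states "INT8 dynamic weights (fp32 activations)" but not the exact recipe used for the export.

```python
def estimated_size_mb(num_params: float, bytes_per_param: int) -> float:
    """Rough file size in MB, ignoring metadata and any unquantized tensors."""
    return num_params * bytes_per_param / 1e6


def quantize_int8(weights):
    """Toy symmetric per-tensor quantization: w ~= q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights; activations stay in float32."""
    return [v * scale for v in q]


if __name__ == "__main__":
    print(estimated_size_mb(180e6, 1))  # 180.0 (INT8)
    print(estimated_size_mb(180e6, 4))  # 720.0 (float32)
    q, scale = quantize_int8([0.5, -1.0, 0.25])
    print(q, dequantize(q, scale))
```

With ~180 M parameters, one byte per weight gives roughly 180 MB, close to the listed 180.5 MB, versus about 720 MB in float32.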
Signature
Inputs:
mel [1, 128, 1000] float32 10 s log mel spectrogram
Outputs:
audio_embeddings [1, 125, 1024] float32 For cross-attention into the decoder
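A minimal sketch of calling the encoder from Python with the signature above. It assumes the `ai-edge-litert` package is installed and that the model exposes a single input and a single output tensor; the helper names `load_interpreter` and `encode` are hypothetical and not part of this repository.

```python
MEL_SHAPE = (1, 128, 1000)        # input signature from this model card
EMBEDDING_SHAPE = (1, 125, 1024)  # output signature from this model card


def load_interpreter(model_path: str):
    """Create a LiteRT interpreter (assumes `pip install ai-edge-litert`)."""
    from ai_edge_litert.interpreter import Interpreter  # imported lazily

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def encode(interpreter, mel):
    """Run one 10 s log-mel window through the encoder.

    `mel` is expected to be a float32 array of shape [1, 128, 1000].
    """
    if tuple(mel.shape) != MEL_SHAPE:
        raise ValueError(f"expected mel shape {MEL_SHAPE}, got {tuple(mel.shape)}")

    # Assumption: the first input/output entries are `mel` and `audio_embeddings`.
    input_detail = interpreter.get_input_details()[0]
    output_detail = interpreter.get_output_details()[0]

    interpreter.set_tensor(input_detail["index"], mel)
    interpreter.invoke()
    embeddings = interpreter.get_tensor(output_detail["index"])

    if tuple(embeddings.shape) != EMBEDDING_SHAPE:
        raise RuntimeError(f"unexpected output shape {tuple(embeddings.shape)}")
    return embeddings
```

Usage would look like `encode(load_interpreter("qwen3-asr-encoder.tflite"), mel)`, where `mel` is a NumPy `float32` array produced by the preprocessing described below.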
Architecture
mel [1, 128, 1000]
├── 3× Conv2d(stride=2) + GELU → [1, 480, 16, 125]
├── reshape → Linear(7680→896) → [1, 125, 896]
├── + sinusoidal pos embed
├── 18× pre-norm Transformer → [1, 125, 896]
├── LayerNorm → Linear(896) → GELU
└── Linear(896→1024) → [1, 125, 1024]
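The shapes in the diagram can be checked with a short trace. This sketch assumes each convolution uses kernel size 3 and padding 1, which the card does not state; only the stride, channel count, and resulting shapes come from the diagram.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of one conv dimension (kernel/padding are assumptions)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(n_mels=128, n_frames=1000, channels=480, d_model=896, d_out=1024):
    """Follow a [1, n_mels, n_frames] mel input through the encoder stages."""
    height, width = n_mels, n_frames
    for _ in range(3):  # three stride-2 Conv2d layers
        height, width = conv_out(height), conv_out(width)
    return {
        "conv": (1, channels, height, width),
        # Channels and the reduced mel axis are flattened per time step.
        "flat_features": channels * height,
        "transformer": (1, width, d_model),
        "output": (1, width, d_out),
    }


if __name__ == "__main__":
    for stage, shape in trace_shapes().items():
        print(stage, shape)
```

The three stride-2 layers reduce both axes by a factor of 8, so 128 mel bins become 16 and 1000 frames become 125; flattening 480 channels over 16 bins gives the 7680 input features of the first linear layer.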
Why encoder only
The text decoder is a full Qwen3-0.6B language model with GQA, RoPE,
SwiGLU and RMSNorm. It doesn't fit cleanly into a single .tflite; the
right runtime for LLM decoders on Android is
LiteRT-LM or a comparable
LLM executor, with the audio embeddings from this encoder wired in as
cross-attention context.
For ASR-only (no LLM), pair this encoder with a CTC or transducer head fine-tuned on your target languages.
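For the ASR-only route, a CTC head would emit one vocabulary distribution per embedding (125 per window), and the simplest decoder is greedy collapse. The sketch below is illustrative only: the head, its vocabulary, and the blank index of 0 are hypothetical and are not shipped with this bundle.

```python
def argmax_per_frame(logits):
    """Pick the highest-scoring token id for each frame of [frames][vocab] logits."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated ids, then drop blanks (standard greedy CTC decoding)."""
    decoded = []
    previous = None
    for token_id in frame_ids:
        if token_id != previous and token_id != blank_id:
            decoded.append(token_id)
        previous = token_id
    return decoded


if __name__ == "__main__":
    # A blank between two identical ids keeps both copies.
    print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
```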
Audio preprocessing
- 16 kHz mono, float32
- 128 log mel bins
- n_fft=400, hop_length=160, win_length=400, pad_mode="reflect"
- log mel, mean/std normalization per utterance
The exact reference is in the upstream Qwen3-ASR tokenizer config.
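Two of the steps above, fixing the clip length and normalizing per utterance, can be sketched in plain Python. This is not the upstream reference: it assumes short clips are zero-padded to 10 s, that mean and standard deviation are computed over all bins and frames together, and that a small `eps` guards the division. The mel computation itself is left to whichever STFT/mel library you use.

```python
import math

SAMPLE_RATE = 16_000
HOP_LENGTH = 160
WINDOW_SAMPLES = 10 * SAMPLE_RATE  # 160 000 samples -> 1000 frames at hop 160


def pad_or_trim(samples, target_len=WINDOW_SAMPLES):
    """Cut or zero-pad a mono float signal to the fixed window length."""
    samples = list(samples[:target_len])
    return samples + [0.0] * (target_len - len(samples))


def normalize_utterance(log_mel, eps=1e-5):
    """Mean/std-normalize a [bins][frames] log-mel matrix over the whole utterance."""
    values = [v for row in log_mel for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    return [[(v - mean) / (std + eps) for v in row] for row in log_mel]


if __name__ == "__main__":
    clip = pad_or_trim([0.1] * 8_000)  # 0.5 s of audio, padded to 10 s
    print(len(clip), len(clip) // HOP_LENGTH)
    print(normalize_utterance([[1.0, 2.0], [3.0, 4.0]]))
```

Note that a centered STFT over 160 000 samples typically yields 1001 frames, so a pipeline built this way would need to drop one frame to match the 1000-frame input.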
Source
Upstream: Qwen/Qwen3-ASR-0.6B (Apache 2.0). Released January 2026 as part of the Qwen3 audio family.
Links
- speech-android - Android SDK
- soniqo.audio - website
- blog
Ecosystem
- soniqo.audio - use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core - C++ orchestration library for voice agents. Abstract STTInterface/TTSInterface/VADInterface/EnhancerInterface; LiteRT implementations plug straight into the interfaces.
- speech-swift - Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- speech-android - Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection
ASR / Transcription
- Parakeet TDT 0.6B v3 - LiteRT (INT8)
- Nemotron Speech Streaming 0.6B - LiteRT
- Omnilingual ASR CTC 300M - LiteRT
- Omnilingual ASR CTC 300M - LiteRT (INT8)
VAD / Diarization
TTS / Voice Cloning
License
This bundle inherits the upstream model license (apache-2.0). See the
linked base_model repository for the full terms.