metadata
title: Qwen3-ASR Demo
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
license: apache-2.0
Qwen3-ASR Demo
This Space demonstrates Qwen3-ASR-1.7B, a state-of-the-art automatic speech recognition model from the Qwen team, powered by vLLM for high-speed inference.
Features
- 30+ Language Support: Chinese, Cantonese, English, Japanese, Korean, Arabic, German, French, Spanish, Portuguese, and many more
- Word/Character-level Timestamps: Accurate timestamp alignment for each word (English) or character (Chinese)
- Interactive Visualization: Click on each word/character to hear the corresponding audio segment
- vLLM Backend: Fast inference speed for real-time transcription
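The click-to-play feature relies on mapping each word or character timestamp to a slice of the audio. The sketch below illustrates that mapping only. It assumes timestamps are given in seconds and the audio is a flat sequence of samples; `TimedToken`, `token_sample_range`, and `slice_token` are hypothetical names for illustration, not part of this Space's code or of the Qwen3-ASR API.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class TimedToken:
    """A word or character with its aligned time span (hypothetical structure)."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def token_sample_range(token: TimedToken, sample_rate: int, num_samples: int) -> Tuple[int, int]:
    """Convert a token's time span to a sample index range, clamped to the audio length."""
    start = max(0, int(round(token.start * sample_rate)))
    end = min(num_samples, int(round(token.end * sample_rate)))
    # Guard against a span that collapses or inverts after clamping.
    return start, max(start, end)


def slice_token(samples: Sequence[float], token: TimedToken, sample_rate: int) -> Sequence[float]:
    """Return the audio samples that correspond to one token."""
    lo, hi = token_sample_range(token, sample_rate, len(samples))
    return samples[lo:hi]


# Example: two seconds of silence at 16 kHz, with one word spanning 0.5 s to 1.0 s.
audio = [0.0] * 32000
word = TimedToken(text="hello", start=0.5, end=1.0)
segment = slice_token(audio, word, sample_rate=16000)
```

Clamping to the audio length keeps the slice valid when an aligner reports an end time slightly past the end of the recording.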
How to Use
- Upload an audio file or record using your microphone
- Select a language or leave "Auto" for automatic detection
- Enable "Timestamps" to get the interactive visualization (recommended)
- Click "Transcribe" to see the results
Models Used
- ASR Model: Qwen/Qwen3-ASR-1.7B
- Forced Aligner: Qwen/Qwen3-ForcedAligner-0.6B
Setup (For Space Owners)
This Space requires access to private models. You need to set up the HF_TOKEN secret:
- Go to your Space Settings
- Navigate to "Repository secrets"
- Add a new secret named HF_TOKEN, with your Hugging Face access token as the value
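Spaces expose repository secrets to the running app as environment variables, so the app can read the token at startup. The following is a minimal sketch of that check, not the code in this Space's app.py; the helper name `get_hf_token` and the error message are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read HF_TOKEN from the environment and fail early with a clear message.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it under Space Settings, 'Repository secrets'."
        )
    return token
```

Failing at startup with an explicit message is easier to diagnose than an authorization error raised later, when the private models are first downloaded.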
Links
License
Apache 2.0