---
title: Qwen3-ASR Demo
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
license: apache-2.0
---

# Qwen3-ASR Demo

This Space demonstrates **Qwen3-ASR-1.7B**, a state-of-the-art automatic speech recognition model from the Qwen team, powered by **vLLM** for high-speed inference.

## Features

- **30+ Language Support**: Chinese, Cantonese, English, Japanese, Korean, Arabic, German, French, Spanish, Portuguese, and many more
- **Word/Character-level Timestamps**: Accurate timestamp alignment for each word (English) or character (Chinese)
- **Interactive Visualization**: Click on each word/character to hear the corresponding audio segment
- **vLLM Backend**: Fast inference speed for real-time transcription

## How to Use

1. Upload an audio file or record using your microphone
2. Select a language or leave "Auto" for automatic detection
3. Enable "Timestamps" for visualization (recommended)
4. Click "Transcribe" and see the results

## Models Used

- **ASR Model**: [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B)
- **Forced Aligner**: [Qwen/Qwen3-ForcedAligner-0.6B](https://huggingface.co/Qwen/Qwen3-ForcedAligner-0.6B)

## Setup (For Space Owners)

This Space requires access to private models. You need to set up the `HF_TOKEN` secret:

1. Go to your Space Settings
2. Navigate to "Repository secrets"
3. Add a new secret with name `HF_TOKEN` and your Hugging Face access token as the value

## Links

- [GitHub Repository](https://github.com/Qwen/Qwen3-ASR)
- [Model Card](https://huggingface.co/Qwen/Qwen3-ASR-1.7B)

## License

Apache 2.0