---
title: Qwen3-ASR Demo
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
license: apache-2.0
---

# Qwen3-ASR Demo

This Space demonstrates **Qwen3-ASR-1.7B**, an automatic speech recognition (ASR) model from the Qwen team, served with **vLLM** for fast inference.

## Features

- **30+ Language Support**: Chinese, Cantonese, English, Japanese, Korean, Arabic, German, French, Spanish, Portuguese, and many more
- **Word/Character-level Timestamps**: Accurate timestamp alignment for each word (English) or character (Chinese)
- **Interactive Visualization**: Click on each word/character to hear the corresponding audio segment
- **vLLM Backend**: Fast inference for low-latency transcription
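
The timestamp and click-to-play features above can be illustrated with a minimal sketch. It assumes each aligned unit carries a start and end time in seconds; the `Token` type and the `slice_segment` helper are hypothetical names for illustration and are not taken from this Space's `app.py`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    """One aligned unit: a word (English) or a character (Chinese)."""
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def slice_segment(samples: Sequence[float], sample_rate: int, token: Token) -> List[float]:
    """Return the audio samples that fall inside the token's time span."""
    first = max(0, int(round(token.start * sample_rate)))
    last = min(len(samples), int(round(token.end * sample_rate)))
    return list(samples[first:last])


# Hypothetical data: one second of silence at 16 kHz with two aligned words.
sample_rate = 16000
samples = [0.0] * sample_rate
tokens = [Token("hello", 0.10, 0.45), Token("world", 0.50, 0.90)]

# Map each word to the slice of audio a click on that word would play.
segments = {t.text: slice_segment(samples, sample_rate, t) for t in tokens}
```

Clamping the indices to the audio length keeps the helper safe when an alignment slightly overshoots the end of the recording.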

## How to Use

1. Upload an audio file or record using your microphone
2. Select a language or leave "Auto" for automatic detection
3. Enable "Timestamps" to get the interactive visualization (recommended)
4. Click "Transcribe" to view the results
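
As a rough sketch of how the language choice in step 2 might be handled, the hypothetical helper below maps the "Auto" option to `None`, meaning no language hint. How the hint is actually passed to the model is not described here, so treat `resolve_language` as an assumption and not as this Space's implementation.

```python
from typing import Optional

AUTO_LABEL = "Auto"  # label of the automatic-detection option in the UI


def resolve_language(choice: Optional[str]) -> Optional[str]:
    """Map the selected value to a language hint; None means auto-detect."""
    if choice is None:
        return None
    choice = choice.strip()
    if not choice or choice.lower() == AUTO_LABEL.lower():
        return None
    return choice
```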

## Models Used

- **ASR Model**: [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B)
- **Forced Aligner**: [Qwen/Qwen3-ForcedAligner-0.6B](https://huggingface.co/Qwen/Qwen3-ForcedAligner-0.6B)

## Setup (For Space Owners)

This Space requires access to private models, so the `HF_TOKEN` secret must be set:

1. Go to your Space Settings
2. Navigate to "Repository secrets"
3. Add a new secret named `HF_TOKEN`, with your Hugging Face access token as the value
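
Spaces expose secrets to the running app as environment variables, so the app can read the token at startup. The sketch below shows one way to do that and fail early with a clear message; `get_hf_token` is a hypothetical helper, not code from this Space.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read HF_TOKEN from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in the Space settings."
        )
    return token
```

The returned value can then be passed as the `token` argument when downloading the models, for example to `huggingface_hub.snapshot_download`.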

## Links

- [GitHub Repository](https://github.com/Qwen/Qwen3-ASR)
- [Model Card](https://huggingface.co/Qwen/Qwen3-ASR-1.7B)

## License

Apache 2.0