Qwen3-ASR

littlebird13

Update README.md

3cabb18 verified 3 months ago

	---
	title: Qwen3-ASR Demo
	emoji: 🎙️
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 6.8.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	---

	# Qwen3-ASR Demo

	This Space demonstrates Qwen3-ASR-1.7B, a state-of-the-art automatic speech recognition model from the Qwen team, powered by vLLM for high-speed inference.

	## Features

	- 30+ Language Support: Chinese, Cantonese, English, Japanese, Korean, Arabic, German, French, Spanish, Portuguese, and many more
	- Word/Character-level Timestamps: Accurate timestamp alignment for each word (English) or character (Chinese)
	- Interactive Visualization: Click on each word/character to hear the corresponding audio segment
	- vLLM Backend: Fast inference for real-time transcription
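The click-to-play behaviour described above amounts to mapping a token's timestamps onto a slice of the audio samples. The sketch below is illustrative only and is not taken from this Space's `app.py`: the `TimedToken` class, the `token_sample_range` function, and the assumption that timestamps are given in seconds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedToken:
    """One aligned unit: a word (English) or a character (Chinese)."""
    text: str
    start_s: float  # segment start, assumed to be in seconds
    end_s: float    # segment end, assumed to be in seconds


def token_sample_range(token, sample_rate, total_samples):
    """Return the [start, end) sample indices for one token, clamped to the clip."""
    if token.end_s < token.start_s:
        raise ValueError(f"token {token.text!r} ends before it starts")
    # Convert seconds to sample indices, then clamp to the clip boundaries.
    start = min(total_samples, max(0, round(token.start_s * sample_rate)))
    end = min(total_samples, max(start, round(token.end_s * sample_rate)))
    return start, end


if __name__ == "__main__":
    token = TimedToken("hello", 0.5, 1.25)
    start, end = token_sample_range(token, sample_rate=16000, total_samples=32000)
    print(f"{token.text}: samples[{start}:{end}]")
```

Clamping keeps the slice valid even if an aligner reports an end time slightly past the end of the clip.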

	## How to Use

	1. Upload an audio file or record using your microphone
	2. Select a language, or leave it set to "Auto" for automatic detection
	3. Enable "Timestamps" for visualization (recommended)
	4. Click "Transcribe" to view the results

	## Models Used

	- ASR Model: [Qwen/Qwen3-ASR-1.7B](https://huggingface.co/Qwen/Qwen3-ASR-1.7B)
	- Forced Aligner: [Qwen/Qwen3-ForcedAligner-0.6B](https://huggingface.co/Qwen/Qwen3-ForcedAligner-0.6B)

	## Setup (For Space Owners)

	This Space requires access to private models, so the Space owner must set up the `HF_TOKEN` secret:

	1. Go to your Space Settings
	2. Navigate to "Repository secrets"
	3. Add a new secret named `HF_TOKEN`, with your Hugging Face access token as the value
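Spaces expose secrets to the running app as environment variables, so the application code can read the token at startup. The following is a minimal sketch of that check, not the actual code in `app.py`; the helper name `read_hf_token` and the error message are hypothetical.

```python
import os


def read_hf_token(env=None):
    """Return the HF_TOKEN value, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space Settings."
        )
    return token


if __name__ == "__main__":
    try:
        read_hf_token()
        print("HF_TOKEN found")  # never print the token itself
    except RuntimeError as err:
        print(f"Configuration error: {err}")
```

Failing early with an explicit message makes a missing secret easier to diagnose than an authorization error raised later during model download.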

	## Links

	- [GitHub Repository](https://github.com/Qwen/Qwen3-ASR)
	- [Model Card](https://huggingface.co/Qwen/Qwen3-ASR-1.7B)

	## License

	Apache 2.0