metadata
license: mit
pipeline_tag: audio-text-to-text
library_name: transformers
tags:
- music
- audio
ACE-Step Transcriber
Description
ACE-Step Transcriber is the annotation model used by ACE-Step v1.5 for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both speech and singing voice with high accuracy.
Key Features
- 🌍 50+ Languages Support - Covers major world languages and regional dialects
- 🎤 Speech Transcription - Accurately transcribes spoken content
- 🎵 Singing Voice Transcription - Specialized in lyrics transcription with musical structure annotations
- 🏷️ Structure Annotation - Automatically identifies song sections (verse, chorus, bridge, etc.)
Usage
The usage is the same as Qwen2.5 Omni-7B.
Prompt Format
Use the following prompt to transcribe audio:
*Task* Transcribe this audio in detail
<audio>
Output Format
The model outputs structured content in the following format:
# Languages
<language_code>
# Lyrics
[Section Tag - Optional Instrument]
<transcribed content>
...
Example Output
# Languages
en
# Lyrics
[Intro - Acoustic Guitar]
[Verse 1]
Walking down the empty street tonight
Stars are shining oh so bright
...
[Chorus]
This is where we belong
Singing our favorite song
...
Supported Section Tags
[Intro],[Outro][Verse 1],[Verse 2], etc.[Chorus],[Pre-Chorus],[Post-Chorus][Bridge][Guitar Interlude],[Instrumental][Spoken]
Supported Languages (50+)
The model supports transcription in over 50 languages, including but not limited to:
| Region | Languages |
|---|---|
| East Asia | Chinese (zh), Japanese (ja), Korean (ko) |
| Southeast Asia | Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl) |
| South Asia | Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur) |
| Europe | English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr) |
| Middle East | Arabic (ar), Hebrew (he), Persian (fa) |
| Others | And many more regional languages... |
Use Cases
- Music Production - Transcribe reference tracks for lyrics extraction
- Dataset Creation - Generate high-quality labeled data for music AI models
- Accessibility - Create subtitles and captions for audio content
- Music Analysis - Extract structural information from songs