acestep-transcriber / README.md

ChuxiJ

Update README.md

53062cf verified 3 days ago

preview code

raw

history blame contribute delete

2.65 kB

metadata

license: mit
pipeline_tag: audio-text-to-text
library_name: transformers
tags:
  - music
  - audio

Tech Report

ACE-Step Transcriber

Description

ACE-Step Transcriber is the annotation model used by ACE-Step v1.5 for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both speech and singing voice with high accuracy.

Key Features

🌍 50+ Languages Support - Covers major world languages and regional dialects
🎤 Speech Transcription - Accurately transcribes spoken content
🎵 Singing Voice Transcription - Specialized in lyrics transcription with musical structure annotations
🏷️ Structure Annotation - Automatically identifies song sections (verse, chorus, bridge, etc.)

Usage

The usage is the same as Qwen2.5 Omni-7B.

Prompt Format

Use the following prompt to transcribe audio:

*Task* Transcribe this audio in detail
<audio>

Output Format

The model outputs structured content in the following format:

# Languages
<language_code>

# Lyrics
[Section Tag - Optional Instrument]

<transcribed content>
...

Example Output

# Languages
en

# Lyrics
[Intro - Acoustic Guitar]

[Verse 1]
Walking down the empty street tonight
Stars are shining oh so bright
...

[Chorus]
This is where we belong
Singing our favorite song
...

Supported Section Tags

[Intro], [Outro]
[Verse 1], [Verse 2], etc.
[Chorus], [Pre-Chorus], [Post-Chorus]
[Bridge]
[Guitar Interlude], [Instrumental]
[Spoken]

Supported Languages (50+)

The model supports transcription in over 50 languages, including but not limited to:

Region	Languages
East Asia	Chinese (zh), Japanese (ja), Korean (ko)
Southeast Asia	Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl)
South Asia	Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur)
Europe	English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr)
Middle East	Arabic (ar), Hebrew (he), Persian (fa)
Others	And many more regional languages...

Use Cases

Music Production - Transcribe reference tracks for lyrics extraction
Dataset Creation - Generate high-quality labeled data for music AI models
Accessibility - Create subtitles and captions for audio content
Music Analysis - Extract structural information from songs