ACE-Step
/

acestep-transcriber

Audio-Text-to-Text

Model card Files Files and versions

acestep-transcriber / README.md

ChuxiJ's picture

Update README.md

53062cf verified 3 days ago

|

history blame contribute delete

2.65 kB

	---
	license: mit
	pipeline_tag: audio-text-to-text
	library_name: transformers
	tags:
	- music
	- audio
	---

	<a href="https://arxiv.org/abs/2602.00744">Tech Report</a>

	# ACE-Step Transcriber

	## Description

	ACE-Step Transcriber is the annotation model used by ACE-Step v1.5 for training data labeling. It is a powerful multilingual audio transcription model capable of transcribing both speech and singing voice with high accuracy.

	### Key Features

	- 🌍 50+ Languages Support - Covers major world languages and regional dialects
	- 🎤 Speech Transcription - Accurately transcribes spoken content
	- 🎵 Singing Voice Transcription - Specialized in lyrics transcription with musical structure annotations
	- 🏷️ Structure Annotation - Automatically identifies song sections (verse, chorus, bridge, etc.)

	## Usage

	The usage is the same as [Qwen2.5 Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B).

	### Prompt Format

	Use the following prompt to transcribe audio:

	```
	Task Transcribe this audio in detail
	<audio>
	```

	### Output Format

	The model outputs structured content in the following format:

	```
	# Languages
	<language_code>

	# Lyrics
	[Section Tag - Optional Instrument]

	<transcribed content>
	...
	```

	### Example Output

	```
	# Languages
	en

	# Lyrics
	[Intro - Acoustic Guitar]

	[Verse 1]
	Walking down the empty street tonight
	Stars are shining oh so bright
	...

	[Chorus]
	This is where we belong
	Singing our favorite song
	...
	```

	### Supported Section Tags

	- `[Intro]`, `[Outro]`
	- `[Verse 1]`, `[Verse 2]`, etc.
	- `[Chorus]`, `[Pre-Chorus]`, `[Post-Chorus]`
	- `[Bridge]`
	- `[Guitar Interlude]`, `[Instrumental]`
	- `[Spoken]`

	### Supported Languages (50+)

	The model supports transcription in over 50 languages, including but not limited to:

	\| Region \| Languages \|
	\|--------\|-----------\|
	\| East Asia \| Chinese (zh), Japanese (ja), Korean (ko) \|
	\| Southeast Asia \| Vietnamese (vi), Thai (th), Indonesian (id), Malay (ms), Filipino (tl) \|
	\| South Asia \| Hindi (hi), Bengali (bn), Tamil (ta), Urdu (ur) \|
	\| Europe \| English (en), German (de), French (fr), Spanish (es), Italian (it), Portuguese (pt), Russian (ru), Polish (pl), Dutch (nl), Greek (el), Turkish (tr) \|
	\| Middle East \| Arabic (ar), Hebrew (he), Persian (fa) \|
	\| Others \| And many more regional languages... \|

	## Use Cases

	- Music Production - Transcribe reference tracks for lyrics extraction
	- Dataset Creation - Generate high-quality labeled data for music AI models
	- Accessibility - Create subtitles and captions for audio content
	- Music Analysis - Extract structural information from songs