---
license: mit
library_name: transformers
tags:
- music
- audio
---
Tech Report
# ACE-Step Captioner
## Description
ACE-Step Captioner is the annotation model used by **ACE-Step v1.5** for training data labeling. It is a professional-grade music captioning model that generates detailed, structured descriptions of audio content.
### Performance
🏆 **Accuracy surpasses Gemini Pro 2.5** in music description tasks
### Key Features
- 🎼 **Musical Style Analysis** - Identifies genres, sub-genres, and stylistic influences
- 🎸 **Instrument Recognition** - Detects and describes 1000+ instrument types and combinations
- 🎭 **Structure & Progression** - Analyzes musical arrangement including intro, verse, chorus, bridge, climax, and outro
- 🔊 **Timbre Description** - Captures tonal qualities, textures, and sonic characteristics
- 📝 **Rich Vocabulary** - Supports 1000+ descriptive terms for comprehensive music annotation
## Usage
The usage is the same as [Qwen2.5 Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B).
### Prompt Format
Use the following prompt to caption audio:
```
*Task* Describe this audio in detail