---
license: mit
library_name: transformers
tags:
- music
- audio
---

Tech Report

# ACE-Step Captioner

## Description

ACE-Step Captioner is the annotation model that **ACE-Step v1.5** uses to label its training data. It is a professional-grade music captioning model that generates detailed, structured descriptions of audio content.

### Performance

🏆 **Accuracy surpasses Gemini 2.5 Pro** in music description tasks

### Key Features

- 🎼 **Musical Style Analysis** - Identifies genres, sub-genres, and stylistic influences
- 🎸 **Instrument Recognition** - Detects and describes 1000+ instrument types and combinations
- 🎭 **Structure & Progression** - Analyzes musical arrangement, including intro, verse, chorus, bridge, climax, and outro
- 🔊 **Timbre Description** - Captures tonal qualities, textures, and sonic characteristics
- 📝 **Rich Vocabulary** - Supports 1000+ descriptive terms for comprehensive music annotation

## Usage

Usage is the same as for [Qwen2.5 Omni-7B](https://huggingface.co/Qwen/Qwen2.5-Omni-7B).

### Prompt Format

Use the following prompt to caption audio:

```
*Task* Describe this audio in detail